Finding the MLE of Multiple Linear Regression

Introduction

Assalamualaikum. I am Hafiz. Alhamdulillah, this is my first ever post on Blogger. I want to be productive while waiting for my graduation (and also while waiting for job opportunities, lol!).

Okay, enough about myself. In this post I will write about one of the most powerful statistical methods: Multiple Linear Regression. The objective of this post is to derive the Maximum Likelihood Estimator of Multiple Linear Regression. This post will therefore be a bit theoretical, sadly (but fun!), and it may be boring. But for those who are interested in playing around with equations and theoretical concepts, you might find it interesting and useful.

Nonetheless, I hope you will enjoy this!

Multiple Linear Regression

Multiple Linear Regression is a statistical analysis method that models the relationship between one dependent variable and two or more independent variables. The primary goal of MLR is to predict the value of the dependent variable based on the values of the independent variables. The model is represented by: \[ \textbf{y} = \textbf{X} \beta + \epsilon, \qquad \epsilon \sim N(\textbf{0}, \sigma^2 I), \] so that \(\textbf{y} \sim N(\textbf{X}\beta, \sigma^2 I)\), where \(\textbf{y}\) is the vector of dependent-variable observations, \(\textbf{X}\) is the matrix of independent variables (the design matrix), \(\beta\) is the vector of coefficients of the independent variables, and \(\sigma^2\) is the variance of the error terms \(\epsilon\).
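To make the model concrete, here is a minimal simulation sketch in Python with NumPy. The sizes n and p and the values of beta_true and sigma2 are illustrative choices of mine, not anything canonical:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 3                    # n observations, p predictors (illustrative)

    # Design matrix with an intercept column
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    beta_true = np.array([1.0, 2.0, -0.5, 0.3])   # hypothetical true coefficients
    sigma2 = 0.25                                 # true error variance

    eps = rng.normal(scale=np.sqrt(sigma2), size=n)   # epsilon ~ N(0, sigma^2 I)
    y = X @ beta_true + eps                           # y = X beta + epsilon

I will reuse this simulated X and y in later sketches to check the estimators we derive.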

Maximum Likelihood Estimator (MLE)

Okay, here I will talk briefly about MLE.

MLE is a statistical technique used to estimate the parameters of a statistical model. The idea of MLE is to find the parameter values that maximize the likelihood function, which measures how likely it is to observe the given data under different parameter values.

Mathematically, the likelihood function \(L(\theta,\textbf{X})\) is the joint probability density function \(f(\textbf{X};\theta)\) of the data, viewed as a function of the parameter \(\theta\). And the MLE \(\hat{\theta}\) is found by:

\[\hat{\theta}=\arg\max_{\theta} L(\theta,\textbf{X})\]

In practice, it is easier to work with the log-likelihood function, \(l(\theta,\textbf{X})=\log L(\theta, \textbf{X})\). Since the logarithm is monotone, maximizing \(l\) gives the same \(\hat{\theta}\) as maximizing \(L\).
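As a toy illustration of the idea (separate from the regression problem), here is a sketch that numerically maximizes the log-likelihood of a normal sample over its mean using scipy.optimize; the numerical MLE should coincide with the sample mean:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    data = rng.normal(loc=3.0, scale=1.0, size=500)   # sample with "unknown" mean

    # Negative log-likelihood as a function of the mean (sd fixed at 1)
    def neg_log_lik(mu):
        return -np.sum(norm.logpdf(data, loc=mu, scale=1.0))

    res = minimize_scalar(neg_log_lik)
    print(res.x, data.mean())   # the numerical MLE matches the sample mean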

MLE has many important properties, but I will not discuss them here. Maybe I will do that in a separate post.

Computing the Likelihood Function

First, I will compute the likelihood function of Multiple Linear Regression! \[ L(\beta, \sigma^2 \mid \textbf{y}) = f(\textbf{y} \mid \beta, \sigma^2) = (2\pi \sigma^2)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n}(y_{i}-\textbf{X}_{i}\beta)^2\right]\] \[\therefore L(\beta,\sigma^2 \mid \textbf{y})=(2\pi \sigma^2)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2} (\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta)\right]\] The log-likelihood function is: \[l(\beta,\sigma^2)=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2)-\frac{1}{2\sigma^2}(\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta)\] Expanding the quadratic form, since \((\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta)=\textbf{y}^T\textbf{y}-2\beta^T\textbf{X}^T\textbf{y}+\beta^T\textbf{X}^T\textbf{X}\beta\): \[l(\beta,\sigma^2)=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2)-\frac{1}{2\sigma^2}(\textbf{y}^T\textbf{y}-2\beta^T\textbf{X}^T\textbf{y}+\beta^T\textbf{X}^T\textbf{X}\beta)\]
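As a quick numerical sanity check of this formula (reusing X, y, beta_true, sigma2, and n from the simulation sketch in the introduction), the hand-written log-likelihood should agree with SciPy's multivariate normal log-density:

    from scipy.stats import multivariate_normal

    resid = y - X @ beta_true
    loglik_manual = (- n/2 * np.log(2*np.pi) - n/2 * np.log(sigma2)
                     - resid @ resid / (2*sigma2))
    loglik_scipy = multivariate_normal.logpdf(y, mean=X @ beta_true,
                                              cov=sigma2 * np.eye(n))
    print(np.isclose(loglik_manual, loglik_scipy))   # True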

MLE of \(\beta\) and \(\sigma^2\)

Recall the log-likelihood function:

\[l(\beta,\sigma^2)=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2)-\frac{1}{2\sigma^2}(\textbf{y}^T\textbf{y}-2\beta^T\textbf{X}^T\textbf{y}+\beta^T\textbf{X}^T\textbf{X}\beta)\]

The derivative of this log-likelihood function with respect to \(\beta\):

\[ \frac{\partial l(\beta,\sigma^2)}{\partial \beta}=-\frac{1}{2\sigma^2}(-2\textbf{X}^T\textbf{y}+2\textbf{X}^T\textbf{X}\beta) \] Setting this to zero gives the MLE of \(\beta\): \[ -\frac{1}{2\sigma^2}(-2\textbf{X}^T\textbf{y}+2\textbf{X}^T\textbf{X}\hat{\beta})=0 \] \[\textbf{X}^T\textbf{X}\hat{\beta}=\textbf{X}^T\textbf{y} \] Assuming \(\textbf{X}^T\textbf{X}\) is invertible, \[(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{X}\hat{\beta}=(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y} \] \[\therefore \hat{\beta}=(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y} \]
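Continuing the simulation sketch (reusing X, y, and beta_true from the introduction), we can check this closed form numerically. Solving the normal equations with np.linalg.solve avoids forming the explicit inverse, which is the numerically safer choice:

    # Closed-form MLE: solve the normal equations X^T X beta = X^T y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Cross-check against NumPy's least-squares solver
    beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)                            # close to beta_true
    print(np.allclose(beta_hat, beta_lstsq))   # True: same estimator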

The derivative of the log-likelihood function with respect to \(\sigma^2\):

\[ \frac{\partial l(\beta,\sigma^2)}{\partial \sigma^2}=-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}(\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta) \] Setting this to zero (and plugging in \(\hat{\beta}\)) gives the MLE of \(\sigma^2\): \[ -\frac{n}{2\hat{\sigma}^2}+\frac{1}{2\hat{\sigma}^4}(\textbf{y}-\textbf{X}\hat{\beta})^T(\textbf{y}-\textbf{X}\hat{\beta})=0 \] \[\therefore \hat{\sigma}^2=\frac{(\textbf{y}-\textbf{X}\hat{\beta})^T(\textbf{y}-\textbf{X}\hat{\beta})}{n} \]
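Finishing the sketch (reusing X, y, n, and beta_hat from above), the MLE of the variance is the residual sum of squares divided by n. Note that this estimator is biased downward; the usual unbiased estimator divides by n - p instead:

    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / n                        # MLE: RSS / n (biased)
    sigma2_unbiased = resid @ resid / (n - X.shape[1])    # RSS / (n - p)
    print(sigma2_hat, sigma2_unbiased)                    # both near sigma2 = 0.25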
