STAT 350: Lecture 9

Reading: Chapter 6.1-6.

Distribution Theory for Least Squares

Assume:

\begin{displaymath}{\rm E}(\epsilon_i) = 0 \qquad \mbox{and} \qquad Y=X\beta+\epsilon
\end{displaymath}

Then

1.
${\rm E}(\hat\beta) = \beta$

2.
${\rm E}(\hat\mu) = X{\rm E}(\hat\beta) =X\beta = \mu$

3.
$\hat\epsilon = (I-X(X^T X)^{-1} X^T) \epsilon \equiv M\epsilon$ where $M = I - X(X^TX)^{-1}X^T$.

4.
${\rm E}(\hat\epsilon) = M{\rm E}(\epsilon) = 0$

Define: $H = X(X^TX)^{-1}X^T$, the hat matrix, so that $M = I - H$.
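
Facts 1 and 3 both come from substituting $Y=X\beta+\epsilon$ into $\hat\beta = (X^TX)^{-1}X^T Y$; here is that step filled in (using only the definitions above):
\begin{align*}\hat\beta & = (X^TX)^{-1}X^T(X\beta+\epsilon)
\\
& = \beta + (X^TX)^{-1}X^T\epsilon
\\
\hat\epsilon & = Y - X\hat\beta
\\
& = X\beta + \epsilon - X\beta - X(X^TX)^{-1}X^T\epsilon
\\
& = (I-H)\epsilon = M\epsilon
\end{align*}
Taking expectations and using ${\rm E}(\epsilon)=0$ then gives facts 1, 2 and 4.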

If also

\begin{displaymath}{\rm Var}(\epsilon) = \sigma^2 I
\end{displaymath}

(as will be the case, for instance, if the $\epsilon_i$ are iid with variance $\sigma^2$) then

1.
${\rm Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}$

2.
${\rm Var}(\hat\mu) = \sigma^2 X(X^T X)^{-1} X^T = \sigma^2 H$

3.
${\rm Var}(\hat\epsilon) = M{\rm Var}(\epsilon)M^T = \sigma^2 MM^T$
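
Fact 1, for instance, follows from the rule ${\rm Var}(AY) = A\,{\rm Var}(Y)\,A^T$ with $A = (X^TX)^{-1}X^T$ (a filled-in verification, using the symmetry of $(X^TX)^{-1}$):
\begin{align*}{\rm Var}(\hat\beta) & = (X^TX)^{-1}X^T\,{\rm Var}(Y)\,X(X^TX)^{-1}
\\
& = \sigma^2 (X^TX)^{-1}X^TX(X^TX)^{-1}
\\
& = \sigma^2 (X^TX)^{-1}
\end{align*}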

If, in addition, the $\epsilon_i \sim N(0,\sigma^2)$ are independent, then

\begin{displaymath}\hat\beta \sim MVN(\beta,\sigma^2 (X^T X)^{-1}) \qquad \mbox{and}\qquad
\hat\mu \sim MVN(\mu,\sigma^2 H)
\end{displaymath}

Notice $\hat\mu = H Y$ and that H is $n \times n$.
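
All of these quantities can be computed directly from the matrix formulas. Here is a minimal numerical sketch (not part of the original notes; numpy is assumed, and the design matrix and data below are made up):

\begin{verbatim}
import numpy as np

# Toy simple-linear-regression design (n = 5, p = 2) and made-up response.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y        # least squares estimate
H = X @ XtX_inv @ X.T               # hat matrix, n x n
mu_hat = H @ Y                      # fitted values
resid = Y - mu_hat                  # residuals, (I - H) Y

# Agrees with numpy's built-in least squares solver.
beta_np, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_np))   # True
\end{verbatim}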

Some algebraic simplification of the variances above is possible.
\begin{align*}{\rm Var}(\hat\epsilon) & = \sigma^2 MM^T
\\
& = \sigma^2 (I-H)(I-H)^T
\\
& = \sigma^2 (II^T - HI^T -IH^T +HH^T)
\end{align*}

But $I = I^T$ so $II^T = I$, and
\begin{align*}H^T & = \left[ X(X^T X)^{-1} X^T\right]^T
\\
& = X\left[(X^T X)^{-1}\right]^T X^T
\\
& = X\left[(X^T X)^T\right]^{-1} X^T
\\
&= X \left[ X^TX\right]^{-1} X^T
\\
& = H
\end{align*}

ASIDE: as you read that sequence of formulas you will see that I expect you to remember a number of algebraic facts about matrices:

1.
$(A^T)^{-1} = (A^{-1})^T$

2.
$(AB)^T = B^T A^T$

3.
$(A^T)^T = A$

So

\begin{displaymath}M^T=M \qquad \mbox{and} \qquad H^T=H
\end{displaymath}

Finally

\begin{displaymath}HH^T = HH=H^2 = X(X^T X)^{-1}\underbrace{X^TX (X^TX)^{-1}}_{\rm
Identity}X^T = X(X^TX)^{-1} X^T = H
\end{displaymath}

Thus
\begin{align*}H^2 & = H
\\
M^2 & = I - H - H + H^2
\\
& = I-H
\\
& = M
\end{align*}
so that, in particular, ${\rm Var}(\hat\epsilon) = \sigma^2 MM^T = \sigma^2 M^2 = \sigma^2 M$.

Definition: A matrix Q is idempotent if

\begin{displaymath}QQ \equiv Q^2 = Q
\end{displaymath}

So What?

1.
Distribution theory of Sums of Squares in ANOVA tables uses this.

2.
F tests for hypotheses about parameters are justified using these matrix ideas.

3.
t-tests and confidence intervals for $c^T\beta$ (where $c$ is a vector of length p) can be derived using these ideas.

Example: Estimation of $\sigma$

Estimation of $\sigma^2$ is based on the error sum of squares defined by
\begin{align*}\mbox{ESS} & = \vert\vert\hat\epsilon\vert\vert^2
\\
& = \sum \hat\epsilon_i^2
\\
& = \hat\epsilon^T\hat\epsilon
\\
& = (M\epsilon)^T (M\epsilon)
\\
& = \epsilon^T M^T M \epsilon
\\
& = \epsilon^T M \epsilon
\end{align*}

Now note that
\begin{align*}{\rm E}[{\rm ESS}] & =
{\rm E}(\epsilon^T M \epsilon)
\\
& = {\rm E}[\sum_{ij} \epsilon_i M_{ij} \epsilon_j]
\\
& = \sum_{ij} M_{ij} {\rm E}[\epsilon_i \epsilon_j]
\end{align*}
But ${\rm E}[\epsilon_i \epsilon_j]=0$ for $i \neq j$ and ${\rm E}[\epsilon_i^2]=\sigma^2$ so
\begin{align*}{\rm E}[{\rm ESS}] & = \sigma^2 \sum_i M_{ii}
\\
& = \sigma^2 {\rm trace}(M)
\end{align*}

Definition: The trace of a square matrix Q is defined by

\begin{displaymath}{\rm trace}(Q) = \sum_i Q_{ii}
\end{displaymath}

Marvelous Matrix Identity (cyclic invariance of the trace). Suppose that $A$ is $m \times n$ and $B$ is $n \times m$, so that $AB$ ($m \times m$) and $BA$ ($n \times n$) are both square. Then
\begin{align*}{\rm trace}(AB) & = \sum_i (AB)_{ii}
\\
& = \sum_i (\sum_j A_{ij}B_{ji})
\\
& = \sum_j (\sum_i B_{ji}A_{ij} )
\\
& = \sum_j (BA)_{jj}
\\
& = {\rm trace}(BA)
\end{align*}

The same idea works with more than two matrices provided the product is square so, e.g.,

\begin{displaymath}{\rm trace}(ABCD) = {\rm trace}(DABC) = {\rm trace}(CDAB) = {\rm
trace}(BCDA)
\end{displaymath}
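
A quick numerical sketch of the cyclic property with rectangular matrices (my own illustration, not from the notes; numpy assumed):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 7))   # A is 4 x 7
B = rng.standard_normal((7, 4))   # B is 7 x 4

print(np.trace(A @ B))   # trace of a 4 x 4 product
print(np.trace(B @ A))   # trace of a 7 x 7 product -- same number
\end{verbatim}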

Another algebraic identity for the trace:
\begin{align*}{\rm trace}(A+B) & = \sum(A+B)_{ii}
\\
& = \sum (A_{ii}+B_{ii})
\\
& = {\rm trace}(A) + {\rm trace}(B)
\end{align*}

SO:
\begin{align*}{\rm trace}(M) & = {\rm trace}(I-H)
\\
& = {\rm trace}(I_{n\times n}) - {\rm trace}(H)
\\
& = n - {\rm trace}(X(X^TX)^{-1}X^T)
\\
& = n - {\rm trace}((X^TX)^{-1}X^TX)
\\
& = n - {\rm trace}(I_{p\times p})
\\
& = n-p
\end{align*}

Notice that p is the number of columns of X including the column of 1's if present.

Summary of result


\begin{displaymath}{\rm E}\left[ \frac{{\rm ESS}}{n-p}\right] = \sigma^2
\end{displaymath}

So the Mean Squared Error, ESS/(n-p), is an unbiased estimate of $\sigma^2$.
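
A small simulation sketch of this unbiasedness (the design matrix, $\beta$, $\sigma$, the simulation size and the use of numpy are all my own choices for illustration, not part of the notes):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), np.arange(20.0)])   # n = 20, p = 2
n, p = X.shape
beta = np.array([1.0, 0.5])
sigma = 2.0

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T      # M = I - H

mses = []
for _ in range(5000):
    eps = sigma * rng.standard_normal(n)
    Y = X @ beta + eps
    resid = M @ Y                         # equals M @ eps since M X = 0
    mses.append(resid @ resid / (n - p))  # ESS / (n - p)

print(np.mean(mses))   # should be close to sigma^2 = 4
\end{verbatim}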

Extras

Here is an example of some of the matrix algebra I was doing in class. Consider the weighing design where two objects of weights $\beta_1$ and $\beta_2$ are weighed individually and together. The resulting design matrix is

\begin{displaymath}X = \left[\begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 1 & 1\end{array}\right]
\end{displaymath}

So

\begin{displaymath}X^TX = \left[\begin{array}{rr} 2 & 1 \\ 1 & 2 \end{array}\right]
\end{displaymath}

and

\begin{displaymath}(X^TX)^{-1} = \left[\begin{array}{rr} \frac{2}{3} & -\frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} \end{array}\right]
\end{displaymath}

The hat matrix is

\begin{displaymath}\left[\begin{array}{rr} 1 & 0 \\
0 & 1 \\ 1 & 1\end{array}\right]
\left[\begin{array}{rr} \frac{2}{3} & -\frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} \end{array}\right]
\left[\begin{array}{rrr}
1 & 0 & 1 \\ 0 & 1 & 1 \end{array}\right]
\end{displaymath}

which is

\begin{displaymath}\left[\begin{array}{rrr}
\frac{2}{3} & -\frac{1}{3} & \frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} & \frac{1}{3} \\
\frac{1}{3} & \frac{1}{3} & \frac{2}{3}
\end{array}\right]
\end{displaymath}

Notice that the trace of H is 2 which is the number of parameters in $\beta$.

You can also check that H is idempotent.
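
A minimal numerical check of both claims, assuming numpy (an illustration, not part of the original notes):

\begin{verbatim}
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.round(H, 4))          # matches the matrix of thirds above
print(np.trace(H))             # 2.0, the number of parameters
print(np.allclose(H @ H, H))   # True: H is idempotent
\end{verbatim}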


Richard Lockhart
1999-01-08