STAT 350: Lecture 9

Reading: Chapter 6.1-6.

Distribution Theory for Least Squares

Assume:

\begin{displaymath}{\rm E}(\epsilon_i) = 0 \qquad \mbox{and} \qquad Y=X\beta+\epsilon
\end{displaymath}

Then

1.
${\rm E}(\hat\beta) = \beta$

2.
${\rm E}(\hat\mu) = X{\rm E}(\hat\beta) =X\beta = \mu$

3.
$\hat\epsilon = (I-X(X^T X)^{-1} X^T) \epsilon \equiv M\epsilon$ where $M = I - X(X^TX)^{-1}X^T$.

4.
${\rm E}(\hat\epsilon) = M{\rm E}(\epsilon) = 0$

Define: $H = X(X^TX)^{-1}X^T$, the hat matrix, so that $M = I - H$.
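
Facts 1 and 3 both come from substituting $Y=X\beta+\epsilon$ into $\hat\beta = (X^TX)^{-1}X^T Y$; here is that step filled in (using only the definitions above):
\begin{align*}\hat\beta & = (X^TX)^{-1}X^T(X\beta+\epsilon)
\\
& = \beta + (X^TX)^{-1}X^T\epsilon
\\
\hat\epsilon & = Y - X\hat\beta
\\
& = X\beta + \epsilon - X\beta - X(X^TX)^{-1}X^T\epsilon
\\
& = (I-H)\epsilon = M\epsilon
\end{align*}
Taking expectations and using ${\rm E}(\epsilon)=0$ then gives facts 1, 2 and 4.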

If also

\begin{displaymath}{\rm Var}(\epsilon) = \sigma^2 I
\end{displaymath}

(as will be the case, for instance, if the $\epsilon_i$ are iid with variance $\sigma^2$) then

1.
${\rm Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}$

2.
${\rm Var}(\hat\mu) = \sigma^2 X(X^T X)^{-1} X^T = \sigma^2 H$

3.
${\rm Var}(\hat\epsilon) = M{\rm Var}(\epsilon)M^T = \sigma^2 MM^T$
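
Fact 1, for instance, follows from the rule ${\rm Var}(AY) = A\,{\rm Var}(Y)\,A^T$ with $A = (X^TX)^{-1}X^T$ (a filled-in verification, using the symmetry of $(X^TX)^{-1}$):
\begin{align*}{\rm Var}(\hat\beta) & = (X^TX)^{-1}X^T\,{\rm Var}(Y)\,X(X^TX)^{-1}
\\
& = \sigma^2 (X^TX)^{-1}X^TX(X^TX)^{-1}
\\
& = \sigma^2 (X^TX)^{-1}
\end{align*}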

If, in addition, the $\epsilon_i \sim N(0,\sigma^2)$ are independent, then

\begin{displaymath}\hat\beta \sim MVN(\beta,\sigma^2 (X^T X)^{-1}) \qquad \mbox{and}\qquad
\hat\mu \sim MVN(\mu,\sigma^2 H)
\end{displaymath}

Notice $\hat\mu = H Y$ and that H is $n \times n$.
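
All of these quantities can be computed directly from the matrix formulas. Here is a minimal numerical sketch (not part of the original notes; numpy is assumed, and the design matrix and data below are made up):

\begin{verbatim}
import numpy as np

# Toy simple-linear-regression design (n = 5, p = 2) and made-up response.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y        # least squares estimate
H = X @ XtX_inv @ X.T               # hat matrix, n x n
mu_hat = H @ Y                      # fitted values
resid = Y - mu_hat                  # residuals, (I - H) Y

# Agrees with numpy's built-in least squares solver.
beta_np, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_np))   # True
\end{verbatim}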

Some algebraic simplification of the variances above is possible.
\begin{align*}{\rm Var}(\hat\epsilon) & = \sigma^2 MM^T
\\
& = \sigma^2 (I-H)(I-H)^T
\\
& = \sigma^2 (II^T - HI^T -IH^T +HH^T)
\end{align*}

But $I = I^T$ so $II^T = I$, and
\begin{align*}H^T & = \left[ X(X^T X)^{-1} X^T\right]^T
\\
& = X\left[(X^T X)^{-1}\right]^T X^T
\\
& = X\left[(X^T X)^T\right]^{-1} X^T
\\
&= X \left[ X^TX\right]^{-1} X^T
\\
& = H
\end{align*}

ASIDE: as you read that sequence of formulas you will see that I expect you to remember a number of algebraic facts about matrices:

1.
$(A^T)^{-1} = (A^{-1})^T$

2.
$(AB)^T = B^T A^T$

3.
$(A^T)^T = A$

So

\begin{displaymath}M^T=M \qquad \mbox{and} \qquad H^T=H
\end{displaymath}

Finally

\begin{displaymath}HH^T = HH=H^2 = X(X^T X)^{-1}\underbrace{X^TX (X^TX)^{-1}}_{\rm
Identity}X^T = X(X^TX)^{-1} X^T = H
\end{displaymath}

Thus
\begin{align*}H^2 & = H
\\
M^2 & = I - H - H + H^2
\\
& = I-H
\\
& = M
\end{align*}
so that, in particular, ${\rm Var}(\hat\epsilon) = \sigma^2 MM^T = \sigma^2 M^2 = \sigma^2 M$.

Definition: A matrix Q is idempotent if

\begin{displaymath}QQ \equiv Q^2 = Q
\end{displaymath}

So What?

1.
Distribution theory of Sums of Squares in ANOVA tables uses this.

2.
F tests for hypotheses about parameters are justified using these matrix ideas.

3.
t-tests and confidence intervals for $c^T\beta$ (where $c$ is a vector of length p) can be derived using these ideas.

Example: Estimation of $\sigma$

Estimation of $\sigma^2$ is based on the error sum of squares defined by
\begin{align*}\mbox{ESS} & = \vert\vert\hat\epsilon\vert\vert^2
\\
& = \sum \hat\epsilon_i^2
\\
& = \hat\epsilon^T\hat\epsilon
\\
& = (M\epsilon)^T (M\epsilon)
\\
& = \epsilon^T M^T M \epsilon
\\
& = \epsilon^T M \epsilon
\end{align*}

Now note that
\begin{align*}{\rm E}[{\rm ESS}] & =
{\rm E}(\epsilon^T M \epsilon)
\\
& = {\rm E}[\sum_{ij} \epsilon_i M_{ij} \epsilon_j]
\\
& = \sum_{ij} M_{ij} {\rm E}[\epsilon_i \epsilon_j]
\end{align*}
But ${\rm E}[\epsilon_i \epsilon_j]=0$ for $i \neq j$ and ${\rm E}[\epsilon_i^2]=\sigma^2$ so
\begin{align*}{\rm E}[{\rm ESS}] & = \sigma^2 \sum_i M_{ii}
\\
& = \sigma^2 {\rm trace}(M)
\end{align*}

Definition: The trace of a square matrix Q is defined by

\begin{displaymath}{\rm trace}(Q) = \sum_i Q_{ii}
\end{displaymath}

Marvelous Matrix Identity (cyclic invariance of the trace). Suppose that $A$ is $m \times n$ and $B$ is $n \times m$, so that $AB$ ($m \times m$) and $BA$ ($n \times n$) are both square. Then
\begin{align*}{\rm trace}(AB) & = \sum_i (AB)_{ii}
\\
& = \sum_i (\sum_j A_{ij}B_{ji})
\\
& = \sum_j (\sum_i B_{ji}A_{ij} )
\\
& = \sum_j (BA)_{jj}
\\
& = {\rm trace}(BA)
\end{align*}

The same idea works with more than two matrices provided the product is square so, e.g.,

\begin{displaymath}{\rm trace}(ABCD) = {\rm trace}(DABC) = {\rm trace}(CDAB) = {\rm
trace}(BCDA)
\end{displaymath}
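
A quick numerical sketch of the cyclic property with rectangular matrices (my own illustration, not from the notes; numpy assumed):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 7))   # A is 4 x 7
B = rng.standard_normal((7, 4))   # B is 7 x 4

print(np.trace(A @ B))   # trace of a 4 x 4 product
print(np.trace(B @ A))   # trace of a 7 x 7 product -- same number
\end{verbatim}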

Another algebraic identity for the trace:
\begin{align*}{\rm trace}(A+B) & = \sum(A+B)_{ii}
\\
& = \sum (A_{ii}+B_{ii})
\\
& = {\rm trace}(A) + {\rm trace}(B)
\end{align*}

SO:
\begin{align*}{\rm trace}(M) & = {\rm trace}(I-H)
\\
& = {\rm trace}(I_{n\times n}) - {\rm trace}(H)
\\
& = n - {\rm trace}(X(X^TX)^{-1}X^T)
\\
& = n - {\rm trace}((X^TX)^{-1}X^TX)
\\
& = n - {\rm trace}(I_{p\times p})
\\
& = n-p
\end{align*}

Notice that p is the number of columns of X including the column of 1's if present.

Summary of result


\begin{displaymath}{\rm E}\left[ \frac{{\rm ESS}}{n-p}\right] = \sigma^2
\end{displaymath}

So the Mean Squared Error, ESS/(n-p), is an unbiased estimate of $\sigma^2$.
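
A small simulation sketch of this unbiasedness (the design matrix, $\beta$, $\sigma$, the simulation size and the use of numpy are all my own choices for illustration, not part of the notes):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), np.arange(20.0)])   # n = 20, p = 2
n, p = X.shape
beta = np.array([1.0, 0.5])
sigma = 2.0

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T      # M = I - H

mses = []
for _ in range(5000):
    eps = sigma * rng.standard_normal(n)
    Y = X @ beta + eps
    resid = M @ Y                         # equals M @ eps since M X = 0
    mses.append(resid @ resid / (n - p))  # ESS / (n - p)

print(np.mean(mses))   # should be close to sigma^2 = 4
\end{verbatim}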

Extras

Here is an example of some of the matrix algebra I was doing in class. Consider the weighing design where two objects of weights $\beta_1$ and $\beta_2$ are weighed individually and together. The resulting design matrix is

\begin{displaymath}X = \left[\begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 1 & 1\end{array}\right]
\end{displaymath}

So

\begin{displaymath}X^TX = \left[\begin{array}{rr} 2 & 1 \\ 1 & 2 \end{array}\right]
\end{displaymath}

and

\begin{displaymath}(X^TX)^{-1} = \left[\begin{array}{rr} \frac{2}{3} & -\frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} \end{array}\right]
\end{displaymath}

The hat matrix is

\begin{displaymath}\left[\begin{array}{rr} 1 & 0 \\
0 & 1 \\ 1 & 1\end{array}\right]
\left[\begin{array}{rr} \frac{2}{3} & -\frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} \end{array}\right]
\left[\begin{array}{rrr}
1 & 0 & 1 \\ 0 & 1 & 1 \end{array}\right]
\end{displaymath}

which is

\begin{displaymath}\left[\begin{array}{rrr}
\frac{2}{3} & -\frac{1}{3} & \frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} & \frac{1}{3} \\
\frac{1}{3} & \frac{1}{3} & \frac{2}{3}
\end{array}\right]
\end{displaymath}

Notice that the trace of H is 2 which is the number of parameters in $\beta$.

You can also check that H is idempotent.
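
A minimal numerical check of both claims, assuming numpy (an illustration, not part of the original notes):

\begin{verbatim}
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.round(H, 4))          # matches the matrix of thirds above
print(np.trace(H))             # 2.0, the number of parameters
print(np.allclose(H @ H, H))   # True: H is idempotent
\end{verbatim}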


Richard Lockhart
1999-01-08