
STAT 350: Lecture 8

Reading: Chapter 5, Chapter 15.

Distribution Theory
Linear and Quadratic Functions of Normals

So far we have defined the $MVN_n(0,I)$ distribution: $Z \sim MVN_n(0,I)$ means that the entries $Z_1,\ldots,Z_n$ of $Z$ are independent standard normal variables. The expected value of a random vector or matrix is taken entry by entry.

Now suppose $Z\sim MVN_n(0,I)$. Then

\begin{displaymath}{\rm E}(Z) = \left[ \begin{array}{c}
{\rm E}(Z_1) \\ \vdots \\ {\rm E}(Z_n)
\end{array}\right]
= {\bf0}_n
\end{displaymath}

The ${\bf0}_n$ matches the 0 in $MVN_n(0,I)$.

Next we compute the variance of Z. Note that ${\rm E}\left[(Z-0) (Z-0)^T \right] $ has ijth entry

\begin{displaymath}{\rm E}(Z_iZ_j) = \left\{
\begin{array}{lll}
0 & i \neq j & \mbox{(independence)}
\\
{\rm E}(Z_i^2) = 1 & i=j &
\end{array}\right.
\end{displaymath}

So

\begin{displaymath}{\rm Var}(Z) = I_{n \times n}
\end{displaymath}

the $n \times n $ identity matrix.

Now suppose that

\begin{displaymath}X=A Z + \mu
\end{displaymath}

where $A$ is an $m \times n$ matrix of constants and $\mu$ is an $m \times 1$ vector of constants. Then we say that X has a $MVN_m(\mu, AA^T)$ distribution.

Now ${\rm E}(X) = {\rm E}(AZ+\mu)$ has ith component
\begin{align*}{\rm E}[(AZ)_i] + \mu_i
& = {\rm E}( \sum_j A_{ij} Z_j) + \mu_i
\\
& = \sum_j A_{ij} {\rm E}(Z_j) + \mu_i
\\
& = \mu_i
\end{align*}

Moreover,
\begin{align*}{\rm Var}(X) & = {\rm E}\left[ (X-\mu)(X-\mu)^T \right]
\\
& = {\rm E}[(AZ)(AZ)^T]
\\
& = {\rm E}[AZZ^T A^T]
\\
& = A {\rm E}[ZZ^T] A^T
\\
& = AIA^T
\\
& = AA^T
\end{align*}

The last three lines need some justification. The point is that matrix multiplication by A, whose entries are constants, is just like multiplication by a constant -- you can pull the constant outside of the expected value sign. Here is the justification. The ijth entry in ${\rm E} [AZZ^T A^T]$ is
\begin{align*}{\rm E}\left( \sum_k \sum_\ell A_{ik} Z_k Z_\ell A_{j\ell}\right)
& = \sum_{k}\sum_{\ell} A_{ik} {\rm E}(Z_k Z_\ell) A_{j\ell}
\\
& = \sum_{k}\sum_{\ell} A_{ik} I_{k\ell} A^T_{\ell j}
\\
& = (AIA^T)_{ij}
\\
&= (AA^T)_{ij}
\end{align*}

So, ${\rm Var}(X) = AA^T$.
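As a concrete illustration of this construction, take $m=n=2$, $\mu={\bf 0}$, and constants $\sigma_1 > 0$, $\sigma_2 > 0$ and $-1 < \rho < 1$ (symbols introduced here only for the sake of the example), with

\begin{displaymath}A = \left[ \begin{array}{cc}
\sigma_1 & 0 \\
\rho\sigma_2 & \sigma_2\sqrt{1-\rho^2}
\end{array}\right]
\qquad\mbox{so that}\qquad
AA^T = \left[ \begin{array}{cc}
\sigma_1^2 & \rho\sigma_1\sigma_2 \\
\rho\sigma_1\sigma_2 & \sigma_2^2
\end{array}\right] .
\end{displaymath}

Then $X=AZ$ is a bivariate normal vector whose components have variances $\sigma_1^2$ and $\sigma_2^2$ and correlation $\rho$.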

Thus $X\sim MVN(\mu,\Sigma)$ means that $X$ has the distribution of $AZ+\mu$ for some matrix $A$ with $AA^T=\Sigma$; in particular ${\rm E}(X)=\mu$ and ${\rm Var}(X)=\Sigma$.

Things to notice along the way

1.
${\rm E}(AX+b) = A{\rm E}(X) +b$ when $A_{m \times n}$, $X_{n \times 1}$, $b_{m \times 1}$ and A and b are constant.

2.
${\rm E}(AMB) = A {\rm E}(M) B$ whenever A, B and M are matrices whose dimensions make the multiplication possible and A and B are non-random constant matrices while M is a random matrix.

3.
${\rm Var}(AX+b) = A{\rm Var}(X) A^T$ where A and X are as in 1), as shown below. The notation ${\rm Cov}(X)$ is sometimes used for ${\rm Var}(X)$. This matrix is called the variance-covariance matrix of X.
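Fact 3) follows from facts 1) and 2), since $AX+b - {\rm E}(AX+b) = A\left(X-{\rm E}(X)\right)$:

\begin{align*}{\rm Var}(AX+b) & = {\rm E}\left[ A(X-{\rm E}(X))(X-{\rm E}(X))^T A^T\right]
\\
& = A {\rm E}\left[ (X-{\rm E}(X))(X-{\rm E}(X))^T \right] A^T
\\
& = A {\rm Var}(X) A^T
\end{align*}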

Application to Least Squares

The following do not use the normal assumption:


\begin{align*}\hat\beta & = (X^T X)^{-1} X^T Y
\\
& = (X^T X)^{-1} X^T (X\beta+\epsilon)
\\
& = \beta + (X^T X)^{-1} X^T \epsilon
\\
{\rm E}(\hat\beta) & = \beta + (X^T X)^{-1} X^T {\rm E}(\epsilon)
\\
& = \beta
\end{align*}
So $\hat\beta$ is unbiased.


\begin{align*}{\rm Var}(\hat\beta) & = {\rm E}[(\hat\beta-\beta)(\hat\beta-\beta)^T]
\\
& = {\rm E}\left[(X^T X)^{-1} X^T \epsilon \left((X^T X)^{-1} X^T \epsilon\right)^T\right]
\\
& = (X^T X)^{-1} X^T {\rm E}(\epsilon\epsilon^T) \left((X^T X)^{-1} X^T\right)^T
\\
& = (X^T X)^{-1} X^T \sigma^2 I \left((X^T X)^{-1}
X^T\right)^T
\\
& = \sigma^2 (X^T X)^{-1}
\end{align*}
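As a check on this formula (an illustration only, not part of the general argument), consider simple linear regression, where the columns of X are a column of 1s and the covariate values $x_1,\ldots,x_n$. Then

\begin{displaymath}X^T X = \left[ \begin{array}{cc}
n & \sum x_i \\
\sum x_i & \sum x_i^2
\end{array}\right]
\qquad\mbox{and}\qquad
{\rm Var}(\hat\beta_1) = \sigma^2 \left[ (X^T X)^{-1} \right]_{22}
= \frac{\sigma^2}{\sum (x_i - \bar x)^2} \, ,
\end{displaymath}

which is the familiar variance of the least squares slope estimate.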

If, also, the $\epsilon_i$ are independent $N(0,\sigma^2)$ variables then $\epsilon/\sigma \sim MVN_n(0,I)$ and
\begin{align*}\hat\beta & = \beta + (X^T X)^{-1} X^T\epsilon
\\
& = \beta + \sigma (X^T X)^{-1} X^T (\epsilon/\sigma)
\\
& \sim MVN(\beta, \sigma^2 \underbrace{(X^T X)^{-1} X^T X(X^T X)^{-1}}_{(X^T
X)^{-1}})
\end{align*}

The fitted vector is

\begin{displaymath}\hat\mu = X\hat\beta \sim MVN(X\beta, \sigma^2 X(X^T X)^{-1}X^T)
\end{displaymath}
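The variance here comes from fact 3) above applied with $A=X$:

\begin{align*}{\rm Var}(\hat\mu) & = {\rm Var}(X\hat\beta)
\\
& = X {\rm Var}(\hat\beta) X^T
\\
& = \sigma^2 X (X^T X)^{-1} X^T
\end{align*}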

The residual vector is
\begin{align*}\hat\epsilon & = Y-X\hat\beta
\\
& = X\beta+\epsilon -X\beta -X(X^T X)^{-1} X^T \epsilon
\\
& = (I -X(X^T X)^{-1} X^T) \epsilon
\\
& \sim MVN(0,\sigma^2 MM^T)
\end{align*}
where $M=I-X(X^T X)^{-1} X^T$.

Notation:

$H= X(X^T X)^{-1} X^T$

Jargon: H is called the hat matrix. Notice that

\begin{displaymath}\hat\mu = HY
\end{displaymath}
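Two properties of H (and of M=I-H) get used repeatedly: both matrices are symmetric and idempotent. In particular $MM^T = M = I-H$, so the variance of $\hat\epsilon$ found above is just $\sigma^2(I-H)$.

\begin{align*}H^T & = \left( X(X^T X)^{-1} X^T \right)^T = X(X^T X)^{-1} X^T = H
\\
H^2 & = X(X^T X)^{-1} X^T X(X^T X)^{-1} X^T = X(X^T X)^{-1} X^T = H
\\
MM^T & = (I-H)(I-H)^T = I - 2H + H^2 = I-H = M
\end{align*}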

Extras

In the calculations above for the vectors $\hat\beta$, $\hat\mu$ and $\hat\epsilon$, I tried to emphasize which conclusions needed which assumptions.

So, for instance, we have the following matrix identities, which depend only on the model equation

\begin{displaymath}{\bf Y} = {\bf X} {\bf\beta} +{\bf\epsilon}\end{displaymath}


\begin{displaymath}\hat\beta = (X^TX)^{-1} X^T Y = \beta + (X^TX)^{-1} X^T \epsilon
\end{displaymath}


\begin{displaymath}\hat\mu = X\hat\beta = \mu + H\epsilon
\end{displaymath}

where $\bf H$ is the `hat' matrix $X(X^TX)^{-1} X^T$, and

\begin{displaymath}\hat\epsilon = (I-H) \epsilon
\end{displaymath}

If we add the assumption that ${\rm E}(\epsilon_i)=0$ for each i then we get

\begin{displaymath}{\rm E}(\hat\beta) = \beta
\end{displaymath}


\begin{displaymath}{\rm E}(\hat\mu) = \mu
\end{displaymath}

and

\begin{displaymath}{\rm E}(\hat\epsilon) = 0\, .
\end{displaymath}

If we add the assumption that the errors are homoscedastic ( ${\rm Var}(\epsilon_i) = \sigma^2$ for all i) and uncorrelated ( ${\rm Cov}(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$) then we can compute variances and get

\begin{displaymath}{\rm Var}(\hat\beta) = \sigma^2(X^TX)^{-1}
\end{displaymath}


\begin{displaymath}{\rm Var}(\hat\mu) = \sigma^2 {\bf H}
\end{displaymath}

and

\begin{displaymath}{\rm Var}(\hat\epsilon) = \sigma^2 ({\bf I} -{\bf H}) \, .
\end{displaymath}
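The last of these again uses fact 3), together with the symmetry and idempotence of ${\bf I}-{\bf H}$ noted earlier:

\begin{align*}{\rm Var}(\hat\epsilon) & = (I-H) {\rm Var}(\epsilon) (I-H)^T
\\
& = \sigma^2 (I-H)(I-H)^T
\\
& = \sigma^2 (I-H)
\end{align*}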

NOTE: usually we assume that the $\epsilon_i$ are independent and identically distributed, which guarantees the homoscedastic and uncorrelated assumptions above.

Next we add the assumption that the errors $\epsilon_i$ are independent normal variables. Then we conclude that each of $\hat\beta$, $\hat\mu$ and $\hat\epsilon$ has a multivariate normal distribution with the mean and variance just described.
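Putting all of the assumptions together, the conclusions can be summarized as

\begin{displaymath}\hat\beta \sim MVN\left(\beta, \sigma^2 (X^TX)^{-1}\right), \qquad
\hat\mu \sim MVN\left(\mu, \sigma^2 {\bf H}\right), \qquad
\hat\epsilon \sim MVN\left(0, \sigma^2 ({\bf I}-{\bf H})\right) \, .
\end{displaymath}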





Richard Lockhart
1999-01-20