
STAT 350: Lecture 8

Reading: Chapter 5, Chapter 15.

Distribution Theory
Linear and Quadratic Functions of Normals

So far we have defined the $MVN_n(0,I)$ distribution: $Z \sim MVN_n(0,I)$ means that the entries $Z_1,\ldots,Z_n$ of $Z$ are independent standard normal variables. The expected value of a random vector or matrix is taken entry by entry.

Now suppose $Z\sim MVN_n(0,I)$. Then

\begin{displaymath}{\rm E}(Z) = \left[ \begin{array}{c}
{\rm E}(Z_1) \\ \vdots \\ {\rm E}(Z_n)
\end{array}\right]
= {\bf0}_n
\end{displaymath}

The ${\bf0}_n$ matches the 0 in $MVN_n(0,I)$.

Next we compute the variance of Z. Note that ${\rm E}\left[(Z-0) (Z-0)^T \right] $ has ijth entry

\begin{displaymath}{\rm E}(Z_iZ_j) = \left\{
\begin{array}{lll}
0 & i \neq j & \mbox{(independence)}
\\
{\rm E}(Z_i^2) = 1 & i=j &
\end{array}\right.
\end{displaymath}

So

\begin{displaymath}{\rm Var}(Z) = I_{n \times n}
\end{displaymath}

the $n \times n $ identity matrix.

Now suppose that

\begin{displaymath}X=A Z + \mu
\end{displaymath}

where $A$ is an $m \times n$ matrix of constants and $\mu$ is an $m \times 1$ vector of constants. Then we say that X has a $MVN_m(\mu, AA^T)$ distribution.

Now ${\rm E}(X) = {\rm E}(AZ+\mu)$ has ith component
\begin{align*}{\rm E}[(AZ)_i] + \mu_i
& = {\rm E}( \sum_j A_{ij} Z_j) + \mu_i
\\
& = \sum_j A_{ij} {\rm E}(Z_j) + \mu_i
\\
& = \mu_i
\end{align*}

Moreover,
\begin{align*}{\rm Var}(X) & = {\rm E}\left[ (X-\mu)(X-\mu)^T \right]
\\
& = {\rm E}[(AZ)(AZ)^T]
\\
& = {\rm E}[AZZ^T A^T]
\\
& = A {\rm E}[ZZ^T] A^T
\\
& = AIA^T
\\
& = AA^T
\end{align*}

The last three lines need some justification. The point is that matrix multiplication by A, whose entries are constants, is just like multiplication by a constant -- you can pull the constant outside of the expected value sign. Here is the justification. The ijth entry in ${\rm E} [AZZ^T A^T]$ is
\begin{align*}{\rm E}\left( \sum_k \sum_\ell A_{ik} Z_k Z_\ell A_{j\ell}\right)
& = \sum_{k}\sum_{\ell} A_{ik} {\rm E}(Z_k Z_\ell) A_{j\ell}
\\
& = \sum_{k}\sum_{\ell} A_{ik} I_{k\ell} A^T_{\ell j}
\\
& = (AIA^T)_{ij}
\\
&= (AA^T)_{ij}
\end{align*}

So, ${\rm Var}(X) = AA^T$.
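As a concrete illustration of this construction, take $m=n=2$, $\mu={\bf 0}$, and constants $\sigma_1 > 0$, $\sigma_2 > 0$ and $-1 < \rho < 1$ (symbols introduced here only for the sake of the example), with

\begin{displaymath}A = \left[ \begin{array}{cc}
\sigma_1 & 0 \\
\rho\sigma_2 & \sigma_2\sqrt{1-\rho^2}
\end{array}\right]
\qquad\mbox{so that}\qquad
AA^T = \left[ \begin{array}{cc}
\sigma_1^2 & \rho\sigma_1\sigma_2 \\
\rho\sigma_1\sigma_2 & \sigma_2^2
\end{array}\right] .
\end{displaymath}

Then $X=AZ$ is a bivariate normal vector whose components have variances $\sigma_1^2$ and $\sigma_2^2$ and correlation $\rho$.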

Thus $X\sim MVN(\mu,\Sigma)$ means that $X$ has the distribution of $AZ+\mu$ for some matrix $A$ with $AA^T=\Sigma$; in particular ${\rm E}(X)=\mu$ and ${\rm Var}(X)=\Sigma$.

Things to notice along the way

1.
${\rm E}(AX+b) = A{\rm E}(X) +b$ when $A_{m \times n}$, $X_{n \times 1}$, $b_{m \times 1}$ and A and b are constant.

2.
${\rm E}(AMB) = A {\rm E}(M) B$ whenever A, B and M are matrices whose dimensions make the multiplication possible and A and B are non-random constant matrices while M is a random matrix.

3.
${\rm Var}(AX+b) = A{\rm Var}(X) A^T$ where A and X are as in 1), as shown below. The notation ${\rm Cov}(X)$ is sometimes used for ${\rm Var}(X)$. This matrix is called the variance-covariance matrix of X.
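Fact 3) follows from facts 1) and 2), since $AX+b - {\rm E}(AX+b) = A\left(X-{\rm E}(X)\right)$:

\begin{align*}{\rm Var}(AX+b) & = {\rm E}\left[ A(X-{\rm E}(X))(X-{\rm E}(X))^T A^T\right]
\\
& = A {\rm E}\left[ (X-{\rm E}(X))(X-{\rm E}(X))^T \right] A^T
\\
& = A {\rm Var}(X) A^T
\end{align*}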

Application to Least Squares

The following do not use the normal assumption:


\begin{align*}\hat\beta & = (X^T X)^{-1} X^T Y
\\
& = (X^T X)^{-1} X^T (X\beta+\epsilon)
\\
& = \beta + (X^T X)^{-1} X^T \epsilon
\\
{\rm E}(\hat\beta) & = \beta + (X^T X)^{-1} X^T {\rm E}(\epsilon)
\\
& = \beta
\end{align*}
So $\hat\beta$ is unbiased.


\begin{align*}{\rm Var}(\hat\beta) & = {\rm E}[(\hat\beta-\beta)(\hat\beta-\beta)^T]
\\
& = {\rm E}\left[(X^T X)^{-1} X^T \epsilon \left((X^T X)^{-1} X^T \epsilon\right)^T\right]
\\
& = (X^T X)^{-1} X^T {\rm E}(\epsilon\epsilon^T) \left((X^T X)^{-1} X^T\right)^T
\\
& = (X^T X)^{-1} X^T \sigma^2 I \left((X^T X)^{-1}
X^T\right)^T
\\
& = \sigma^2 (X^T X)^{-1}
\end{align*}
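As a check on this formula (an illustration only, not part of the general argument), consider simple linear regression, where the columns of X are a column of 1s and the covariate values $x_1,\ldots,x_n$. Then

\begin{displaymath}X^T X = \left[ \begin{array}{cc}
n & \sum x_i \\
\sum x_i & \sum x_i^2
\end{array}\right]
\qquad\mbox{and}\qquad
{\rm Var}(\hat\beta_1) = \sigma^2 \left[ (X^T X)^{-1} \right]_{22}
= \frac{\sigma^2}{\sum (x_i - \bar x)^2} \, ,
\end{displaymath}

which is the familiar variance of the least squares slope estimate.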

If, also, the $\epsilon_i$ are independent $N(0,\sigma^2)$ variables then $\epsilon/\sigma \sim MVN_n(0,I)$ and
\begin{align*}\hat\beta & = \beta + (X^T X)^{-1} X^T\epsilon
\\
& = \beta + \sigma (X^T X)^{-1} X^T (\epsilon/\sigma)
\\
& \sim MVN(\beta, \sigma^2 \underbrace{(X^T X)^{-1} X^T X(X^T X)^{-1}}_{(X^T
X)^{-1}})
\end{align*}

The fitted vector is

\begin{displaymath}\hat\mu = X\hat\beta \sim MVN(X\beta, \sigma^2 X(X^T X)^{-1}X^T)
\end{displaymath}
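The variance here comes from fact 3) above applied with $A=X$:

\begin{align*}{\rm Var}(\hat\mu) & = {\rm Var}(X\hat\beta)
\\
& = X {\rm Var}(\hat\beta) X^T
\\
& = \sigma^2 X (X^T X)^{-1} X^T
\end{align*}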

The residual vector is
\begin{align*}\hat\epsilon & = Y-X\hat\beta
\\
& = X\beta+\epsilon -X\beta -X(X^T X)^{-1} X^T \epsilon
\\
& = (I -X(X^T X)^{-1} X^T) \epsilon
\\
& \sim MVN(0,\sigma^2 MM^T)
\end{align*}
where $M=I-X(X^T X)^{-1} X^T$.

Notation:

$H= X(X^T X)^{-1} X^T$

Jargon: H is called the hat matrix. Notice that

\begin{displaymath}\hat\mu = HY
\end{displaymath}
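Two properties of H (and of M=I-H) get used repeatedly: both matrices are symmetric and idempotent. In particular $MM^T = M = I-H$, so the variance of $\hat\epsilon$ found above is just $\sigma^2(I-H)$.

\begin{align*}H^T & = \left( X(X^T X)^{-1} X^T \right)^T = X(X^T X)^{-1} X^T = H
\\
H^2 & = X(X^T X)^{-1} X^T X(X^T X)^{-1} X^T = X(X^T X)^{-1} X^T = H
\\
MM^T & = (I-H)(I-H)^T = I - 2H + H^2 = I-H = M
\end{align*}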

Extras

In the calculations above for the vectors $\hat\beta$, $\hat\mu$ and $\hat\epsilon$, I tried to emphasize which conclusions needed which assumptions.

So, for instance, we have the following matrix identities, which depend only on the model equation

\begin{displaymath}{\bf Y} = {\bf X} {\bf\beta} +{\bf\epsilon}\end{displaymath}


\begin{displaymath}\hat\beta = (X^TX)^{-1} X^T Y = \beta + (X^TX)^{-1} X^T \epsilon
\end{displaymath}


\begin{displaymath}\hat\mu = X\hat\beta = \mu + H\epsilon
\end{displaymath}

where $\bf H$ is the `hat' matrix $X(X^TX)^{-1} X^T$, and

\begin{displaymath}\hat\epsilon = (I-H) \epsilon
\end{displaymath}

If we add the assumption that ${\rm E}(\epsilon_i)=0$ for each i then we get

\begin{displaymath}{\rm E}(\hat\beta) = \beta
\end{displaymath}


\begin{displaymath}{\rm E}(\hat\mu) = \mu
\end{displaymath}

and

\begin{displaymath}{\rm E}(\hat\epsilon) = 0\, .
\end{displaymath}

If we add the assumption that the errors are homoscedastic ( ${\rm Var}(\epsilon_i) = \sigma^2$ for all i) and uncorrelated ( ${\rm Cov}(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$) then we can compute variances and get

\begin{displaymath}{\rm Var}(\hat\beta) = \sigma^2(X^TX)^{-1}
\end{displaymath}


\begin{displaymath}{\rm Var}(\hat\mu) = \sigma^2 {\bf H}
\end{displaymath}

and

\begin{displaymath}{\rm Var}(\hat\epsilon) = \sigma^2 ({\bf I} -{\bf H}) \, .
\end{displaymath}
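The last of these again uses fact 3), together with the symmetry and idempotence of ${\bf I}-{\bf H}$ noted earlier:

\begin{align*}{\rm Var}(\hat\epsilon) & = (I-H) {\rm Var}(\epsilon) (I-H)^T
\\
& = \sigma^2 (I-H)(I-H)^T
\\
& = \sigma^2 (I-H)
\end{align*}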

NOTE: usually we assume that the $\epsilon_i$ are independent and identically distributed, which guarantees the homoscedastic and uncorrelated assumptions above.

Next we add the assumption that the errors $\epsilon_i$ are independent normal variables. Then we conclude that each of $\hat\beta$, $\hat\mu$ and $\hat\epsilon$ has a multivariate normal distribution with the mean and variance just described.
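Putting all of the assumptions together, the conclusions can be summarized as

\begin{displaymath}\hat\beta \sim MVN\left(\beta, \sigma^2 (X^TX)^{-1}\right), \qquad
\hat\mu \sim MVN\left(\mu, \sigma^2 {\bf H}\right), \qquad
\hat\epsilon \sim MVN\left(0, \sigma^2 ({\bf I}-{\bf H})\right) \, .
\end{displaymath}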





Richard Lockhart
1999-01-20