
STAT 350: Lecture 15

Reading: Chapter 7, sections 1 and 2; Chapter 5, sections 11 and 12.

Extra Sum of Squares

Here is a more general version of the extra Sum of Squares method. Consider two $n \times p$ design matrices, X and X*. Suppose that there is a $p \times p$ invertible matrix A such that X=X* A.

If we fit the model $Y=X\beta+\epsilon$ we get
\begin{align*}\hat\beta & = (X^T X)^{-1} X^T Y
\\
\hat\mu &= X\hat\beta = X(X^T X)^{-1} X^T Y
\end{align*}
Fitting the model $Y=X^* \beta^* + \epsilon$ we get
\begin{align*}\hat\beta^* & = ({X^*}^T X^*)^{-1} {X^*}^T Y
\\
\hat\mu^* &= X^*\hat\beta^* = X^*({X^*}^T X^*)^{-1} {X^*}^T Y
\end{align*}

Now plug in X* A for X in $\hat\beta$ and $\hat\mu$ to get
\begin{align*}\hat\mu & = X^* A \left( (X^* A)^T X^* A\right)^{-1} (X^* A)^T Y
\\
& = X^* A \left( A^T {X^*}^T X^* A\right)^{-1} A^T {X^*}^T Y
\\
& = X^* \underbrace{A A^{-1}}_{I} \left({X^*}^T X^*\right)^{-1}
\underbrace{\left(A^T\right)^{-1}A^T}_{I} {X^*}^TY
\\
& = X^* \left({X^*}^T X^*\right)^{-1} {X^*}^T Y
\\
& = \hat\mu^*
\end{align*}
So X and X* lead to the same fitted vector.

Notice that $\beta^* = A\beta$ and $\hat\beta^* = A\hat\beta$.

If X=X*A for some invertible A then

\begin{displaymath}{\rm col}(X) = {\rm col}(X^*)
\end{displaymath}
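
Here is a quick numerical check of these facts (a sketch in Python with numpy; the particular X*, A and Y are made up for illustration):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# A made-up design matrix X* (n = 6, p = 2) and an invertible 2 x 2 A.
Xstar = np.column_stack([np.ones(6), np.arange(6.0)])
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])              # invertible (det = -2)
X = Xstar @ A                            # so X = X* A
Y = rng.normal(size=6)

def fitted(M, y):
    """Fitted vector: M (M^T M)^{-1} M^T y."""
    return M @ np.linalg.solve(M.T @ M, M.T @ y)

print(np.allclose(fitted(X, Y), fitted(Xstar, Y)))    # True: same mu-hat

beta_hat      = np.linalg.solve(X.T @ X, X.T @ Y)
beta_hat_star = np.linalg.solve(Xstar.T @ Xstar, Xstar.T @ Y)
print(np.allclose(beta_hat_star, A @ beta_hat))       # True: beta-hat* = A beta-hat
\end{verbatim}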

Now suppose that X=X*A but that A is not invertible (for example X might have fewer columns than X*). The idea is that in this case X describes a sub-model of the one described by X*.

Example: The two sample problem.

\begin{displaymath}X^* =
\left[ \begin{array}{cc}
1 & 0
\\
\vdots & \vdots
\\
1 & 0
\\
0 & 1
\\
\vdots & \vdots
\\
0 & 1
\end{array}\right]
\end{displaymath}

where there are n1 rows of $[1 \quad 0]$ and n2 of $[0 \quad 1]$.

\begin{displaymath}X=\left[ \begin{array}{c}
1 \\ \vdots \\ 1 \end{array}\right]
\end{displaymath}

where X is $(n_1+n_2) \times 1$. Then

\begin{displaymath}X^* \left[\begin{array}{c} 1 \\ 1 \end{array}\right] = X
\end{displaymath}

The model

\begin{displaymath}Y = X^* \beta + \epsilon
\end{displaymath}

is really
\begin{align*}Y_i & = \beta_1 + \epsilon_i \qquad 1 \le i \le n_1
\\
Y_i & = \beta_2 + \epsilon_i \qquad n_1+1 \le i \le n_1+n_2
\end{align*}
that is, it is the model for two samples of sizes n1 and n2 with means $\beta_1$ and $\beta_2$. The model

\begin{displaymath}Y=X\beta_1+\epsilon
\end{displaymath}

is just

\begin{displaymath}Y_i = \beta_1 + \epsilon_i
\end{displaymath}

that is, that the Yi are an iid sample from a single population.

We compare these two models to test $H_o: \beta_1 = \beta_2$ using

\begin{displaymath}F = \frac{({\rm ESS}_R - {\rm ESS}_F)/1}{ {\rm ESS}_F/(n_1+n_2 - 2)}
=
\left( \mbox{2 sample $t$}\right)^2
\end{displaymath}
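
A small numerical sketch of this comparison (Python with numpy and scipy; the sample sizes and data are made up) fits both models, forms the extra sum of squares F statistic, and compares it with the square of the pooled two-sample t statistic:

\begin{verbatim}
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n1, n2 = 8, 10                                   # illustrative sample sizes
y1 = rng.normal(loc=0.0, size=n1)
y2 = rng.normal(loc=0.5, size=n2)
Y = np.concatenate([y1, y2])

# Restricted model (design matrix X, a single column of 1's): one common mean.
ESS_R = np.sum((Y - Y.mean()) ** 2)

# Full model (design matrix X*): separate means for the two samples.
ESS_F = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)

F = ((ESS_R - ESS_F) / 1) / (ESS_F / (n1 + n2 - 2))

t = stats.ttest_ind(y1, y2, equal_var=True).statistic   # pooled two-sample t
print(np.isclose(F, t ** 2))                             # True: F = t^2
\end{verbatim}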

Summary: to compare two models with design matrices X and X* we need

\begin{displaymath}{\rm col}(X) \subset {\rm col}(X^*) \qquad {\rm col}(X) \neq {\rm col}(X^*)
\end{displaymath}

That is, the restricted model must be a special case of the full model, obtained by setting some linear combinations of the $\beta$s equal to constants. (The constants are usually 0.)
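
One way to check this nesting numerically (a sketch using the two-sample matrices above; the sample sizes are illustrative) is to compare ranks: ${\rm col}(X) \subset {\rm col}(X^*)$ exactly when appending the columns of X to X* does not increase the rank.

\begin{verbatim}
import numpy as np

n1 = n2 = 4                                      # illustrative sizes
Xstar = np.kron(np.eye(2), np.ones((n1, 1)))     # two-sample design matrix
X = np.ones((n1 + n2, 1))                        # single-mean design matrix

same_span = (np.linalg.matrix_rank(np.hstack([Xstar, X]))
             == np.linalg.matrix_rank(Xstar))
print(same_span)                                                # True: col(X) inside col(X*)
print(np.linalg.matrix_rank(X) < np.linalg.matrix_rank(Xstar))  # True: strictly smaller
\end{verbatim}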

In our example consider the special case n1 = n2. If we reparametrize the * model using

\begin{displaymath}\mu=\frac{\beta_1+\beta_2}{2} \qquad \alpha = \beta_1 - \mu
\end{displaymath}

then we find

\begin{displaymath}X^*\beta = X^{**} \left[ \begin{array}{c} \mu \\ \alpha \end{array} \right]
\end{displaymath}

where

\begin{displaymath}X^{**} = \left[ \begin{array}{cc}
1 & 1
\\
\vdots & \vdots
\\
1 & 1
\\
1 & -1
\\
\vdots & \vdots
\\
1 & -1
\end{array} \right]
\end{displaymath}

In this case we see that X is a submatrix of X** (its first column), even though X is not a submatrix of X*. This is why the extra sum of squares principle can be used whenever the restricted model is a special case of the full model: a suitable reparametrization makes the restricted design matrix appear as a submatrix of the full one.
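
A short sketch of this reparametrization (Python with numpy; $n_1=n_2=3$ is chosen only for illustration), using the invertible matrix implied by $\beta_1 = \mu+\alpha$ and $\beta_2 = \mu-\alpha$:

\begin{verbatim}
import numpy as np

n1 = n2 = 3
Xstar = np.kron(np.eye(2), np.ones((n1, 1)))   # columns indicate the two samples
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])                    # beta1 = mu + alpha, beta2 = mu - alpha
Xstarstar = Xstar @ A                          # n1 rows of [1 1], then n2 rows of [1 -1]

X = np.ones((n1 + n2, 1))                      # single-mean design matrix
print(np.array_equal(X, Xstarstar[:, :1]))     # True: X is the first column of X**
\end{verbatim}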

Another Extra Sum of Squares Example: the two-way layout

We have data Yi,j,k for i from 1 to I, j from 1 to J and k from 1 to K where i labels the row effect, j labels the column effect and k labels the replicate. When K is more than 1 we generally check for interactions by comparing the additive model

\begin{displaymath}Y_{i,j,k} = \mu + \alpha_i + \beta_j + \epsilon_{i,j,k}
\end{displaymath}

to a saturated model in which the mean $\mu_{i,j}$ for the combination i,j is unrestricted. Thus the full model is

\begin{displaymath}Y_{i,j,k} = \mu_{i,j} + \epsilon_{i,j,k} \, .
\end{displaymath}

The additive model is not identifiable (that is, the design matrix is not of full rank) unless some conditions are imposed on the row effects $\alpha_i$ and the column effects $\beta_j$. A common restriction imposed is that the effects sum to 0; this restriction is then used to eliminate $\alpha_I$ and $\beta_J$ from the model equations. The resulting design matrix then has 1+(I-1)+(J-1) = I+J-1 columns and looks like

\begin{displaymath}X_{\mbox{add}} = \left[\begin{array}{rrrcrrc}
1 & 1 & 0 & \cdots & 0 & 1 & \cdots \\
\vdots & & & & & & \\
1 & 1 & 0 & \cdots & 0 & -1 & \cdots \\
\vdots & & & & & & \\
1 & -1 & -1 & \cdots & -1 & -1 & \cdots
\end{array}\right]
\end{displaymath}

(There are K copies of the first row for the observations in population i=1,j=1, then K copies of the row for observations in population i=1,j=2 and so on till we get to j=J. Elimination of $\beta_J = -\beta_1 -\beta_2 -\cdots - \beta_{J-1}$ produces -1's in the J-1 columns corresponding to the $\beta$'s. Then we move to the JK rows corresponding to i=2 and so on with the last JK rows having -1's in the $\alpha$ columns reflecting the identity $\alpha_I = - \alpha_1 -\cdots -
\alpha_{I-1}$.)
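
The following sketch builds $X_{\mbox{add}}$ row by row exactly as just described (Python with numpy; the values of I, J and K are illustrative):

\begin{verbatim}
import numpy as np

I, J, K = 3, 4, 2      # illustrative numbers of rows, columns and replicates

def add_row(i, j):
    """Row of X_add for an observation in cell (i, j), using sum-to-zero constraints."""
    a = -np.ones(I - 1) if i == I else np.eye(I)[i - 1, :I - 1]   # alpha part
    b = -np.ones(J - 1) if j == J else np.eye(J)[j - 1, :J - 1]   # beta part
    return np.concatenate([[1.0], a, b])

X_add = np.array([add_row(i, j)
                  for i in range(1, I + 1)
                  for j in range(1, J + 1)
                  for _ in range(K)])

print(X_add.shape)                      # (I*J*K, I + J - 1)
print(np.linalg.matrix_rank(X_add))     # I + J - 1: the constrained model is identifiable
\end{verbatim}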

The full model is often reparametrized as

\begin{displaymath}Y_{i,j,k} = \mu + \alpha_i + \beta_j + \lambda_{i,j} + \epsilon_{i,j,k}
\end{displaymath}

but the design matrix is actually much simpler for the first parameterization:

\begin{displaymath}X_{\mbox{Full}} = \left[\begin{array}{rrrcr}
1 & 0 & 0 & \cdots & 0 \\
\vdots & & & & \vdots \\
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{array}\right]
\end{displaymath}

where there are K copies of the first row, $[1,0,\ldots,0]$ and then K copies of $[0,1,0,\ldots,0]$ and so on. There are a total of IJ columns and IJK rows.

It is not hard to find a matrix A such that

\begin{displaymath}X_{\mbox{add}} = X_{\mbox{Full}} A
\end{displaymath}

For instance, the first column of A will be all 1's, since adding all the columns of $X_{\mbox{Full}}$ together produces a column of 1's, which is the first column of $X_{\mbox{add}}$. To produce the second column of $X_{\mbox{add}}$ (the $\alpha_1$ column) we add together the first J columns of $X_{\mbox{Full}}$ (the cells with i=1) and then subtract out the last J columns (the cells with i=I). Thus the second column of A consists of J 1's followed by J(I-2) 0's followed by J -1's.
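
Continuing the sketch above (again with illustrative I, J and K; the construction of $X_{\mbox{add}}$ is repeated so the code stands alone), one such A can be written down column by column, and the identity $X_{\mbox{add}} = X_{\mbox{Full}} A$ checked directly:

\begin{verbatim}
import numpy as np

I, J, K = 3, 4, 2                                     # same illustrative values as above
cells = [(i, j) for i in range(1, I + 1) for j in range(1, J + 1)]

# X_Full: one indicator column per cell (i, j), with K replicate rows per cell.
X_full = np.kron(np.eye(I * J), np.ones((K, 1)))      # IJK rows, IJ columns

# X_add, built row by row as in the previous sketch.
def add_row(i, j):
    a = -np.ones(I - 1) if i == I else np.eye(I)[i - 1, :I - 1]
    b = -np.ones(J - 1) if j == J else np.eye(J)[j - 1, :J - 1]
    return np.concatenate([[1.0], a, b])

X_add = np.array([add_row(i, j) for (i, j) in cells for _ in range(K)])

# A is IJ x (I + J - 1): intercept column of 1's, then alpha columns, then beta columns.
A = np.zeros((I * J, I + J - 1))
A[:, 0] = 1.0                                         # sum of all cell indicators = column of 1's
for c, (i, j) in enumerate(cells):
    for a in range(1, I):                             # alpha_a column: +1 if i = a, -1 if i = I
        A[c, a] = 1.0 if i == a else (-1.0 if i == I else 0.0)
    for b in range(1, J):                             # beta_b column: +1 if j = b, -1 if j = J
        A[c, I - 1 + b] = 1.0 if j == b else (-1.0 if j == J else 0.0)

print(np.allclose(X_full @ A, X_add))                 # True: X_add = X_Full A
print(A[:, 1])                                        # J 1's, then J(I-2) 0's, then J -1's
\end{verbatim}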


Richard Lockhart
1999-01-13