
STAT 350: Lecture 12

Reading: Chapter 7.

Extra Sum of Squares

Suppose we are fitting a model of the form

\begin{displaymath}Y = [X_1 \vert X_2 ] \left[ \begin{array}{c}
\beta_1 \\ \hline \beta_2 \end{array}\right] + \epsilon
\end{displaymath}

We can test the hypothesis $H_o: \beta_2 = 0$ using the F test:

\begin{displaymath}F = \frac{{\rm Extra SS} / {\rm dim}(\beta_2)}{{\rm ESS}_{\rm FULL}/
(n - p)} \sim F_{{\rm dim}(\beta_2),n-p}
\end{displaymath}

where $p$ is the total number of columns of the design matrix $X = [X_1 \vert X_2]$ and
\begin{align*}{\rm Extra SS} & = \mbox{Error SS in } Y=X_1\beta_1 + \epsilon
\\
& - \mbox{Error SS in } Y=X_1\beta_1+X_2 \beta_2 + \epsilon
\end{align*}
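The extra-sum-of-squares computation can be sketched numerically. This is a minimal numpy illustration on simulated data (all variable names and the simulated design are illustrative, not the plaster data): fit the reduced and full models by least squares, take the difference of error sums of squares, and form the F ratio.

```python
# Sketch of the extra-sum-of-squares F test on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=(n, 2))            # columns of X1 (reduced-model terms)
x2 = rng.normal(size=(n, 1))            # columns of X2 (extra terms)
X1 = np.column_stack([np.ones(n), x1])  # reduced design: intercept + X1
X = np.column_stack([X1, x2])           # full design: [X1 | X2]
beta = np.array([1.0, 2.0, -1.0, 0.0])  # beta_2 = 0, so H_o is true here
y = X @ beta + rng.normal(size=n)

def ess(X, y):
    """Error sum of squares after least-squares fit of y on X."""
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - yhat) ** 2))

p = X.shape[1]                  # total number of columns of X
q = x2.shape[1]                 # dim(beta_2)
extra_ss = ess(X1, y) - ess(X, y)
F = (extra_ss / q) / (ess(X, y) / (n - p))  # ~ F_{q, n-p} under H_o
```

Adding columns can never increase the error SS, so the extra SS is always non-negative.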

ANOVA tables

The calculations for the test above are usually recorded in an ANalysis Of VAriance (ANOVA) table. Generally the design matrix has a column of ones, and the entries in the table are adjusted for the grand mean $\bar{Y}$. In this case we have

\begin{displaymath}X=[{\bf 1} \vert X_1 \vert X_2 ]
\end{displaymath}

where $X_1$ has $p_1$ columns, $X_2$ has $p_2$ columns, and there are a total of $1+p_1+p_2$ parameters in the parameter vector

\begin{displaymath}\beta = \left[
\begin{array}{c} \beta_0 \\ \hline \beta_1 \\ \hline \beta_2 \end{array}\right]
\end{displaymath}

We use the notation $p = p_1 + p_2$, but be careful: don't memorize formulas; learn to count columns. We decompose the data as

\begin{displaymath}Y = {\bf 1} \bar{Y} + (\hat\mu_R - {\bf 1} \bar{Y}) + (\hat\mu_F
-\hat\mu_R) + \hat\epsilon
\end{displaymath}

The components are mutually orthogonal so that we get the sum of squares identity

\begin{displaymath}\vert\vert Y-{\bf 1} \bar{Y}\vert\vert^2 = \vert\vert\hat\mu_R - {\bf 1}\bar{Y}\vert\vert^2 + \vert\vert\hat\mu_F-\hat\mu_R\vert\vert^2 + \vert\vert\hat\epsilon\vert\vert^2
\end{displaymath}
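The orthogonality of the four components, and the resulting sum-of-squares identity, can be checked numerically. A minimal numpy sketch on simulated data (names and the simulated design are illustrative only):

```python
# Check the decomposition Y = 1*Ybar + (muhat_R - 1*Ybar)
#                            + (muhat_F - muhat_R) + residual
# and the resulting sum-of-squares identity, on simulated data.
import numpy as np

rng = np.random.default_rng(1)
n = 25
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # 1 and X1 columns
X2 = rng.normal(size=(n, 1))
XF = np.column_stack([X1, X2])                          # full design
y = XF @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=n)

def fit(X, y):
    """Fitted values from the least-squares fit of y on X."""
    return X @ np.linalg.lstsq(X, y, rcond=None)[0]

ybar = y.mean()
mu_R = fit(X1, y)          # fitted values, reduced model
mu_F = fit(XF, y)          # fitted values, full model
a = mu_R - ybar            # X1 component, adjusted for the mean
b = mu_F - mu_R            # X2|X1 component
e = y - mu_F               # residuals
# the components are mutually orthogonal:
assert abs(a @ b) < 1e-6 and abs(a @ e) < 1e-6 and abs(b @ e) < 1e-6
# so the sums of squares add up:
lhs = np.sum((y - ybar) ** 2)
rhs = a @ a + b @ b + e @ e
```

The orthogonality holds because $\hat\mu_R - {\bf 1}\bar{Y}$ lies in the column space of $[{\bf 1}\vert X_1]$, while $\hat\mu_F - \hat\mu_R$ is orthogonal to that space, and $\hat\epsilon$ is orthogonal to the full column space.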

which we normally put into an ANOVA table:

Source             Sum of Squares                                       Degrees of Freedom
X1                 $\vert\vert\hat\mu_R - {\bf 1}\bar{Y}\vert\vert^2$   p1
X2|X1              $\vert\vert\hat\mu_F-\hat\mu_R\vert\vert^2$          p2
Error              $\vert\vert\hat\epsilon\vert\vert^2$                 n-p-1
Total (Corrected)  $\sum(Y_i - \bar{Y})^2$                              n-1

Typically we add a column of mean squares, obtained by dividing each entry in the SS column by its degrees of freedom; a column of F statistics, obtained by dividing each Mean Square by the MSE; and a column of P values, obtained from software which computes F distribution tail areas.

In this table the notation X2|X1 means X2 adjusted for X1, that is, X2 after fitting X1.

Analysis of SAND / FIBRE / HARDNESS of plaster example

We regress $Y$, the hardness of a plaster sample, on $S$, $S^2$, $F$, $F^2$ and $SF$. There are 5 terms, so there are $2^5 = 32$ possible submodels of the full model

\begin{displaymath}Y_i = \beta_0 + \beta_1 S_i + \beta_2 S_i^2 + \beta_3 F_i + \beta_4 F_i^2
+ \beta_5 S_i F_i + \epsilon_i
\end{displaymath}

Many of these 32 models are not sensible, such as

\begin{displaymath}Y_i = \beta_0 + \beta_4 F_i^2 + \epsilon_i
\end{displaymath}

or

\begin{displaymath}Y_i = \beta_0 + \beta_5 S_iF_i + \epsilon_i
\end{displaymath}

The term $\beta_5 S_iF_i$ is an interaction of S and F. We analyze the data as follows:

Q: Are the effects of S and F additive?

A: Test $H_o: \beta_5 = 0$.

There are two methods to carry out such a test:

1.
A t test

2.
An F test.

Fact: the F test is equivalent to a two-sided t test.

The t test uses

\begin{displaymath}t= \frac{\hat\beta_5 - 0}{\hat\sigma_{\hat\beta_5}} =
\frac{\hat\beta_5}{\sqrt{{\rm MSE}}\sqrt{(X^TX)^{-1}_{66}}} \sim t_{n-6}
\end{displaymath}

This is exactly the same as the Lecture 10 formula since

\begin{displaymath}\beta_5 = \underbrace{[0,0,0,0,0,1]}_{x^T}\left[
\begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_5 \end{array}\right]
\end{displaymath}

and $x^T(X^TX)^{-1}x$ is the lower right hand corner entry in $(X^TX)^{-1}$, that is, $(X^TX)^{-1}_{66}$.
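The two computations can be checked side by side. A minimal numpy sketch on a simulated 6-column design (illustrative, not the plaster data): the quadratic form with $x = (0,\ldots,0,1)^T$ picks out the corner entry of $(X^TX)^{-1}$, and squaring the t statistic gives the F statistic.

```python
# t statistic for H_o: beta_5 = 0, computed via the corner entry of
# (X^T X)^{-1}, on simulated data with a 6-column design matrix.
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.ones(p) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ y
resid = y - X @ betahat
mse = resid @ resid / (n - p)           # estimate of sigma^2

x = np.zeros(p); x[-1] = 1.0            # picks off the last coefficient
corner = XtX_inv[-1, -1]                # (X^T X)^{-1}_{66}
assert np.isclose(x @ XtX_inv @ x, corner)

t = betahat[-1] / np.sqrt(mse * corner)  # ~ t_{n-6} under H_o
F = t ** 2                               # the equivalent F_{1,n-6} statistic
```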

The F test uses

\begin{displaymath}F = \frac{({\rm ESS}_{\rm R} -{\rm ESS}_{\rm F}) / 1}{{\rm ESS}_{\rm F}/
(n - 6)} \sim F_{1,n-6} \quad ( = t^2)
\end{displaymath}

Proof by example:

The Data

Sand  Fibre  Hardness  Strength 
   0    0       61       34
   0    0       63       16
  15    0       67       36
  15    0       69       19
  30    0       65       28
     ...

``Full'' model

\begin{displaymath}Y_i = \beta_0+\beta_1 S_i + \beta_2 S_i^2 + \beta_3 F_i + \beta_4 F_i^2
+ \beta_5 S_i F_i + \epsilon_i
\end{displaymath}

Fitted Models, ESS, df for error

Model for $\mu$                                                       ESS       Error df
Full                                                                  81.264    12
$\beta_0+\beta_1 S_i + \beta_2 S_i^2 + \beta_3 F_i + \beta_4 F_i^2$   82.389    13
$\beta_0+\beta_1 S_i + \beta_2 S_i^2 + \beta_3 F_i$                   104.167   14
$\beta_0+\beta_1 S_i + \beta_2 S_i^2$                                 169.500   15
$\beta_0+\beta_1 S_i$                                                 174.194   16
$\beta_0+\beta_1 S_i + \beta_3 F_i + \beta_4 F_i^2$                   87.083    14
$\beta_0+\beta_3 F_i + \beta_4 F_i^2$                                 189.167   15
$\beta_0+\beta_3 F_i$                                                 210.944   16
$\beta_0+\beta_1 S_i + \beta_3 F_i$                                   108.861   15
$\beta_0$ (empty model)                                               276.278   17

Hypothesis tests:

1.
Quadratic terms needed? $H_o: \beta_2=\beta_4=\beta_5=0$. Extra SS = 108.861 - 81.264 = 27.597. F = [(108.861-81.264)/3]/[81.264/12] = 1.358. Degrees of freedom are 3 and 12, so P = 0.30: not significant.
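The arithmetic for this test is easy to verify directly from the ESS values in the table:

```python
# Check of the arithmetic in test 1, using ESS values from the table.
ess_full = 81.264    # full model, 12 error df
ess_lin = 108.861    # additive linear model (beta_0, beta_1 S, beta_3 F), 15 df
extra_ss = ess_lin - ess_full       # 3 parameters dropped: beta_2, beta_4, beta_5
F = (extra_ss / 3) / (ess_full / 12)  # agrees with F = 1.358 in the text
```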

2.
Linear terms needed? There are several possible F-tests.

(a)
Compare full model to empty model.

F=[(276.278-81.264)/5]/[81.264/12] = 5.76

so P is about 0.006.

(b)
Compare the empty model to the additive linear model

\begin{displaymath}\beta_0+\beta_1 S_i + \beta_3 F_i .\end{displaymath}

Then

F=[(276.278-108.861)/2]/[108.861/15] = 11.53

and P is about 0.0009.

(c)
Use the estimate of $\sigma^2$ from the full model but take the extra SS from the last comparison: F=[(276.278-108.861)/2]/[81.264/12] = 12.36, for a P value of about 0.001.
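The three versions of the linear-terms test differ only in which sums of squares enter the numerator and denominator; computing them side by side makes the differences concrete:

```python
# The three F statistics from tests 2(a)-(c), using ESS values from the table.
ess_empty = 276.278   # beta_0 only, 17 error df
ess_full = 81.264     # full model, 12 error df
ess_lin = 108.861     # additive linear model, 15 error df

F_a = ((ess_empty - ess_full) / 5) / (ess_full / 12)  # (a) full vs empty
F_b = ((ess_empty - ess_lin) / 2) / (ess_lin / 15)    # (b) linear vs empty
F_c = ((ess_empty - ess_lin) / 2) / (ess_full / 12)   # (c) same extra SS,
                                                      #     full-model MSE
```

All three lead to the same qualitative conclusion here: the linear terms are clearly needed.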

Conclusions

You should also examine residual plots; a page of plots accompanies these notes. The plot of residuals against fitted values suggests that we should consider adding a quadratic term in Fibre.

The model

\begin{displaymath}Y_i = \beta_0+\beta_1 S_i + \beta_3 F_i + \beta_4 F_i^2 +\epsilon_i\end{displaymath}

has ESS=87.083 on 14 degrees of freedom while the model

\begin{displaymath}Y_i = \beta_0+\beta_1 S_i +\beta_3 F_i +\epsilon_i\end{displaymath}

has ESS 108.861 on 15 degrees of freedom. The F statistic is [(108.861-87.083)/1]/[87.083/14] = 3.50, with a corresponding P-value of roughly 0.08. Thus the evidence that a quadratic term is needed is weak.


Richard Lockhart
1999-01-13