
STAT 350: Lecture 11

Reading: Chapter 7 Sections 1-3, 7 and 8.

F tests and the Extra Sum of Squares

Example:
\begin{align*}Y & = \mbox{plaster hardness}
\\
s & = \mbox{sand content}
\\
f & = \mbox{fibre content}
\end{align*}

Model:

\begin{displaymath}Y_i = \beta_0 + \beta_1 s_i + \beta_2 s_i^2 + \beta_3 f_i + \beta_4 f_i^2 +
\beta_5 s_i f_i + \epsilon_i
\end{displaymath}

In matrix form:

\begin{displaymath}\left[ \begin{array}{c} Y_1 \\ \vdots \\ Y_n \end{array}\right]
= \left[ \begin{array}{cccccc}
1 & s_1 & s_1^2 & f_1 & f_1^2 & s_1 f_1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & s_n & s_n^2 & f_n & f_n^2 & s_n f_n
\end{array}\right]
\left[ \begin{array}{c} \beta_0 \\ \vdots \\ \beta_5 \end{array}\right]
+ \left[ \begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_n \end{array}\right]
\end{displaymath}

Questions: Does sand content affect hardness? Does fibre content? Are the quadratic and interaction terms needed?

To answer these questions we test hypotheses asserting that certain of the $\beta_j$ are 0.

Technique: we fit a sequence of models:

(a)
Original ``full'' model.

(b)
The model with no interactions:

\begin{displaymath}\mu_i = \beta_0 + \beta_1 s_i + \beta_2 s_i^2 + \beta_3 f_i + \beta_4 f_i^2
\end{displaymath}

(c)
The Sand only model:

\begin{displaymath}\mu_i = \beta_0 + \beta_1 s_i + \beta_2 s_i^2
\end{displaymath}

(d)
The Fibre only model:

\begin{displaymath}\mu_i = \beta_0+ \beta_3 f_i + \beta_4 f_i^2
\end{displaymath}

(e)
The ``Empty'' model:

\begin{displaymath}\mu_i =\beta_0
\end{displaymath}

Each model corresponds to a design matrix which is a submatrix of the full design matrix:

\begin{displaymath}Y = \left[ {\bf 1} \vert X_1 \vert X_2 \vert X_3 \right]
\left[ \begin{array}{c} \beta_0 \\
\hline \beta_1 \\ \beta_2 \\
\hline \beta_3 \\ \beta_4 \\
\hline \beta_5 \end{array} \right]
+ \epsilon
\end{displaymath}

Here $X_1$ holds the sand columns ($s_i$ and $s_i^2$), $X_2$ the fibre columns ($f_i$ and $f_i^2$), and $X_3$ the interaction column ($s_i f_i$).

The design matrices for the models a, b, c, d and e are given by

\begin{eqnarray*}X_a & = & X
\\
X_b & = & \left[{\bf 1} \vert X_1 \vert X_2 \right]
\\
X_c & = & \left[{\bf 1} \vert X_1 \right]
\\
X_d & = & \left[{\bf 1} \vert X_2 \right]
\\
X_e & = & \left[{\bf 1} \right]
\end{eqnarray*}


Note that ${\bf 1}$ is a column vector of $n$ 1s.
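The submatrix structure of these design matrices can be sketched in Python; the sand and fibre values below are made up for illustration, not the real data:

```python
import numpy as np

# Hypothetical sand and fibre settings (illustrative values only).
s = np.array([0., 0., 15., 15., 30., 30.])
f = np.array([0., 25., 0., 25., 0., 25.])
n = len(s)

one = np.ones((n, 1))              # the column of 1s
X1 = np.column_stack([s, s**2])    # sand columns
X2 = np.column_stack([f, f**2])    # fibre columns
X3 = (s * f).reshape(-1, 1)        # interaction column

# Design matrices for models (a)-(e), each a submatrix of the full X:
Xa = np.hstack([one, X1, X2, X3])  # full model
Xb = np.hstack([one, X1, X2])      # no interaction
Xc = np.hstack([one, X1])          # sand only
Xd = np.hstack([one, X2])          # fibre only
Xe = one                           # empty model

print(Xa.shape)  # (6, 6)
```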

F tests

We can compare two models easily if one is a special case of the other, for example when the design matrix of the first model is obtained from that of the second by selecting a subset of its columns.

For example model (b) is a special case of (a), model (c) is a special case of (a) or (b) but models (c) and (d) are not comparable.

Comparing two models: General Theory

We consider the case of a design matrix X partitioned into two pieces X1 and X2.

\begin{displaymath}X = [ X_1 \vert X_2 ]
\end{displaymath}

The Full Model is
\begin{align*}Y & = X\beta + \epsilon
\\
& = [ X_1 \vert X_2 ] \left[ \begin{array}{c} \beta_1 \\ \beta_2 \end{array}\right]
+ \epsilon
\\
& = X_1 \beta_1 + X_2 \beta_2 + \epsilon
\end{align*}
The Reduced model is

\begin{displaymath}Y=X_1 \beta_1 + \epsilon
\end{displaymath}

The difference between the two models is the term $X_2 \beta_2$, which is 0 if the null hypothesis $H_o: \beta_2=0$ is true.

Dimensions:

$\beta$ has $p$ parameters.

$\beta_i$ has $p_i$ parameters, with $p_1 + p_2 = p$.

To test $H_o: \beta_2=0$ we fit both the full and the reduced models and get
\begin{align*}Y & = \hat\mu_F + \hat\epsilon_F
\\
& = \hat\mu_R + \hat\epsilon_R
\end{align*}
where the subscript F refers to the full model and R to the reduced model. This leads to the decomposition of the data vector Y into the sum of three perpendicular vectors:

\begin{displaymath}Y=\hat\mu_R + (\hat\mu_F-\hat\mu_R) + \hat\epsilon_F
\end{displaymath}

I showed
\begin{align*}\hat\mu_R & \perp \hat\mu_F-\hat\mu_R
\\
\hat\mu_R & \perp \hat\epsilon_F
\\
\hat\mu_F-\hat\mu_R & \perp \hat\epsilon_F
\end{align*}
which leads to the following ANOVA table:

Source              Sum of Squares                                Degrees of Freedom
$X_1$               $\vert\vert\hat\mu_R\vert\vert^2$             $p_1$
$X_2\vert X_1$      $\vert\vert\hat\mu_F-\hat\mu_R\vert\vert^2$   $p_2$
Error               $\vert\vert\hat\epsilon_F\vert\vert^2$        $n-p$
Total (Unadjusted)  $\vert\vert Y\vert\vert^2$                    $n$

In this table the notation $X_2\vert X_1$ means $X_2$ adjusted for $X_1$, that is, $X_2$ after fitting $X_1$.
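The orthogonality claims behind this decomposition can be checked numerically; a minimal sketch using simulated data (all names hypothetical):

```python
import numpy as np

# Simulate a full design matrix X = [X1 | X2] and a response Y.
rng = np.random.default_rng(0)
n, p1, p2 = 20, 2, 3
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])
Y = rng.normal(size=n)

def fitted(Xmat, y):
    # projection of y onto the column space of Xmat
    beta, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return Xmat @ beta

mu_F = fitted(X, Y)    # full-model fitted values
mu_R = fitted(X1, Y)   # reduced-model fitted values
eps_F = Y - mu_F       # full-model residuals

# The three pieces of the decomposition are mutually perpendicular:
print(np.isclose(mu_R @ (mu_F - mu_R), 0.0))   # True
print(np.isclose(mu_R @ eps_F, 0.0))           # True
print(np.isclose((mu_F - mu_R) @ eps_F, 0.0))  # True
```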

This table can now be used to test $H_o: \beta_2=0$ by computing

\begin{displaymath}F = \frac{{\rm MS}(X_2\vert X_1)}{{\rm MSE}} =
\frac{\vert\vert\hat\mu_F-\hat\mu_R\vert\vert^2/p_2}{\vert\vert\hat\epsilon_F\vert\vert^2/(n-p)}
\end{displaymath}

and getting a P value from the $F_{p_2,n-p}$ distribution.

Another Formula for this F statistic

Recall that

\begin{displaymath}\vert\vert\hat\mu_R\vert\vert^2 + \vert\vert\hat\epsilon_R\vert\vert^2 = \vert\vert Y\vert\vert^2
\end{displaymath}

and

\begin{displaymath}\vert\vert\hat\mu_R\vert\vert^2 + \vert\vert\hat\mu_F-\hat\mu_R\vert\vert^2 +
\vert\vert\hat\epsilon_F\vert\vert^2 = \vert\vert Y\vert\vert^2
\end{displaymath}

so that

\begin{displaymath}\vert\vert\hat\epsilon_R\vert\vert^2 = \vert\vert\hat\epsilon_F\vert\vert^2 + \vert\vert\hat\mu_F-\hat\mu_R\vert\vert^2
\end{displaymath}

This makes

\begin{eqnarray*}F & = & \frac{({\rm ESS}_R - {\rm ESS}_F)/p_2}{ {\rm ESS}_F / (n-p)}
\\
& = & \frac{{\rm Extra SS}/p_2}{{\rm ESS}_F / (n-p)}
\end{eqnarray*}
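The Extra Sum of Squares calculation can be sketched numerically by fitting both models with least squares; a minimal Python example with simulated data (all variable names hypothetical, $H_o$ true by construction):

```python
import numpy as np
from scipy.stats import f as f_dist

# Simulate a design matrix X = [X1 | X2] and data with beta_2 = 0.
rng = np.random.default_rng(1)
n, p1, p2 = 30, 3, 2
p = p1 + p2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, p1 - 1))])
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])
Y = X1 @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)  # H_o is true here

def ess(Xm, y):
    # error sum of squares of the least-squares fit
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    r = y - Xm @ b
    return r @ r

ESS_F = ess(X, Y)    # full model
ESS_R = ess(X1, Y)   # reduced model

F = ((ESS_R - ESS_F) / p2) / (ESS_F / (n - p))  # Extra SS / p2 over MSE
p_value = f_dist.sf(F, p2, n - p)               # tail of F_{p2, n-p}
print(F, p_value)
```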


Remarks:

1.
If the errors are normal then

\begin{displaymath}\frac{{\rm ESS}_F}{\sigma^2} \sim \chi^2_{n-p}
\end{displaymath}

2.
If the errors are normal and $H_o: \beta_2=0$ is true then

\begin{displaymath}\frac{{\rm Extra SS}}{\sigma^2} \sim \chi^2_{p_2}
\end{displaymath}

3.
${\rm ESS}_F$ is independent of the Extra SS.

4.
So:

\begin{displaymath}\frac{{\rm Extra SS}/(\sigma^2p_2)}{{\rm ESS}_F /(\sigma^2 (n-p))} = F \sim
F_{p_2,n-p}
\end{displaymath}
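Remark 4 can be checked by simulation: with normal errors and $\beta_2 = 0$, the statistic should behave like a draw from $F_{p_2,n-p}$. A rough Monte Carlo sketch (simulated data, illustrative only):

```python
import numpy as np
from scipy.stats import f as f_dist

# Monte Carlo check: under H_o the Extra SS F statistic follows F_{p2, n-p}.
rng = np.random.default_rng(3)
n, p1, p2 = 25, 2, 2
p = p1 + p2
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])

def ess(Xm, y):
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    r = y - Xm @ b
    return r @ r

stats = []
for _ in range(2000):
    Y = X1 @ np.array([1.0, -0.5]) + rng.normal(size=n)  # beta_2 = 0
    ESS_F, ESS_R = ess(X, Y), ess(X1, Y)
    stats.append(((ESS_R - ESS_F) / p2) / (ESS_F / (n - p)))

# An empirical quantile should be close to the theoretical F quantile.
emp = np.quantile(stats, 0.95)
theo = f_dist.ppf(0.95, p2, n - p)
print(emp, theo)
```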

Example of the above: Multiple Regression

In the data set below the hardness of plaster is measured for each of 9 combinations of sand content and fibre content. Sand content was set at one of 3 levels, as was fibre content, and each of the 9 combinations was tried on two batches of plaster.

Here is an excerpt of the data:

Sand  Fibre  Hardness  Strength 
   0    0       61       34
   0    0       63       16
  15    0       67       36
  15    0       69       19
  30    0       65       28
     ...
The complete data set is here.

I fit submodels of the following "Full" model:

\begin{displaymath}Y_i = \beta_0+\beta_1 S_i + \beta_2 S_i^2 + \beta_3 F_i + \beta_4 F_i^2
+ \beta_5 S_i F_i + \epsilon_i
\end{displaymath}

I adopt the idea that the interaction term is probably negligible unless each of S and F has some effect, and that quadratic terms will probably not be present unless linear terms are present. This limits the set of potentially reasonable models. I fit each of them and report the error sum of squares in the following table:
Model for $\mu$                                                       Error Sum of Squares   Error df
Full                                                                  81.264                 12
$\beta_0+\beta_1 S_i + \beta_2 S_i^2 + \beta_3 F_i + \beta_4 F_i^2$   82.389                 13
$\beta_0+\beta_1 S_i + \beta_2 S_i^2 + \beta_3 F_i$                   104.167                14
$\beta_0+\beta_1 S_i + \beta_2 S_i^2$                                 169.500                15
$\beta_0+\beta_1 S_i$                                                 174.194                16
$\beta_0+\beta_1 S_i + \beta_3 F_i + \beta_4 F_i^2$                   87.083                 14
$\beta_0+\beta_3 F_i + \beta_4 F_i^2$                                 189.167                15
$\beta_0+\beta_3 F_i$                                                 210.944                16
$\beta_0+\beta_1 S_i + \beta_3 F_i$                                   108.861                15

I begin by asking whether the 2nd degree polynomial terms, that is, those involving $\beta_2, \beta_4$ and $\beta_5$, need be included. To do so I compare the top line with the model containing only $\beta_0+\beta_1 S_i + \beta_3 F_i$. The Extra SS is 108.861-81.264 = 27.597 on 3 degrees of freedom, which gives a mean square of 27.597/3 = 9.199. The MSE is 81.264/12 = 6.772. This gives an F-statistic of 9.199/6.772 = 1.358 on 3 numerator and 12 denominator degrees of freedom, and a P-value of 0.30, which is not significant. We therefore delete the quadratic terms and consider the coefficients of S and F.

Treating the last line in the table as the new ``Full'' model, we form the F-statistics (210.944-108.861)/(108.861/15) = 14.066 and (174.194-108.861)/(108.861/15) = 9.002. The first tests $\beta_1 =0$ and the second $\beta_3=0$; each is on 1 and 15 degrees of freedom. The corresponding P-values are 0.002 and 0.009. These are both highly significant, so we conclude that both sand content and fibre content have an impact on hardness and that there is little reason to look for non-linear impacts of the two factors.
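The arithmetic above can be reproduced directly from the table entries; a short sketch using scipy for the P-values (the sums of squares are those reported in the table):

```python
from scipy.stats import f as f_dist

# Error sums of squares and degrees of freedom from the table above.
ESS_full, df_full = 81.264, 12
ESS_SF, df_SF = 108.861, 15        # model beta0 + beta1 S + beta3 F
ESS_F_only = 210.944               # model beta0 + beta3 F
ESS_S_only = 174.194               # model beta0 + beta1 S

# F test for the three second-degree terms (beta2, beta4, beta5):
F_quad = ((ESS_SF - ESS_full) / 3) / (ESS_full / df_full)
print(round(F_quad, 3))  # 1.358

# F tests for beta1 = 0 and beta3 = 0 in the linear model:
F_sand = (ESS_F_only - ESS_SF) / (ESS_SF / df_SF)
F_fibre = (ESS_S_only - ESS_SF) / (ESS_SF / df_SF)
print(round(F_sand, 3), round(F_fibre, 3))  # 14.066 9.002

# P-values; these should match the 0.30, 0.002 and 0.009 quoted in the text.
print(f_dist.sf(F_quad, 3, df_full),
      f_dist.sf(F_sand, 1, df_SF),
      f_dist.sf(F_fibre, 1, df_SF))
```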

An alternative starting point is to check first whether the interaction term can be eliminated, that is, to test the hypothesis that $\beta_5=0$. This hypothesis can be tested either with the F-statistic [(82.389-81.264)/1]/[81.264/12] = 0.166 or with the t-statistic $\hat\beta_5/\hat\sigma_{\hat\beta_5}$, which SAS calculates to be -0.41. Note that $(-0.41)^2 = 0.166$ to within round-off error; algebraically $F=t^2$. Note, too, that the t test can be made one-sided while the F test cannot.
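The identity $F = t^2$ for testing a single coefficient can be verified numerically; a sketch with simulated data (not the plaster data):

```python
import numpy as np

# Demonstration that F = t^2 when testing one coefficient (here the last).
rng = np.random.default_rng(2)
n = 18
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, x2])
X_red = X_full[:, :2]                   # drop the column being tested
Y = 1.0 + 0.5 * x1 + rng.normal(size=n)

def ess(Xm, y):
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    r = y - Xm @ b
    return r @ r, b

ESS_F, beta = ess(X_full, Y)
ESS_R, _ = ess(X_red, Y)
df = n - X_full.shape[1]

# Extra-SS F statistic for the hypothesis that the last coefficient is 0:
F = (ESS_R - ESS_F) / (ESS_F / df)

# t statistic for the same coefficient: estimate over standard error.
XtX_inv = np.linalg.inv(X_full.T @ X_full)
se = np.sqrt((ESS_F / df) * XtX_inv[2, 2])
t_stat = beta[2] / se

print(np.isclose(F, t_stat ** 2))  # True
```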

Here is the SAS code and output.


Richard Lockhart
1999-03-23