Reading: Chapter 7 Sections 1-3, 7 and 8.
Example:
Model:
Questions:
To answer these questions we test
Technique: we fit a sequence of models:
The design matrices for the models a, b, c, d and e are given by
We can compare two models easily if one is a special case of the other, such as when the design matrix of the first model is obtained from that of the second by selecting a subset of its columns.
For example, model (b) is a special case of (a), and model (c) is a special case of either (a) or (b), but models (c) and (d) are not comparable.
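One way to check the "special case" relation numerically is to ask whether every column of the smaller design matrix lies in the column space of the larger one; selecting a subset of columns, as described above, is the simplest way this happens. The sketch below assumes NumPy; the helper `is_special_case` and the small designs `X_quad`, `X_line` and `X_even` are hypothetical illustrations, not the models (a) to (e) of the example.

```python
import numpy as np

def is_special_case(X_small, X_big):
    """True if every column of X_small lies in the column space of X_big,
    i.e. the model with design X_small is a special case of the model with X_big."""
    rank_big = np.linalg.matrix_rank(X_big)
    rank_both = np.linalg.matrix_rank(np.hstack([X_big, X_small]))
    return rank_both == rank_big

# Hypothetical designs built from one covariate x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X_quad = np.column_stack([np.ones_like(x), x, x ** 2])  # intercept, x, x^2
X_line = X_quad[:, :2]                                  # subset of columns: intercept, x
X_even = X_quad[:, [0, 2]]                              # subset of columns: intercept, x^2

print(is_special_case(X_line, X_quad))  # True
print(is_special_case(X_line, X_even))  # False
print(is_special_case(X_even, X_line))  # False: the two are not comparable
```

Here `X_line` and `X_even` are both special cases of `X_quad` but not of each other, which mirrors the situation of models (c) and (d).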
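We consider the case of a design matrix $X$ partitioned into two pieces, $X = [X_1 \mid X_2]$, so that
$$Y = X\beta + \epsilon = X_1\beta_1 + X_2\beta_2 + \epsilon, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}.$$
Dimensions:
$X$ is $n \times p$, so $\beta$ has $p$ parameters.
$X_i$ is $n \times p_i$, so $\beta_i$ has $p_i$ parameters, with $p_1 + p_2 = p$.
To test $H_0: \beta_2 = 0$ we fit both the full model (design matrix $X$) and the reduced model (design matrix $X_1$) by least squares and get the fitted vectors
$$\hat\mu_F = X\hat\beta_F, \qquad \hat\mu_R = X_1\hat\beta_R,$$
where the subscript F refers to the full model and R to the reduced model. This leads to the decomposition of the data vector Y into the sum of three perpendicular vectors:
$$Y = \hat\mu_R + (\hat\mu_F - \hat\mu_R) + (Y - \hat\mu_F),$$
so that, by Pythagoras,
$$\lVert Y \rVert^2 = \lVert \hat\mu_R \rVert^2 + \lVert \hat\mu_F - \hat\mu_R \rVert^2 + \lVert Y - \hat\mu_F \rVert^2.$$
Source | Sum of Squares | Degrees of Freedom |
X1 | $\lVert \hat\mu_R \rVert^2$ | p1 |
X2|X1 | $\lVert \hat\mu_F - \hat\mu_R \rVert^2$ | p2 |
Error | $\lVert Y - \hat\mu_F \rVert^2$ | n-p |
Total (Unadjusted) | $\lVert Y \rVert^2$ | n |
In this table the notation X2|X1 means X2 adjusted for X1, or X2 after fitting X1.
This table can now be used to test $H_0: \beta_2 = 0$ by computing
$$F = \frac{\lVert \hat\mu_F - \hat\mu_R \rVert^2 / p_2}{\lVert Y - \hat\mu_F \rVert^2 / (n-p)}$$
and comparing it with the $F_{p_2,\,n-p}$ distribution, which is its distribution under $H_0$ when the errors are independent $N(0, \sigma^2)$. Recall that
$$\lVert \hat\mu_F - \hat\mu_R \rVert^2 = \lVert Y - \hat\mu_R \rVert^2 - \lVert Y - \hat\mu_F \rVert^2 = \text{ESS}_R - \text{ESS}_F,$$
the extra sum of squares: the drop in error sum of squares when $X_2$ is added to the model.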
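The decomposition and the F-statistic for $H_0: \beta_2 = 0$ can be computed directly with least squares. This is a minimal sketch assuming NumPy, a design $[X_1 \mid X_2]$ of full column rank, and simulated data (not from these notes); the function name `extra_ss_test` is hypothetical.

```python
import numpy as np

def extra_ss_test(Y, X1, X2):
    """Extra-sum-of-squares F test of H0: beta_2 = 0 in Y = X1 beta_1 + X2 beta_2 + error.

    Assumes X = [X1 | X2] has full column rank p.
    """
    X = np.hstack([X1, X2])
    n, p = X.shape
    p2 = X2.shape[1]
    # Least-squares fitted vectors for the full and reduced models.
    mu_F = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    mu_R = X1 @ np.linalg.lstsq(X1, Y, rcond=None)[0]
    ess_F = np.sum((Y - mu_F) ** 2)         # ||Y - mu_F||^2 on n - p df
    ess_R = np.sum((Y - mu_R) ** 2)         # ||Y - mu_R||^2
    extra_ss = np.sum((mu_F - mu_R) ** 2)   # ||mu_F - mu_R||^2 on p2 df
    F = (extra_ss / p2) / (ess_F / (n - p))
    return {"F": F, "extra_ss": extra_ss, "ess_F": ess_F, "ess_R": ess_R,
            "mu_F": mu_F, "mu_R": mu_R, "df": (p2, n - p)}

# Simulated example (hypothetical data): is a quadratic term needed?
rng = np.random.default_rng(0)
n = 20
x = rng.uniform(0, 10, n)
X1 = np.column_stack([np.ones(n), x])    # reduced model: intercept + x
X2 = (x ** 2)[:, None]                   # tested column: x^2
Y = 1.0 + 0.5 * x + rng.normal(0, 1, n)  # true coefficient of x^2 is 0
res = extra_ss_test(Y, X1, X2)

# The three pieces are perpendicular, so their squared lengths add up.
print(np.isclose(np.sum(Y ** 2),
                 np.sum(res["mu_R"] ** 2) + res["extra_ss"] + res["ess_F"]))  # True
# The extra SS equals the drop in error SS.
print(np.isclose(res["extra_ss"], res["ess_R"] - res["ess_F"]))  # True
print(res["F"], res["df"])
```

Computing the extra sum of squares as $\lVert \hat\mu_F - \hat\mu_R \rVert^2$ or as $\text{ESS}_R - \text{ESS}_F$ gives the same number; the second form is the one used with printed regression output, where only error sums of squares are reported.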
Remarks:
In the data set below the hardness of plaster is measured for each of 9 combinations of sand content and fibre content. Sand content was set at one of 3 levels, as was fibre content, and all 9 combinations were tried on two batches of plaster, giving 18 observations.
Here is an excerpt of the data:
Sand | Fibre | Hardness | Strength |
0 | 0 | 61 | 34 |
0 | 0 | 63 | 16 |
15 | 0 | 67 | 36 |
15 | 0 | 69 | 19 |
30 | 0 | 65 | 28 |
... | ... | ... | ... |
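I fit submodels of the following "Full" model, a second-degree polynomial in sand content $S$ and fibre content $F$:
$$H_i = \beta_0 + \beta_1 S_i + \beta_2 F_i + \beta_3 S_i^2 + \beta_4 F_i^2 + \beta_5 S_i F_i + \epsilon_i,$$
where $H_i$ is the hardness of the $i$-th observation.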
Model for | Error Sum of Squares | Error df |
Full | 81.264 | 12 |
Full without SF | 82.389 | 13 |
 | 104.167 | 14 |
 | 169.500 | 15 |
 | 174.194 | 16 |
 | 87.083 | 14 |
 | 189.167 | 15 |
 | 210.944 | 16 |
S and F only | 108.861 | 15 |
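Each row of the table is a submodel whose design matrix keeps a subset of the columns of the full second-degree model. The sketch below, assuming NumPy and the column order $1, S, F, S^2, F^2, SF$ (a hypothetical labelling), builds those columns for the five excerpt rows only; fitting the table's models needs all 18 observations, which are not reproduced here. It also checks that $n$ minus the number of columns gives the error df for the full model (12) and for the model with only the intercept, $S$ and $F$ (15).

```python
import numpy as np

# Sand and fibre settings for the five excerpt rows shown above.
S = np.array([0, 0, 15, 15, 30], dtype=float)
Fib = np.array([0, 0, 0, 0, 0], dtype=float)  # "Fib" avoids a clash with the F-statistic

# Columns of the full second-degree model (hypothetical labels).
columns = {
    "1": np.ones_like(S),
    "S": S,
    "F": Fib,
    "S^2": S ** 2,
    "F^2": Fib ** 2,
    "SF": S * Fib,
}

def design(names):
    """Design matrix for the submodel that keeps only the named columns."""
    return np.column_stack([columns[name] for name in names])

X_full = design(["1", "S", "F", "S^2", "F^2", "SF"])
X_linear = design(["1", "S", "F"])  # intercept and linear terms only

# With all n = 18 observations, error df = n minus the number of columns.
n = 18
print(n - X_full.shape[1], n - X_linear.shape[1])  # 12 15
```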
I begin by asking whether the second-degree polynomial terms, that is, those involving $S^2$, $F^2$ and $SF$, need to be included. To do so I compare the top line of the table with the last line, the model containing only the intercept and the linear terms in $S$ and $F$. The extra SS is 108.861 - 81.264 = 27.597 on 15 - 12 = 3 degrees of freedom, which gives a mean square of 27.597/3 = 9.199. The MSE is 81.264/12 = 6.772. This gives an F-statistic of 9.199/6.772 = 1.358 on 3 numerator and 12 denominator degrees of freedom, with a P-value of 0.30, which is not significant. We would then delete the quadratic terms and consider the coefficients of S and F. One option is to treat the last line in the table as the new "Full" model and form the F-statistics (210.944-108.861)/(108.861/15) = 14.066 and (174.194-108.861)/(108.861/15) = 9.002, each on 1 and 15 degrees of freedom; each tests whether one of the two linear coefficients (for S or for F) is 0, with the other linear term kept in the model. The corresponding P-values are 0.002 and 0.009. These are both highly significant, and we conclude that both sand content and fibre content have an impact on hardness and that there is little reason to look for non-linear effects of the two factors.
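The arithmetic in the previous paragraph can be checked with a few lines of plain Python; `extra_ss_F` is a hypothetical helper name. The P-values need the upper tail of the F distribution, for example `scipy.stats.f.sf(F, dfn, dfd)` if SciPy is available; that call is an assumption of this sketch, not part of the original analysis.

```python
def extra_ss_F(ess_reduced, ess_full, df_reduced, df_full):
    """F = [(ESS_R - ESS_F) / (df_R - df_F)] / [ESS_F / df_F] for nested models."""
    extra_ms = (ess_reduced - ess_full) / (df_reduced - df_full)
    mse = ess_full / df_full
    return extra_ms / mse

# Quadratic terms: S and F only (108.861 on 15 df) vs Full (81.264 on 12 df).
print(round(extra_ss_F(108.861, 81.264, 15, 12), 3))   # 1.358
# Linear terms, with the 15-df model playing the role of "Full".
print(round(extra_ss_F(210.944, 108.861, 16, 15), 3))  # 14.066
print(round(extra_ss_F(174.194, 108.861, 16, 15), 3))  # 9.002
```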
An alternative starting point would be to check first whether the interaction term could be eliminated, that is, to test the hypothesis that the coefficient of $SF$ is 0. This hypothesis can be tested either with the F-statistic [(82.389-81.264)/1]/[81.264/12] = 0.166 or with the t-statistic $t = \hat\beta_{SF}/\widehat{\mathrm{SE}}(\hat\beta_{SF})$, the estimated coefficient of $SF$ divided by its estimated standard error, which SAS calculates to be -0.41. Note that $(-0.41)^2 = 0.168$, which agrees with 0.166 to within round-off error; algebraically $F = t^2$. Note, too, that the t-test can be made one-sided while the F-test cannot.
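The identity $F = t^2$ for a single coefficient can be checked numerically. This sketch assumes NumPy and uses simulated data in place of the plaster data (the covariates `x1`, `x2` and their coefficients are made up); the t-statistic uses the usual standard error $\sqrt{\mathrm{MSE}\,[(X^TX)^{-1}]_{jj}}$.

```python
import numpy as np

# Simulated data (not the plaster data): two covariates and their product.
rng = np.random.default_rng(1)
n = 18
x1 = rng.uniform(0, 30, n)
x2 = rng.uniform(0, 50, n)
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])  # last column: interaction
Y = 60 + 0.2 * x1 + 0.1 * x2 + rng.normal(0, 2, n)
p = X.shape[1]

def error_ss(X, Y):
    """Least-squares coefficients and error sum of squares."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return beta, resid @ resid

beta, ess_full = error_ss(X, Y)
_, ess_reduced = error_ss(X[:, :-1], Y)  # drop the interaction column
mse = ess_full / (n - p)

# t = estimate / standard error, with SE^2 = MSE * [(X'X)^{-1}]_{jj}.
XtX_inv = np.linalg.inv(X.T @ X)
t = beta[-1] / np.sqrt(mse * XtX_inv[-1, -1])

# Extra-sum-of-squares F on 1 numerator df.
F = (ess_reduced - ess_full) / mse
print(t ** 2, F)  # equal up to floating-point error
```

The sign of `t` is what the F-statistic throws away, which is why only the t-test can be made one-sided.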