Reading: Chapter 3, Chapter 9.
Model Assessment and Residual Analysis:
Recall that the internally studentized residual is
$$ r_i = \frac{\hat e_i}{s\sqrt{1-h_{ii}}}, $$
where $\hat e_i$ is the ordinary residual, $s^2$ is the usual estimate of $\sigma^2$, and $h_{ii}$ is the $i$th leverage.
Critique: when the model is wrong, a bad data point can inflate $s$, leaving the internally studentized residual small.
Suggestion: estimate $\sigma$ without using case $i$. But the resulting residual must still be compared to other residuals or to an estimate of its standard error, so we suggest Externally Studentized Residuals, which are also called Case Deleted Residuals:
$$ t_i = \frac{\hat e_i}{s_{(i)}\sqrt{1-h_{ii}}}, $$
where $s_{(i)}$ is the estimate of $\sigma$ computed from the fit with case $i$ deleted.
Apparent problem: if $n=100$, do I have to run SAS 100 times? NO.
FACT 1: The deleted (PRESS) residual can be computed from the full fit alone:
$$ \hat e_{(i)} = y_i - x_i^T\hat\beta_{(i)} = \frac{\hat e_i}{1-h_{ii}}. $$
FACT 2: So can $s_{(i)}$, via the identity
$$ (n-p)s^2 = (n-p-1)s_{(i)}^2 + \frac{\hat e_i^2}{1-h_{ii}}, $$
which gives
$$ t_i = r_i\sqrt{\frac{n-p-1}{n-p-r_i^2}}. $$
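As a sanity check on these facts, here is a small sketch in Python (numpy with synthetic data, not the course's SAS workflow) that computes internally and externally studentized residuals from a single fit, then verifies FACT 1 and FACT 2 by actually refitting with one case deleted:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

# Full least-squares fit
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta                               # ordinary residuals
H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
h = np.diag(H)                                 # leverages
s2 = e @ e / (n - p)                           # residual variance estimate

r = e / np.sqrt(s2 * (1 - h))                  # internally studentized
t = r * np.sqrt((n - p - 1) / (n - p - r**2))  # externally studentized (FACT 2)
press = e / (1 - h)                            # deleted (PRESS) residuals (FACT 1)

# Check FACT 1 directly for case 0: refit without it
i = 0
Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
beta_i = np.linalg.lstsq(Xi, yi, rcond=None)[0]
deleted_resid = y[i] - X[i] @ beta_i
assert np.isclose(deleted_resid, press[i])

# Check FACT 2: t_i really is e_i / (s_(i) * sqrt(1 - h_ii))
s2_i = np.sum((yi - Xi @ beta_i) ** 2) / (n - 1 - p)
assert np.isclose(t[i], e[i] / np.sqrt(s2_i * (1 - h[i])))
```

Only the one full fit is needed in practice; the case-deleted refit here exists purely to confirm the two identities numerically.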
Cubic Fit:

Year | Residual | Internally Studentized | PRESS | Externally Studentized | Leverage
1975 |     1.17 |                   0.59 |  1.54 |                   0.56 |     0.24
1980 |    -1.09 |                  -1.15 | -6.20 |                  -1.19 |     0.82
Note the influence of the leverage: for 1980 the PRESS residual (-6.20) is far larger in magnitude than the ordinary residual (-1.09).
Note that edge observations (1980) have large leverage.
Quintic Fit:

Year | Residual | Internally Studentized | PRESS | Externally Studentized | Leverage
1978 |     0.82 |                   1.79 |  1.43 |                   3.48 |     0.43
1980 |     0.08 |                   1.02 |  4.79 |                   1.03 |     0.98
Notice that the 1978 externally studentized residual (3.48) is unacceptably large.
Notice that the 1980 leverage (0.98) is huge.
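The leverage pattern can be checked directly. The sketch below (numpy; the choice of years 1971-1980 as ten equally spaced x-values is an assumption made only for illustration) computes the hat-matrix diagonals for cubic and quintic polynomial fits and shows that the edge years receive the largest leverage:

```python
import numpy as np

# Ten yearly observations; fit polynomial trends and inspect leverages.
years = np.arange(1971, 1981)
x = years - years.mean()        # centre the covariate for better conditioning

for degree, label in [(3, "cubic"), (5, "quintic")]:
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
    h = np.diag(H)                                 # leverages
    print(label, "leverage at the 1980 edge:", round(h[-1], 2))
```

The leverages sum to the number of parameters, so a quintic spends six "equivalent observations" on itself, and most of that budget lands on the edge points.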
Suppose X is the design matrix of a linear model and that X1 is the design matrix of the linear model we get by imposing some linear restrictions on the model using X. A good example is on assignment 3, but here is another. Consider the one-way layout, also called the K sample problem. We have K samples from K populations with means $\mu_i$ for $i=1,\dots,K$. Suppose the $i$th sample size is $n_i$. This is a linear model, provided we are able to assume that all the population variances are the same.
The resulting design matrix has one indicator column per sample:
$$ X = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1_{n_K} \end{pmatrix}, $$
where $1_m$ denotes a column of $m$ ones.
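A quick way to build this indicator matrix in Python (numpy sketch; the sample sizes are made up for illustration) is:

```python
import numpy as np

# One-way layout, cell-means coding: column i of X indicates sample i.
n_i = [3, 2, 4]                                 # hypothetical sizes for K = 3 samples
groups = np.repeat(np.arange(len(n_i)), n_i)    # sample label for each observation
X = (groups[:, None] == np.arange(len(n_i))).astype(float)
print(X)
```

Each row has exactly one 1, and column $i$ sums to $n_i$, so $X^TX$ is diagonal with entries $n_1,\dots,n_K$.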
It is actually quite common to reparametrize the full model in such a way that the null hypothesis of interest is of the form: some subset of the new coefficients is 0. For the one-way ANOVA there are two such reparametrizations in common use.
The first of these defines a grand mean parameter $\mu = \sum_i n_i\mu_i / n$ and individual ``effects'' $\alpha_i = \mu_i - \mu$. This new model apparently has K+1 parameters, and the corresponding design matrix, X1, would not have full rank; its rank would be K although it would have K+1 columns. As such, the matrix $X_1^T X_1$ would be singular and we could not find unique least squares estimates. The problem is that we have defined the parameters in such a way that there is a linear restriction on them, namely $\sum_i n_i\alpha_i = 0$. We get around this problem by dropping $\alpha_K$ and remembering in our model equations that $\alpha_K = -\sum_{i<K} n_i\alpha_i / n_K$.
If you now write out the model equations with $\mu$ and the $\alpha_i$, $i=1,\dots,K-1$, as parameters, you get the design matrix
$$ X_2 = \begin{pmatrix} 1_{n_1} & 1_{n_1} & 0 & \cdots & 0 \\ 1_{n_2} & 0 & 1_{n_2} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 1_{n_{K-1}} & 0 & \cdots & 0 & 1_{n_{K-1}} \\ 1_{n_K} & -\tfrac{n_1}{n_K}1_{n_K} & \cdots & & -\tfrac{n_{K-1}}{n_K}1_{n_K} \end{pmatrix}. $$
Students will have seen this matrix in 330 in the case where all the $n_i$ are the same, so that the fractions in the last $n_K$ rows of $X_2$ are all equal to $-1$. Notice that the hypothesis $\mu_1 = \cdots = \mu_K$ is the same as $\alpha_1 = \cdots = \alpha_{K-1} = 0$.
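A sketch of $X_2$ in Python (numpy; hypothetical sample sizes), which also checks that it spans the same column space as the cell-means matrix $X$:

```python
import numpy as np

# Effects coding under the constraint sum(n_i * alpha_i) = 0:
# parameters are (mu, alpha_1, ..., alpha_{K-1}), and alpha_K is
# recovered as -(n_1*alpha_1 + ... + n_{K-1}*alpha_{K-1}) / n_K.
n_i = np.array([2, 3, 3])                  # hypothetical sample sizes
K = len(n_i)
groups = np.repeat(np.arange(K), n_i)

X2 = np.ones((n_i.sum(), K))               # first column: grand mean
for j in range(1, K):
    col = (groups == j - 1).astype(float)  # indicator of sample j
    col[groups == K - 1] = -n_i[j - 1] / n_i[K - 1]  # fractions in last n_K rows
    X2[:, j] = col

# X2 has full rank K and the same column space as the cell-means matrix X
X = (groups[:, None] == np.arange(K)).astype(float)
assert np.linalg.matrix_rank(np.hstack([X, X2])) == np.linalg.matrix_rank(X) == K
```

With equal $n_i$ the fractions $-n_j/n_K$ all reduce to $-1$, recovering the matrix from 330.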
The other reparametrization is ``corner-point coding'', where we define new parameters by $\mu = \mu_1$ and $\alpha_i = \mu_i - \mu_1$ for $i = 2,\dots,K$.
For this parameterization the null hypothesis of interest is $\alpha_2 = \cdots = \alpha_K = 0$.
The design matrix has a first column of ones followed by indicator columns for samples $2,\dots,K$:
$$ X_3 = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 1_{n_2} & 1_{n_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 1_{n_K} & 0 & \cdots & 1_{n_K} \end{pmatrix}. $$
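The corner-point matrix is straightforward to construct (numpy sketch, hypothetical sample sizes):

```python
import numpy as np

# Corner-point coding: mu = mu_1 and alpha_i = mu_i - mu_1 for i = 2..K,
# so mu_1 = ... = mu_K is exactly alpha_2 = ... = alpha_K = 0.
n_i = np.array([2, 3, 3])                  # hypothetical sample sizes
K = len(n_i)
groups = np.repeat(np.arange(K), n_i)

# Column of ones (mu), then indicators of samples 2..K.
X3 = np.column_stack([np.ones(n_i.sum())] +
                     [(groups == j).astype(float) for j in range(1, K)])
print(X3)
```

Rows from sample 1 (the ``corner'') carry only the intercept; every other sample's mean is the corner mean plus its own offset.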