
STAT 350: Lecture 14

Reading: Chapter 3, Chapter 9.

Model Assessment and Residual Analysis:

Standardized / studentized residuals

Recall that

\begin{displaymath}\hat\epsilon \sim MVN(0,\sigma^2(I-H))
\end{displaymath}

where $H=X(X^TX)^{-1} X^T$. It follows that

\begin{displaymath}\hat\epsilon_i \sim N(0,\sigma^2(1-h_{ii}))
\end{displaymath}

where $h_{ii}$ is the $i$th diagonal entry in H. The quantity

\begin{displaymath}\frac{\hat\epsilon_i}{\hat\sigma}
\end{displaymath}

is called a standardized residual. It ignores the leverages, which are small for most observations. In judging the quality of a model, however, observations with large leverages are particularly important, so it is now standard to adjust for them. We sometimes use

\begin{displaymath}\frac{\hat\epsilon_i}{\hat\sigma\sqrt{1-h_{ii}}} \sim N(0,1)
\end{displaymath}

which is called an internally studentized residual.
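To make the definitions concrete, here is a minimal numpy sketch (not part of the original notes; the design matrix and response are invented purely for illustration) computing the hat matrix, the leverages and both kinds of residuals:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 4                               # e.g. a cubic fit to 10 points
x = np.arange(n, dtype=float)
X = np.vander(x, p, increasing=True)       # columns 1, x, x^2, x^3
y = 2 + 0.5 * x + rng.normal(size=n)       # invented response

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat                   # ordinary residuals eps_hat
H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix H = X (X'X)^{-1} X'
h = np.diag(H)                             # leverages h_ii
mse = resid @ resid / (n - p)              # estimate of sigma^2
sigma_hat = np.sqrt(mse)

standardized = resid / sigma_hat                        # ignores leverages
internally_stud = resid / (sigma_hat * np.sqrt(1 - h))  # adjusts for leverages
\end{verbatim}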

Critique: when the model is wrong, a bad data point can inflate $\hat\sigma$, leaving the internally studentized residual small.

Suggestion:

\begin{displaymath}Y_i -\hat{Y}_{i(i)}
\end{displaymath}

where $\hat{Y}_{i(i)}$ is the fitted value using all the data except case i. This residual is called a ``PRESS prediction error for case i''. The acronym PRESS stands for Prediction Sum of Squares.

But: $Y_i -\hat{Y}_{i(i)}$ must be compared to other residuals or to $\sigma$, so we suggest Externally Studentized Residuals, which are also called Case Deleted Residuals:

\begin{displaymath}\frac{\hat\epsilon_{i(i)}}{\mbox{est'd SE not using case $i$}}
= \frac{Y_i -\hat{Y}_{i(i)}}{\mbox{Case $i$ deleted SE of numerator}}
\end{displaymath}

Apparent problem: if $n=100$ do I have to run SAS 100 times? NO.

FACT 1:

\begin{displaymath}Y_i -\hat{Y}_{i(i)} = \frac{\hat\epsilon_i}{1-h_{ii}}
\end{displaymath}

Jargon: $h_{ii}$ is the leverage of point $i$. If $h_{ii}$ is large then

\begin{displaymath}\left\vert\frac{\hat\epsilon_i}{1-h_{ii}}\right\vert >> \vert \hat\epsilon_i\vert
\end{displaymath}

and point i influences the fit strongly.
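A hedged numerical check of FACT 1, reusing the variables X, y, resid, h and n from the sketch above: the case-deleted prediction errors obtained by actually refitting without case i agree with $\hat\epsilon_i/(1-h_{ii})$, so in practice no loop over refits is needed.

\begin{verbatim}
press_direct = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # fit without case i
    press_direct[i] = y[i] - X[i] @ b_i                     # Y_i - Yhat_{i(i)}

press_formula = resid / (1 - h)                             # FACT 1
assert np.allclose(press_direct, press_formula)
\end{verbatim}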

FACT 2:

\begin{displaymath}{\rm Var}\left(\frac{\hat\epsilon_i}{1-h_{ii}}\right) =
\frac{\sigma^2}{1-h_{ii}}
\left( = \frac{\sigma^2(1-h_{ii})}{(1-h_{ii})^2}\right)
\end{displaymath}

The Externally Studentized Residual is

\begin{displaymath}\frac{\hat\epsilon_i/(1-h_{ii})}{\sqrt{{\rm MSE}_{(i)}/(1-h_{ii})}}
=
\frac{\hat\epsilon_i}{\sqrt{{\rm MSE}_{(i)}(1-h_{ii})}}
\end{displaymath}

where

\begin{displaymath}{\rm MSE}_{(i)} = \mbox{ estimate of $\sigma^2$ not using data point $i$}
\end{displaymath}

Fact:

\begin{displaymath}{\rm MSE} = \frac{(n-p-1) {\rm MSE}_{(i)} + \hat\epsilon_i^2/(1-h_{ii})}{n-p}
\end{displaymath}
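Rearranging this identity gives ${\rm MSE}_{(i)}$ in terms of quantities available from the single full fit:

\begin{displaymath}{\rm MSE}_{(i)} = \frac{(n-p)\,{\rm MSE} - \hat\epsilon_i^2/(1-h_{ii})}{n-p-1}
\end{displaymath}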

so the externally studentized residual is

\begin{displaymath}\hat\epsilon_{i}\sqrt{\frac{n-p-1}{{\rm ESS}(1-h_{ii}) - \hat\epsilon_i^2}}
\end{displaymath}

where ${\rm ESS} = (n-p)\,{\rm MSE}$ is the error sum of squares from the full fit.
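Continuing the earlier numpy sketch (illustrative only; resid, h, mse, n and p as before), the externally studentized residuals can be computed from the full fit in either of the two equivalent forms above:

\begin{verbatim}
ess = (n - p) * mse                               # error sum of squares, full fit
mse_i = (ess - resid**2 / (1 - h)) / (n - p - 1)  # MSE_(i), case i deleted
ext_stud = resid / np.sqrt(mse_i * (1 - h))

# closed form from the display above
ext_stud_alt = resid * np.sqrt((n - p - 1) / (ess * (1 - h) - resid**2))
assert np.allclose(ext_stud, ext_stud_alt)
\end{verbatim}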

Distribution Theory of Externally Studentized Residuals

1.
$\frac{\hat\epsilon_i/(1-h_{ii})}{\sqrt{\sigma^2/(1-h_{ii})}} \sim N(0,1)$

2.

\begin{displaymath}\frac{(n-p-1) {\rm MSE}_{(i)}}{\sigma^2} \sim \chi^2_{n-p-1}
\end{displaymath}

3.
These two are independent.

4.
SO:
\begin{align*}t_i & = \frac{\hat\epsilon_i/(1-h_{ii})}{\sqrt{{\rm MSE}_{(i)}/(1-h_{ii})}}
= \frac{\hat\epsilon_i}{\sqrt{{\rm MSE}_{(i)}(1-h_{ii})}}
\\
&\sim t_{n-p-1}
\end{align*}

Example: Insurance Data

Cubic Fit:

Year   $\hat\epsilon_i$   Internally Studentized   PRESS    Externally Studentized   Leverage
1975    1.17               0.59                      1.54     0.56                      0.24
1980   -1.09              -1.15                     -6.20    -1.19                      0.82

Note the influence of the leverage.

Note that edge observations (1980) have large leverage.
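As a quick arithmetic check of FACT 1 on the 1975 row (table entries are rounded to two decimals):

\begin{displaymath}\frac{\hat\epsilon_i}{1-h_{ii}} = \frac{1.17}{1-0.24} \approx 1.54 ,
\end{displaymath}

which is the tabulated PRESS residual.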

Quintic Fit:

Year   $\hat\epsilon_i$   Internally Studentized   PRESS    Externally Studentized   Leverage
1978    0.82               1.79                      1.43     3.48                      0.43
1980    0.08               1.02                      4.79     1.03                      0.98

Notice that the 1978 externally studentized residual is unacceptably large.

Notice that the 1980 leverage is huge.

Formal assessment of Externally Studentized Residuals

1.
Each residual has a $t_{n-p-1}$ distribution. For example, for the quintic, $t_{10-7,0.025} = 3.18$ is the critical point for a 5% level test.

2.
But there are 10 residuals to look at. This leads to a multiple comparisons problem. The simplest multiple comparisons procedure is the Bonferroni method: divide $\alpha$ by the number of tests to be done, 10 in our case, so that each two-sided test uses the tail area 0.025/10 = 0.0025. (A numerical check of these critical points follows this list.) The corresponding critical point is

$t_{3,0.0025} = 7.45$

so none of the residuals are significant.

3.
For the cubic model

$t_{5,0.0025} = 4.77$

and again all the residuals are judged ok.
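The critical points quoted in items 1-3 can be reproduced with the t quantile function in scipy (a small illustrative check, not part of the original notes):

\begin{verbatim}
from scipy.stats import t

def bonferroni_cutoff(df, alpha=0.05, n_tests=10):
    # two-sided tests at overall level alpha: tail area alpha/(2*n_tests) each
    return t.ppf(1 - alpha / (2 * n_tests), df)

print(round(t.ppf(1 - 0.025, 3), 2))   # 3.18: unadjusted cutoff, quintic fit
print(round(bonferroni_cutoff(3), 2))  # about 7.45: Bonferroni cutoff, quintic
print(round(bonferroni_cutoff(5), 2))  # about 4.77: Bonferroni cutoff, cubic
\end{verbatim}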

Reparametrization

Suppose X is the design matrix of a linear model and that X* is the design matrix of the linear model we get by imposing some linear restrictions on the model using X. A good example is on assignment 3 but here is another. Consider the one way layout, also called the K sample problem. We have K samples from K populations with means $\nu_i$ for $i=1,\ldots,K$. Suppose the $i$th sample size is $n_i$. This is a linear model, provided we are able to assume that all the population variances are the same.

The resulting design matrix is

\begin{displaymath}X = \left[\begin{array}{rrrrr}
1 & 0 & 0 & \cdots & 0 \\
& & \vdots & & \\
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
& & \vdots & & \\
0 & 0 & 0 & \cdots & 1
\end{array}\right]
\end{displaymath}

where there are n1 copies of the first row, $[1,0,\ldots,0]$ and then n2 copies of $[0,1,0,\ldots,0]$ and so on. Under the restriction $\nu_1=\cdots=\nu_K$ there is just one parameter, say $\eta$, and the design matrix X* is just a column of $n_1+\cdots+n_K$ 1s. Notice that X* is not a submatrix of the original design matrix X. However, if A is the $K\times 1$ matrix with all entries equal to 1 we have X* = XA so that the column space of X* is a subspace (of dimension 1) of the K dimensional column space of X. The extra sum of squares principle can thus be used to test the null hypothesis $H_o: \nu_1=\cdots=\nu_K$ by fitting the two models and computing, using the notation of ANOVA from STAT 330:

\begin{displaymath}F=\frac{ [\sum (Y_{ij}-\bar{Y}_{\cdot\cdot})^2 -
\sum (Y_{ij}-\bar{Y}_{i\cdot})^2]/(K-1)}{
\sum (Y_{ij}-\bar{Y}_{i\cdot})^2 /(\sum n_i-K)}
\end{displaymath}

The numerator of this ratio simplifies algebraically to

\begin{displaymath}\sum_i n_i (\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})^2/(K-1)
\end{displaymath}

so that this extra Sum of Squares F test is exactly the usual one way ANOVA F test for this problem; this is quite general: standard F tests can be recovered from the extra sum of squares principle.
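The agreement between the extra sum of squares F statistic and the usual one way ANOVA F statistic can be illustrated numerically; the following numpy sketch uses made-up group sizes and data:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n_i = [5, 7, 6]                               # three group sizes (illustrative)
groups = [rng.normal(loc=m, size=n) for m, n in zip([0.0, 0.5, 1.0], n_i)]
y = np.concatenate(groups)
K, N = len(n_i), len(y)

# restricted model: one common mean; full model: one mean per group
sse_restricted = np.sum((y - y.mean()) ** 2)
sse_full = sum(np.sum((g - g.mean()) ** 2) for g in groups)
F_extra = ((sse_restricted - sse_full) / (K - 1)) / (sse_full / (N - K))

# the usual between-groups form of the numerator
between = sum(n * (g.mean() - y.mean()) ** 2 for n, g in zip(n_i, groups))
F_anova = (between / (K - 1)) / (sse_full / (N - K))
assert np.isclose(F_extra, F_anova)
\end{verbatim}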

It is actually quite common to reparametrize the full model in such a way that the null hypothesis of interest is of the form $\beta_{i_1} = \beta_{i_2} = \cdots = 0$. For the 1 way ANOVA there are two such reparametrizations in common use.

The first of these defines a grand mean parameter $\bar\nu = E ( \bar{Y}_{\cdot\cdot}) = \sum n_i \nu_i / \sum n_i$ and individual ``effects'' $\alpha_i = \nu_i -\bar\nu$. This new model apparently has K+1 parameters and the corresponding design matrix, X1, would not have full rank; its rank would be K although it would have K+1 columns. As such the matrix $X_1^T X_1$ would be singular and we could not find unique least squares estimates. The problem is that we have defined the parameters in such a way that there is a linear restriction on them, namely, $\sum n_i\alpha_i = 0$. We get around this problem by dropping $\alpha_K$ and remembering in our model equations that $\alpha_K =-\sum_{i=1}^{K-1} n_i\alpha_i/n_K$.

If you now write out the model equations with $\bar\nu$ and the $\alpha_i$ as parameters you get the design matrix

\begin{displaymath}X_2 = \left[\begin{array}{rrrrr}
1 & 1 & 0 & \cdots & 0 \\
& & \vdots & & \\
1 & 0 & 0 & \cdots & 1 \\
1 & -\frac{n_1}{n_K} & -\frac{n_2}{n_K} & \cdots & -\frac{n_{K-1}}{n_K} \\
& & \vdots & & \\
1 & -\frac{n_1}{n_K} & -\frac{n_2}{n_K} & \cdots & -\frac{n_{K-1}}{n_K}
\end{array}\right] \, .
\end{displaymath}

Students will have seen this matrix in 330 in the case where all the $n_i$ are the same and the fractions in the last $n_K$ rows of $X_2$ are all equal to -1. Notice that the hypothesis $\nu_1=\cdots=\nu_K$ is the same as $\alpha_1 = \cdots = \alpha_{K-1}=0$.

The other reparametrization is ``corner-point coding'' where we define new parameters by $\beta_1=\nu_1$ and $\beta_i = \nu_i - \nu_1$ for $i=2,\ldots,K$. For this parameterization the null hypothesis of interest is $\beta_2 = \cdots = \beta_K = 0$. The design matrix is

\begin{displaymath}X_3 = \left[\begin{array}{rrrrr}
1 & 0 & 0 & \cdots & 0 \\
& & \vdots & & \\
1 & 1 & 0 & \cdots & 0 \\
& & \vdots & & \\
1 & 0 & 0 & \cdots & 1
\end{array}\right] \, .
\end{displaymath}
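To illustrate that the three codings describe the same model, here is a hedged numpy sketch (group sizes and data invented for illustration) that builds X, $X_2$ and $X_3$ and checks that all three give identical fitted values:

\begin{verbatim}
import numpy as np

n_i = [5, 7, 6]                                # illustrative group sizes
K, N = len(n_i), sum(n_i)
labels = np.repeat(np.arange(K), n_i)          # group label of each observation

# cell-means coding: one indicator column per group
X = np.zeros((N, K))
X[np.arange(N), labels] = 1.0

# effects coding: column of 1s plus columns for alpha_1 .. alpha_{K-1};
# observations in group K get -n_i/n_K in column i
X2 = np.ones((N, K))
for j in range(K - 1):
    X2[:, j + 1] = np.where(labels == j, 1.0,
                            np.where(labels == K - 1, -n_i[j] / n_i[K - 1], 0.0))

# corner-point coding: intercept plus indicators for groups 2 .. K
X3 = np.ones((N, K))
X3[:, 1:] = X[:, 1:]

rng = np.random.default_rng(2)
y = rng.normal(size=N)
fits = [M @ np.linalg.lstsq(M, y, rcond=None)[0] for M in (X, X2, X3)]
assert np.allclose(fits[0], fits[1]) and np.allclose(fits[0], fits[2])
\end{verbatim}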


Richard Lockhart
1999-04-09