STAT 350: Lecture 18

Reading:

Summary of Distribution theory conclusions

1.
$\epsilon^T Q\epsilon/\sigma^2$ has the same distribution as $\sum \lambda_i Z_i^2$ where the $Z_i$ are iid $N(0,1)$ random variables (so the $Z_i^2$ are iid $\chi^2_1$) and the $\lambda_i$ are the eigenvalues of $Q$.

2.
$Q^2=Q$ ($Q$ is idempotent) implies that all the eigenvalues of $Q$ are either 0 or 1.

3.
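Points 1 and 2 prove that $Q^2=Q$ implies that $\epsilon^T
Q\epsilon/\sigma^2 \sim \chi^2_{{\rm trace}(Q)}$, because the trace of $Q$ is the sum of its eigenvalues and so counts the eigenvalues equal to 1.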

4.
A special case is

\begin{displaymath}\frac{\hat\epsilon^T\hat\epsilon}{\sigma^2} \sim \chi^2_{n-p}
\end{displaymath}

5.
$t$ statistics have $t$ distributions on $n-p$ degrees of freedom.

6.
If $H_o: \beta=0$ is true then

\begin{displaymath}F = \frac{(\hat\mu^T\hat\mu)/p}{\hat\epsilon^T\hat\epsilon/(n-p)} \sim
F_{p,n-p}
\end{displaymath}
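
Points 1 to 4 can be checked numerically. The following Python sketch is not part of the original notes: it assumes NumPy is available and uses a made-up design matrix with $n=30$, $p=4$ and $\sigma=2$ (all hypothetical choices). It forms the residual projection $Q = I - X(X^TX)^{-1}X^T$, confirms that its eigenvalues are 0 or 1 and that ${\rm trace}(Q)=n-p$, and then compares the simulated mean and variance of $\epsilon^T Q\epsilon/\sigma^2$ with the values $n-p$ and $2(n-p)$ expected for a $\chi^2_{n-p}$ variable.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable

n, p, sigma = 30, 4, 2.0         # hypothetical sample size, parameters, error SD
X = rng.normal(size=(n, p))      # arbitrary full-rank design matrix (assumption)

# Hat matrix H = X (X^T X)^{-1} X^T and residual projection Q = I - H.
H = X @ np.linalg.solve(X.T @ X, X.T)
Q = np.eye(n) - H

# Q is symmetric and idempotent, so its eigenvalues should all be 0 or 1.
eigenvalues = np.linalg.eigvalsh(Q)
num_ones = int(np.sum(np.isclose(eigenvalues, 1.0)))
trace_Q = float(np.trace(Q))

# Simulate eps^T Q eps / sigma^2 for many independent N(0, sigma^2 I) vectors.
reps = 20000
eps = sigma * rng.normal(size=(reps, n))
quad = np.einsum("ri,ij,rj->r", eps, Q, eps) / sigma**2

# A chi-squared variable on d degrees of freedom has mean d and variance 2d.
print("eigenvalues equal to 1:", num_ones)
print("trace(Q) =", round(trace_Q, 6), " n - p =", n - p)
print("simulated mean     =", quad.mean(), " expected", n - p)
print("simulated variance =", quad.var(), " expected", 2 * (n - p))
```

Matching the first two moments is only a rough check, not a proof; a histogram of `quad` against the $\chi^2_{n-p}$ density would be a fuller comparison.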

Many extensions of this theory are possible. The most important of these are:

1.
If a ``reduced'' model is obtained from a ``full'' model by imposing k linearly independent linear restrictions on $\beta$ (like $\beta_1=\beta_2$, $\beta_1+\beta_2=2\beta_3$) then

\begin{displaymath}\frac{\mbox{Extra SS}}{\sigma^2} = \frac{{\rm ESS}_R-{\rm ESS}_F}{\sigma^2} \sim \chi_k^2
\end{displaymath}

assuming that the null hypothesis (the restricted model) is true. So the Extra Sum of Squares $F$ statistic has an $F_{k,n-p}$ distribution under the null hypothesis.

2.
In ANOVA tables which add up, the sums of squares in the various rows (not including the total) are independent.

3.
When the null hypothesis $H_o$ is not true the distribution of the Regression SS is non-central $\chi^2$. This is used in power and sample size calculations.
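
To make the first extension concrete, here is a minimal Python sketch of an Extra Sum of Squares $F$ test. It is an illustration under stated assumptions, not the course's own code: the data are simulated, the model (an intercept and three covariates) and the restriction $\beta_1=\beta_2$ are hypothetical, and the helper `error_ss` is a name introduced here. The tail probability is approximated by simulating the ratio of independent $\chi^2_k/k$ and $\chi^2_{n-p}/(n-p)$ variables, which is the representation of $F_{k,n-p}$ that the distribution theory above gives.

```python
import numpy as np

rng = np.random.default_rng(1)

def error_ss(X, y):
    """Error sum of squares from the least squares fit of y on X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return float(resid @ resid)

# Hypothetical full model: intercept plus three covariates (p = 4).
n = 40
x1, x2, x3 = rng.normal(size=(3, n))
X_full = np.column_stack([np.ones(n), x1, x2, x3])
p = X_full.shape[1]

# Simulated response with beta_1 = beta_2, so the restriction below is true.
y = 1.0 + 0.5 * x1 + 0.5 * x2 - 1.0 * x3 + rng.normal(size=n)

# Reduced model: the single restriction beta_1 = beta_2 (k = 1) replaces
# the columns x1 and x2 by their sum.
X_red = np.column_stack([np.ones(n), x1 + x2, x3])
k = p - X_red.shape[1]

ess_f = error_ss(X_full, y)
ess_r = error_ss(X_red, y)
F = ((ess_r - ess_f) / k) / (ess_f / (n - p))

# Under the null hypothesis F behaves like (chi^2_k / k) / (chi^2_{n-p} / (n-p))
# with independent numerator and denominator; sigma^2 cancels in the ratio.
reps = 200000
f_draws = (rng.chisquare(k, reps) / k) / (rng.chisquare(n - p, reps) / (n - p))
p_value = float(np.mean(f_draws >= F))

print("ESS_R =", ess_r, " ESS_F =", ess_f)
print("F =", F, "on", k, "and", n - p, "degrees of freedom")
print("Monte Carlo p-value =", p_value)
```

If SciPy is installed, `scipy.stats.f.sf(F, k, n - p)` gives the same tail probability without simulation.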

Experimental Designs leading to multiple regression analysis

1.
(Randomized) designed experiments:

Example:

2.
Randomized Block Designs

Example

Example

3.
Observational Studies

Example:

Vital Distinction

I am now going to illustrate many of the techniques we are developing in this course with an extended example. I will be using a data set from the textbook. The example will last for several lectures.

The SCENIC data set

The data set consists of a sample of 113 hospitals selected by some means which we are not told. We appear to have a purely observational study. For each hospital we have the values of the following variables:

The data set is described in the Appendix of the text. Here I reproduce a page of pair-wise scatter plots for all variables except the categorical variables Region and School.

It is evident from the plot that, as expected, several of the variables are quite highly correlated. Here is the correlation matrix:

             Stay    Age   Risk  Culture  Chest   Beds  Census  Nurses  Facilities
Stay         1.00   0.19   0.53     0.33   0.38  -0.49    0.47    0.34        0.36
Age          0.19   1.00   0.00    -0.23  -0.02  -0.02   -0.05   -0.08       -0.04
Risk         0.53   0.00   1.00     0.56   0.45  -0.19    0.38    0.39        0.41
Culture      0.33  -0.23   0.56     1.00   0.42  -0.31    0.14    0.20        0.19
Chest        0.38  -0.02   0.45     0.42   1.00  -0.30    0.06    0.08        0.11
Beds         0.41  -0.06   0.36     0.14   0.05  -0.11    0.98    0.92        0.79
Census       0.47  -0.05   0.38     0.14   0.06  -0.15    1.00    0.91        0.78
Nurses       0.34  -0.08   0.39     0.20   0.08  -0.11    0.91    1.00        0.78
Facilities   0.36  -0.04   0.41     0.19   0.11  -0.21    0.78    0.78        1.00
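
A correlation matrix of this kind can be computed in a few lines. The sketch below is an illustration only and assumes NumPy; the numbers are synthetic stand-ins, NOT the SCENIC data, and the column names are borrowed purely as labels. Three "size" variables are generated from a common factor so that they come out highly correlated, much as Beds, Census and Nurses do above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data, NOT the SCENIC values: three "size" variables
# driven by a common factor, plus one unrelated variable.
n = 113
size = rng.normal(size=n)
data = np.column_stack([
    size + 0.2 * rng.normal(size=n),   # plays the role of Beds
    size + 0.2 * rng.normal(size=n),   # plays the role of Census
    size + 0.4 * rng.normal(size=n),   # plays the role of Nurses
    rng.normal(size=n),                # plays the role of Age
])
names = ["Beds", "Census", "Nurses", "Age"]

# rowvar=False tells corrcoef that the columns, not the rows, are the variables.
R = np.corrcoef(data, rowvar=False)

# Print the matrix with row and column labels, two decimals per entry.
print(" " * 8 + "".join(f"{name:>8}" for name in names))
for name, row in zip(names, R):
    print(f"{name:<8}" + "".join(f"{r:8.2f}" for r in row))
```

Any correlation matrix is symmetric with ones on the diagonal, which gives a quick consistency check on a printed table.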


Richard Lockhart
1999-02-17