STAT 350: 97-1
Final Exam, 8 April 1997
Instructor: Richard Lockhart
Instructions: This is an open book test.
You may use notes, text, other books
and a calculator. Your presentations of statistical analysis will be
marked for clarity of explanation. I expect you to explain what
assumptions you are making and to comment if those assumptions seem
unreasonable. The exam is out of 60.
- 1.
- When a weight is hung from a wire, the wire stretches (returning
to its original length when the weight is removed). A 1 kilogram weight is
hung from a piece of wire and the length stretched is measured. This is
repeated, and the two resulting lengths are $L_{1,1}$ and $L_{1,2}$. Then
a 2 kilogram weight is tried 3 times, resulting in lengths $L_{2,1}$,
$L_{2,2}$ and $L_{2,3}$. To save analysis effort the experimenter averages
the two measurements made with the 1 kilogram weight, obtaining
$Y_1 = (L_{1,1} + L_{1,2})/2$, and the 3 measurements made with the 2
kilogram weight, obtaining $Y_2$. [Total of 20 marks]
- (a)
- Assume that the individual lengths satisfy, for $i$ from 1 to 2 and
$j$ from 1 to 2 (for $i=1$) or 1 to 3 (for $i=2$),
$$L_{i,j} = i\beta + \epsilon_{i,j},$$
where the errors $\epsilon_{i,j}$
are independent normal variables
with mean 0 and
variance $\sigma^2$.
What is the design matrix for this linear model? [2 marks]
Solution: The design matrix is the single column
$$X = (1, 1, 2, 2, 2)^T.$$
- (b)
- Give an explicit, simple formula for the least squares estimate
of $\beta$;
I do not want a general formula such as
$(X^TX)^{-1}X^TY$.
[4 marks]
Solution:
$X^TX = 1 + 1 + 4 + 4 + 4 = 14$
and
$X^TY = L_{1,1} + L_{1,2} + 2L_{2,1} + 2L_{2,2} + 2L_{2,3}$,
so that
$$\hat\beta = \frac{L_{1,1} + L_{1,2} + 2L_{2,1} + 2L_{2,2} + 2L_{2,3}}{14}.$$
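As a quick numerical sanity check, the general formula and the explicit formula can be compared; the stretch values below are invented for illustration only.

```python
import numpy as np

# Hypothetical stretch measurements (invented values, for illustration).
L11, L12, L21, L22, L23 = 1.02, 0.98, 2.05, 1.97, 2.01
Y = np.array([L11, L12, L21, L22, L23])
X = np.array([[1.0], [1.0], [2.0], [2.0], [2.0]])  # the design matrix

# The general least squares formula (X^T X)^{-1} X^T Y ...
beta_general = np.linalg.solve(X.T @ X, X.T @ Y)[0]
# ... agrees with the explicit formula derived above.
beta_explicit = (L11 + L12 + 2 * (L21 + L22 + L23)) / 14
```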
- (c)
- Give the mean and variance of the estimator in (b). [2 marks]
Solution:
$E(\hat\beta) = \beta$
and
$\mathrm{Var}(\hat\beta) = \sigma^2/14$.
- (d)
- The average measurements $Y_i$ also satisfy a linear model,
$$Y_i = i\beta + \bar\epsilon_i.$$
- i.
- What is $\bar\epsilon_i$
in terms of the $\epsilon_{i,j}$?
[1 mark]
Solution: $\bar\epsilon_1 = (\epsilon_{1,1} + \epsilon_{1,2})/2$ and
$\bar\epsilon_2 = (\epsilon_{2,1} + \epsilon_{2,2} + \epsilon_{2,3})/3$.
- ii.
- What is the joint distribution of
$(\bar\epsilon_1, \bar\epsilon_2)$?
In particular what are the variances and means of each
$\bar\epsilon_i$?
[3 marks]
Solution:
The $\bar\epsilon_i$ are averages of disjoint sets of independent normal errors,
so that
$(\bar\epsilon_1, \bar\epsilon_2)$ has a multivariate normal distribution
with mean 0 and variance-covariance matrix
$$\mathrm{diag}(\sigma^2/2,\; \sigma^2/3).$$
- iii.
- What is the design matrix of this linear model?
[1 mark]
Solution: $X_a = (1, 2)^T$.
- (e)
- Show that the weighted least squares estimate of
$\beta$ is
$$\hat\beta_w = \frac{Y_1 + 3Y_2}{7}.$$
[4 marks]
Solution:
The variances of the errors are
$\sigma^2/2$
and $\sigma^2/3$,
so that (dropping the common factor $1/\sigma^2$)
the weights are $w_1 = 2$ and $w_2 = 3$. Then
$X_a^T W X_a = 2(1)^2 + 3(2)^2 = 14$
and
$X_a^T W Y = 2Y_1 + 6Y_2$,
so that
$$\hat\beta_w = \frac{2Y_1 + 6Y_2}{14} = \frac{Y_1 + 3Y_2}{7}.$$
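A short numerical check, using invented measurements, that the weighted least squares fit on the averages reproduces the ordinary least squares fit on the raw data:

```python
import numpy as np

# Invented stretch measurements, for illustration only.
L = np.array([1.02, 0.98, 2.05, 1.97, 2.01])   # L11, L12, L21, L22, L23
Y = np.array([L[:2].mean(), L[2:].mean()])     # the averages Y1, Y2

Xa = np.array([[1.0], [2.0]])   # design matrix for the averaged model
W = np.diag([2.0, 3.0])         # weights 1/Var(eps_bar_i), up to the factor sigma^2

# Weighted least squares on the averages ...
beta_wls = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ Y)[0]
# ... equals (Y1 + 3 Y2)/7, and also the OLS estimate from the raw data.
beta_ols = (L[0] + L[1] + 2 * (L[2] + L[3] + L[4])) / 14
```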
- (f)
- What is the distribution of
$\hat\beta_w$?
[2 marks]
Solution:
Normal with mean
$\beta$
and variance
$\sigma^2/14$.
- (g)
- Why would analysis of the original variables $L_{i,j}$ be better
than analysis of the $Y_i$? [1 mark]
Solution:
We would have 4 degrees of freedom for error (5 observations minus 1 fitted
parameter) rather than 1, giving a much better estimate of $\sigma^2$.
- 2.
- A variable $Y$ (a measurement of oxygen taken up by a system) is
regressed on 4 predictors
$X_1$, $X_2$, $X_3$ and $X_4$.
A total of 20 measurements were
made and $Y$ was regressed on various subsets of the predictor variables,
leading to the following table of Error Sums of Squares.
Vars | ESS | Vars  | ESS | Vars     | ESS | Vars     | ESS
X1   | 154 | X1,X2 | 109 | X2,X4    | 133 | X1,X3,X4 | 139
X2   | 156 | X1,X3 | 144 | X3,X4    | 175 | X2,X3,X4 | 132
X3   | 203 | X1,X4 | 146 | X1,X2,X3 | 106 | All      | 104
X4   | 250 | X2,X3 | 150 | X1,X2,X4 | 107 | None     | 506
- (a)
- Does adding the variables X3 and X4 to the model containing
X1 and X2 significantly improve the fit? [6 marks]
Solution: This compares the model with all variables in to the model with just $X_1$ and $X_2$, and
so
$$F = \frac{(109-104)/2}{104/15} \approx 0.36.$$
This is much less than 1, so the added variables are not significant.
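The arithmetic for this partial F test, taken directly from the ESS table (the error degrees of freedom assume an intercept plus 4 slopes in the full model):

```python
# Partial F test for adding X3 and X4 to the model containing X1 and X2.
n = 20
ess_reduced = 109   # ESS for {X1, X2}
ess_full = 104      # ESS for all four predictors
q = 2               # number of variables added
df_error = n - 5    # n minus (intercept + 4 slopes)

F = ((ess_reduced - ess_full) / q) / (ess_full / df_error)
print(round(F, 2))  # 0.36
```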
- (b)
- Use Backwards selection with a 10% significance level to stay
to select a suitable subset of regression variables. [8 marks]
Solution: We begin with all variables. Among the 3 variable models the
model containing only $X_1$, $X_2$ and $X_3$ has the smallest error sum of squares,
so if we delete a variable it must be $X_4$. The F statistic is
$$F = \frac{106-104}{104/15} \approx 0.29,$$
so we delete $X_4$. Among the two variable models which contain 2 of the variables
$X_1$, $X_2$ and $X_3$, the model containing $X_1$ and $X_2$ has the smallest
error sum of squares, so we try to delete $X_3$, getting
$$F = \frac{109-106}{106/16} \approx 0.45,$$
which is still far from significant. We delete $X_3$ and look at 1 variable
models which use either $X_1$ or $X_2$. The smallest error SS is for $X_1$, so we
try to delete $X_2$, getting
$$F = \frac{154-109}{109/17} \approx 7.02.$$
We compare this to the F tables with 1 numerator and 17 denominator degrees
of freedom and see that
$7.02 > F_{0.10;1,17} \approx 3.03$,
so that $X_2$ and $X_1$ will be retained.
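The whole backward pass can be sketched mechanically from the ESS table. The 10% critical values below are approximate table lookups (only the degrees of freedom actually reached are listed); everything else comes from the table above.

```python
# Backward elimination at the 10% level, driven by the ESS table.
n = 20
ess = {frozenset(): 506,
       frozenset({1}): 154, frozenset({2}): 156, frozenset({3}): 203,
       frozenset({4}): 250,
       frozenset({1, 2}): 109, frozenset({1, 3}): 144, frozenset({1, 4}): 146,
       frozenset({2, 3}): 150, frozenset({2, 4}): 133, frozenset({3, 4}): 175,
       frozenset({1, 2, 3}): 106, frozenset({1, 2, 4}): 107,
       frozenset({1, 3, 4}): 139, frozenset({2, 3, 4}): 132,
       frozenset({1, 2, 3, 4}): 104}

# Approximate 10% critical values for F with 1 numerator df (table lookups).
f_crit = {15: 3.07, 16: 3.05, 17: 3.03}

current = frozenset({1, 2, 3, 4})
while current:
    df_err = n - len(current) - 1   # intercept plus one slope per variable
    # F statistic for dropping each variable; the weakest variable goes first.
    drop, F = min(((v, (ess[current - {v}] - ess[current]) / (ess[current] / df_err))
                   for v in current), key=lambda pair: pair[1])
    if F >= f_crit[df_err]:
        break                       # every remaining variable is significant at 10%
    current = current - {drop}

print(sorted(current))              # variables retained
```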
- (c)
- If the estimated slope associated with $X_1$
in the model including $X_1$ and $X_2$ only as predictors
is positive, what is the value of the t statistic for testing
the hypothesis that the true coefficient of $X_1$ is 0? [1 mark]
Solution:
$t = \sqrt{F} = \sqrt{\frac{156-109}{109/17}} = \sqrt{7.33} \approx 2.71$,
taking the positive square root because the estimated slope is positive.
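The square-root relation between the partial F and the t statistic, as arithmetic:

```python
import math

# Dropping X1 from {X1, X2} leaves {X2} with ESS 156; the partial F for X1
# is (156 - 109) / (109 / 17), and t is its square root, with the sign of
# the estimated slope (given as positive here).
F = (156 - 109) / (109 / 17)
t = math.sqrt(F)
print(round(t, 2))  # 2.71
```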
- 3.
- Five different treatments, A, B, C, D and E, are to be examined for
their effect on blood pressure. Fifty patients are randomly split into
5 groups of 10. The initial blood pressure X of each patient is measured,
the treatment is applied and then the final blood pressure Y is measured.
Let $i$ (running over A, B, C, D and E) label the treatment and $j$, running
from 1 to 10, label the patient within the treatment group.
Three models were fitted:
Model I:
$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \epsilon_{ij};$$
the error sum of squares is 85355 and the estimates are

$\hat\beta_0$ | $\hat\beta_1$
37.26 | 0.65
Model II:
$$Y_{ij} = \alpha_i + \beta X_{ij} + \epsilon_{ij};$$
the error sum of squares is 66115 and the estimates are

$\hat\alpha_A$ | $\hat\alpha_B$ | $\hat\alpha_C$ | $\hat\alpha_D$ | $\hat\alpha_E$ | $\hat\beta$
14.2424 | 67.5325 | 48.3918 | 49.6033 | 68.7786 | 0.5509

For this model $\hat\sigma^2 = 66115/44 \approx 1503$.
Model III:
$$Y_{ij} = \gamma_i + \delta_i X_{ij} + \epsilon_{ij};$$
the error sum of squares is 62433 and the estimates are

Treatment | Intercept $\hat\gamma_i$ | Slope $\hat\delta_i$
A | 52.04954  | 0.2309892
B | -68.05918 | 1.619385
C | 62.48453  | 0.4313726
D | 46.66416  | 0.5757114
E | 112.529   | 0.1818949
- (a)
- Of the three models, based on the information available to you,
which model provides the best fit to the data? [10 marks]
Solution: Testing Model III vs Model II we get
$$F = \frac{(66115-62433)/4}{62433/40} \approx 0.59,$$
which is not significant. Thus Model II is preferred to Model III. Comparing Model
II to Model I we have
$$F = \frac{(85355-66115)/4}{66115/44} \approx 3.20,$$
which leads to a P-value around 0.03, so that Model II is preferred to Model I.
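Both nested-model F statistics can be reproduced from the error sums of squares; the parameter counts assume Model I has 2, Model II 6, and Model III 10 fitted coefficients:

```python
# Nested-model F tests for Question 3(a).
n = 50
ess = {"I": 85355, "II": 66115, "III": 62433}
p = {"I": 2, "II": 6, "III": 10}   # fitted parameters in each model

def partial_f(reduced, full):
    """F statistic comparing a reduced model to a fuller nested model."""
    q = p[full] - p[reduced]
    return ((ess[reduced] - ess[full]) / q) / (ess[full] / (n - p[full]))

f_32 = partial_f("II", "III")   # Model III vs Model II
f_21 = partial_f("I", "II")     # Model II vs Model I
print(round(f_32, 2), round(f_21, 2))  # 0.59 3.2
```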
- (b)
- There are 10 possible comparisons between pairs of treatments. It is desired to
give simultaneous 95% confidence intervals for all possible comparisons, based
on the second model above. I want you to show clearly that you know how to get these
ten confidence intervals. Your answer will include a clear description of the parameters
for which intervals are needed, written in terms of the notation used above for the second
model, and the resulting confidence interval for the difference between treatment A
and treatment B with all the numbers filled in. You need not work it out to the point of
a numerical value for the lower and upper limit. [5 marks]
Solution:
I want confidence intervals for the 10 values of
$\alpha_i - \alpha_j$ with $i < j$. To get
simultaneous 95% confidence intervals you divide $\alpha = 0.05$
by 10 and just work
out ordinary 99.5% confidence intervals. The t multiplier is around 2.96
(44 degrees of freedom).
You also need a standard error for
$\hat\alpha_i - \hat\alpha_j$, which is
the square root of
$$\hat\sigma^2\left[(X^TX)^{-1}_{ii} + (X^TX)^{-1}_{jj} - 2(X^TX)^{-1}_{ij}\right].$$
You estimate
$\sigma^2$ using 66115/44 and get the interval
$$14.2424 - 67.5325 \pm 2.96\,\mathrm{SE}(\hat\alpha_A - \hat\alpha_B).$$
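The Bonferroni construction can be sketched as follows. The standard error `se` below is a hypothetical placeholder, since computing the real one requires $(X^TX)^{-1}$, which the exam does not supply:

```python
from itertools import combinations

# Estimated treatment intercepts from Model II.
alpha_hat = {"A": 14.2424, "B": 67.5325, "C": 48.3918, "D": 49.6033, "E": 68.7786}

t_mult = 2.96   # approximate t quantile for 99.5% two-sided coverage, 44 df
se = 18.0       # hypothetical placeholder standard error for a difference

# Bonferroni: 10 ordinary 99.5% intervals give simultaneous 95% coverage.
intervals = {(i, j): (alpha_hat[i] - alpha_hat[j] - t_mult * se,
                      alpha_hat[i] - alpha_hat[j] + t_mult * se)
             for i, j in combinations("ABCDE", 2)}

lo, hi = intervals[("A", "B")]   # the A - B comparison
print(len(intervals))            # 10
```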
- (c)
- Examine the residual plots attached for the three models.
Is there anything wrong with our fit?
If so, suggest what you might try next. Be quite clear. [5 marks]
Solution:
The plots show clear signs of heteroscedasticity; a transformation might
be useful. (In fact taking logs is the thing to do.)
- (d)
- I attach a table of regression diagnostics for the fit to model II above.
For each diagnostic review the values and comment on whether or not they show
any problems and which cases might warrant further examination. [5 marks]
Solution:
I just wanted people to compare the various statistics to the guidelines in
the text. For the externally studentized residuals I was looking for some
mention of the Bonferroni adjustment. Cases 15 and 44 stand out as worth
looking at again.
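For concreteness, the usual rule-of-thumb cutoffs, applied to a few rows transcribed from the diagnostics table, pick out the same two cases (the leverage and DFFITS cutoffs are the standard textbook guidelines; the studentized residuals of 3.4 and 3.0 are also extreme relative to a Bonferroni-adjusted t cutoff):

```python
import math

# Rule-of-thumb cutoffs for the Model II diagnostics:
# n = 50 cases, p = 6 fitted parameters.
n, p = 50, 6
leverage_cut = 2 * p / n             # flag h_ii above 2p/n = 0.24
dffits_cut = 2 * math.sqrt(p / n)    # flag |DFFITS| above about 0.69

# (h_ii, externally studentized residual, DFFITS) for selected cases.
cases = {15: (0.106, 3.436, 1.186), 16: (0.180, -0.613, -0.288),
         44: (0.101, 3.038, 1.016)}
flagged = sorted(obs for obs, (h, r, d) in cases.items()
                 if h > leverage_cut or abs(d) > dffits_cut)
print(flagged)   # cases worth a second look
```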
Diagnostics for Model II for Question 3
Obs | h_ii  | Ext'ly Stud'zed Residual | DFFITS | Cook's D_i | Obs | h_ii  | Ext'ly Stud'zed Residual | DFFITS | Cook's D_i
 1  | 0.120 | -0.777 | -0.287 | 0.014 | 26 | 0.100 | -0.096 | -0.032 | 0.000
 2  | 0.108 |  0.407 |  0.142 | 0.003 | 27 | 0.100 |  2.018 |  0.674 | 0.071
 3  | 0.129 |  0.047 |  0.018 | 0.000 | 28 | 0.158 |  0.768 |  0.333 | 0.019
 4  | 0.103 |  0.868 |  0.295 | 0.015 | 29 | 0.104 | -0.475 | -0.162 | 0.004
 5  | 0.101 |  0.141 |  0.047 | 0.000 | 30 | 0.106 | -0.997 | -0.343 | 0.020
 6  | 0.124 | -0.377 | -0.142 | 0.003 | 31 | 0.102 | -1.133 | -0.383 | 0.024
 7  | 0.102 |  0.681 |  0.229 | 0.009 | 32 | 0.144 | -0.139 | -0.057 | 0.001
 8  | 0.150 | -0.578 | -0.243 | 0.010 | 33 | 0.154 | -0.201 | -0.086 | 0.001
 9  | 0.148 | -0.180 | -0.075 | 0.001 | 34 | 0.103 |  1.186 |  0.401 | 0.027
10  | 0.127 | -0.261 | -0.099 | 0.002 | 35 | 0.137 | -0.009 | -0.004 | 0.000
11  | 0.121 |  1.073 |  0.398 | 0.026 | 36 | 0.134 |  0.607 |  0.238 | 0.010
12  | 0.100 | -1.076 | -0.359 | 0.021 | 37 | 0.114 |  0.184 |  0.066 | 0.001
13  | 0.102 | -0.179 | -0.060 | 0.001 | 38 | 0.101 |  0.069 |  0.023 | 0.000
14  | 0.130 |  0.329 |  0.127 | 0.003 | 39 | 0.109 |  0.372 |  0.130 | 0.003
15  | 0.106 |  3.436 |  1.186 | 0.188 | 40 | 0.101 | -0.934 | -0.312 | 0.016
16  | 0.180 | -0.613 | -0.288 | 0.014 | 41 | 0.115 | -2.130 | -0.766 | 0.091
17  | 0.104 | -0.306 | -0.104 | 0.002 | 42 | 0.146 | -0.732 | -0.303 | 0.015
18  | 0.100 |  0.516 |  0.172 | 0.005 | 43 | 0.126 |  1.295 |  0.491 | 0.040
19  | 0.110 | -1.138 | -0.401 | 0.027 | 44 | 0.101 |  3.038 |  1.016 | 0.145
20  | 0.110 | -1.742 | -0.611 | 0.059 | 45 | 0.107 | -1.635 | -0.565 | 0.051
21  | 0.117 |  0.211 |  0.076 | 0.001 | 46 | 0.148 |  1.019 |  0.425 | 0.030
22  | 0.130 |  0.385 |  0.149 | 0.004 | 47 | 0.115 |  0.417 |  0.150 | 0.004
23  | 0.152 | -0.699 | -0.296 | 0.015 | 48 | 0.103 |  0.105 |  0.036 | 0.000
24  | 0.111 | -0.320 | -0.113 | 0.002 | 49 | 0.143 | -0.911 | -0.372 | 0.023
25  | 0.104 | -0.715 | -0.243 | 0.010 | 50 | 0.142 | -0.333 | -0.135 | 0.003
Richard Lockhart
1999-03-23