STAT 350 Lecture 2
Reading: Chapter 5 sections 1-4.
Matrix form of a linear model
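Stack $Y_i$, $\mu_i$ and $\epsilon_i$ into vectors:
$$
Y = \left[\begin{array}{c} Y_1 \\ \vdots \\ Y_n \end{array}\right],
\qquad
\mu = \left[\begin{array}{c} \mu_1 \\ \vdots \\ \mu_n \end{array}\right],
\qquad
\epsilon = \left[\begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_n \end{array}\right].
$$
Define
$$
\beta = \left[\begin{array}{c} \beta_1 \\ \vdots \\ \beta_p \end{array}\right],
\qquad
X = \left[\begin{array}{ccc} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{array}\right].
$$
Note
$$
\mu_i = \sum_{j=1}^p x_{ij}\beta_j = (X\beta)_i
$$
so
$$
\mu = X\beta .
$$
Finally
$$
Y = X\beta + \epsilon
$$
is our original set of $n$ model equations written in vector-matrix form.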
Assumptions so far: $E(\epsilon_i) = 0$ for each $i$.
Still to come: independence, homoscedasticity, normality.
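A small numerical check of the matrix form may help. The sketch below is only an illustration: the sizes n = 5 and p = 2 and all entries of X and beta are made up, and the errors are drawn from a normal distribution purely to have numbers to work with. It shows that row i of the product of X and beta is the sum over j of x_ij times beta_j, so one matrix equation carries all n model equations.

```python
import numpy as np

# Hypothetical sizes and values, chosen only for illustration.
n, p = 5, 2
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])  # n x p matrix of known constants
beta = np.array([2.0, 0.5])                               # p-vector of parameters

# Matrix form of the mean vector: mu = X beta.
mu = X @ beta

# The same means computed one model equation at a time.
mu_by_rows = np.array([sum(X[i, j] * beta[j] for j in range(p)) for i in range(n)])

# Errors with mean zero (normality is used here only to generate numbers).
rng = np.random.default_rng(0)
eps = rng.normal(loc=0.0, scale=1.0, size=n)

# All n model equations written at once.
Y = X @ beta + eps
```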
Examples: please take the point that this is a very large class
of models.
- 1.
- One sample problem:
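- $Y_1, \ldots, Y_n$ measured under ``identical'' conditions.
- So $\mu_1 = \cdots = \mu_n = \mu$, say.
- $\beta = (\mu)$, a vector with a single entry.
- $X = (1, \ldots, 1)^T$, a column of $n$ ones (so $p=1$).
- $Y = X\beta + \epsilon$, that is, $Y_i = \mu + \epsilon_i$.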
- 2.
- Two sample problem: n=r+s
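For $1 \le i \le r$: $Y_i = \beta_1 + \epsilon_i$.
For $r+1 \le i \le n$: $Y_i = \beta_2 + \epsilon_i$.
In matrix form
$$
\left[\begin{array}{c} Y_1 \\ \vdots \\ Y_r \\ Y_{r+1} \\ \vdots \\ Y_n \end{array}\right]
=
\left[\begin{array}{cc} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{array}\right]
\left[\begin{array}{c} \beta_1 \\ \beta_2 \end{array}\right]
+
\left[\begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_r \\ \epsilon_{r+1} \\ \vdots \\ \epsilon_n \end{array}\right].
$$
Sometimes it is convenient to write:
$$
X^T = \left[\begin{array}{cc}
\underbrace{1 \ \cdots \ 1}_{r} & \underbrace{0 \ \cdots \ 0}_{s} \\
0 \ \cdots \ 0 & 1 \ \cdots \ 1
\end{array}\right]
$$
which is a partitioned matrix where I have described the transpose of $X$.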
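The two sample design matrix is easy to build numerically. The following is a minimal sketch, assuming the first r observations come from one sample and the remaining s from the other. The function name `two_sample_design`, the values r = 3 and s = 2, and the two means are all hypothetical.

```python
import numpy as np

def two_sample_design(r, s):
    """Return the (r+s) x 2 design matrix for the two sample problem."""
    first = np.concatenate([np.ones(r), np.zeros(s)])   # indicator of sample 1
    second = np.concatenate([np.zeros(r), np.ones(s)])  # indicator of sample 2
    return np.column_stack([first, second])

X = two_sample_design(r=3, s=2)   # r and s are illustrative values
beta = np.array([10.0, 20.0])     # hypothetical means of the two samples
mu = X @ beta                     # first r entries equal 10, last s entries equal 20
```

Printing `X.T` shows the partitioned form directly: a row of r ones followed by s zeros, then a row of r zeros followed by s ones.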
- 3.
- Simple linear regression:
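$Y_i$ = TL
$D_i$ = Dose
The model
$$
Y_i = \beta_1 + \beta_2 D_i + \epsilon_i
$$
gives
$$
X = \left[\begin{array}{cc} 1 & D_1 \\ \vdots & \vdots \\ 1 & D_n \end{array}\right],
\qquad
\beta = \left[\begin{array}{c} \beta_1 \\ \beta_2 \end{array}\right].
$$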
- 4.
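- Polynomial models: ``polynomial regression''. In
Lecture 1
we had the quadratic model:
$$
Y_i = \beta_1 + \beta_2 D_i + \beta_3 D_i^2 + \epsilon_i
$$
for which
$$
X = \left[\begin{array}{ccc} 1 & D_1 & D_1^2 \\ \vdots & \vdots & \vdots \\ 1 & D_n & D_n^2 \end{array}\right],
\qquad
\beta = \left[\begin{array}{c} \beta_1 \\ \beta_2 \\ \beta_3 \end{array}\right].
$$
In general we might fit a polynomial of degree $p-1$ to get
$$
Y_i = \beta_1 + \beta_2 D_i + \cdots + \beta_p D_i^{p-1} + \epsilon_i
$$
for which
$$
X = \left[\begin{array}{cccc} 1 & D_1 & \cdots & D_1^{p-1} \\ \vdots & \vdots & & \vdots \\ 1 & D_n & \cdots & D_n^{p-1} \end{array}\right],
\qquad
\beta = \left[\begin{array}{c} \beta_1 \\ \vdots \\ \beta_p \end{array}\right].
$$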
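A design matrix for polynomial regression has columns equal to successive powers of the covariate. Here is a short sketch using NumPy's `np.vander`; the doses in `D` and the coefficients in `beta` are made-up numbers, not the TL data.

```python
import numpy as np

D = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical doses
p = 3                                # p columns: a polynomial of degree p - 1 = 2

# Columns are D**0, D**1, ..., D**(p-1).
X = np.vander(D, N=p, increasing=True)

beta = np.array([1.0, 2.0, 0.5])     # hypothetical coefficients
mu = X @ beta                        # mu_i = 1 + 2*D_i + 0.5*D_i**2
```

Setting `p = 2` gives the straight line design matrix, with a column of ones and a column of doses.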
- 5.
- Analysis of Covariance: fitting two straight lines
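Consider the TL data again but now suppose that samples
1 to $r$ were ``bleached'' (left in the sun for several
hours before analysis) and samples $r+1$ to $n = r+s$ were
``unbleached''. We combine the 2 sample problem with the
straight line problem:
$$
Y_i = \beta_1 + \beta_2 D_i + \epsilon_i \quad (1 \le i \le r),
\qquad
Y_i = \beta_3 + \beta_4 D_i + \epsilon_i \quad (r+1 \le i \le n),
$$
so that
$$
X = \left[\begin{array}{cccc}
1 & D_1 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & D_r & 0 & 0 \\
0 & 0 & 1 & D_{r+1} \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 1 & D_n
\end{array}\right],
\qquad
\beta = \left[\begin{array}{c} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{array}\right].
$$
Special case: ``No interaction'' of Bleach and Dose:
the effect of dose is the same
for bleached and unbleached samples. That is: $\beta_2 = \beta_4$.
Note: we usually re-order the parameters in this case to get
$$
X = \left[\begin{array}{ccc}
1 & 0 & D_1 \\
\vdots & \vdots & \vdots \\
1 & 0 & D_r \\
0 & 1 & D_{r+1} \\
\vdots & \vdots & \vdots \\
0 & 1 & D_n
\end{array}\right],
\qquad
\beta = \left[\begin{array}{c} \beta_1 \\ \beta_3 \\ \beta_2 \end{array}\right].
$$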
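The two design matrices for this example can be assembled from a group indicator and the dose column. The sketch below assumes one particular column ordering (intercept and slope for the bleached group, then for the unbleached group; and, in the no-interaction case, the two intercepts followed by the common slope). The group sizes and doses are invented for illustration.

```python
import numpy as np

r, s = 3, 2                                   # illustrative group sizes, n = r + s
D = np.array([1.0, 2.0, 3.0, 1.0, 2.0])       # hypothetical doses
bleached = np.concatenate([np.ones(r), np.zeros(s)])
unbleached = 1.0 - bleached

# Two separate straight lines: an intercept and a slope for each group (4 parameters).
X_full = np.column_stack([bleached, bleached * D, unbleached, unbleached * D])

# No interaction: separate intercepts but one common slope (3 parameters).
X_common = np.column_stack([bleached, unbleached, D])
```

The common-slope matrix is obtained from the full one by adding the two slope columns together, which is what forcing the two slopes to be equal amounts to.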
- 6.
- Weighing designs: (a simple example mostly for illustration)
Idea: weigh two objects with (true) weights $\beta_1$
and $\beta_2$.
Now we have
and get
so that
Notice that $E(\epsilon_i) = 0$ is a physically
meaningful and important assumption: it says the weighings have no
systematic error. This sort of assumption
may well be wrong.
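To make the idea of a weighing design concrete, here is a sketch under an assumed scheme that may differ from the one used in the lecture: weigh object 1 alone, weigh object 2 alone, then weigh both together. The true weights and the size of the bias are made-up numbers. The last lines show what a scale that reads consistently high would do to the expected readings, which is the kind of failure of the mean-zero error assumption the text warns about.

```python
import numpy as np

# Hypothetical scheme: object 1 alone, object 2 alone, both together.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
beta = np.array([3.0, 5.0])   # made-up true weights

# Expected scale readings when the errors have mean zero.
mu = X @ beta

# A scale that reads high by a constant amount violates the mean-zero assumption:
# the expected readings are no longer X @ beta.
bias = 0.2
mu_biased = mu + bias
```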
- 7.
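- One way layout (ANOVA). The example has data $Y_{ij}$, the blood
coagulation time for rat number $j$ fed diet number $i$,
for $i=1,2,3,4$. There were 4 rats for diet 1, 6 each for diets 2 and 3,
and 8 rats fed diet 4. We use $\mu_{ij}$ as notation for $E(Y_{ij})$.
The idea is that all the rats fed diet 1 have the same mean coagulation
time so $\mu_{11} = \mu_{12} = \mu_{13} = \mu_{14} = \beta_1$, say.
(In fact it is pretty common notation to use $\mu_1$ for $\beta_1$ but
this will conflict, for the time being, with my notation for the mean of
the first $Y$.)
If we stack up the $Y$s we get
$$
\left[\begin{array}{c}
Y_{11} \\ \vdots \\ Y_{14} \\ Y_{21} \\ \vdots \\ Y_{26} \\ Y_{31} \\ \vdots \\ Y_{36} \\ Y_{41} \\ \vdots \\ Y_{48}
\end{array}\right]
=
\left[\begin{array}{cccc}
1 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 1
\end{array}\right]
\left[\begin{array}{c} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{array}\right]
+
\left[\begin{array}{c}
\epsilon_{11} \\ \vdots \\ \epsilon_{14} \\ \epsilon_{21} \\ \vdots \\ \epsilon_{26} \\ \epsilon_{31} \\ \vdots \\ \epsilon_{36} \\ \epsilon_{41} \\ \vdots \\ \epsilon_{48}
\end{array}\right].
$$
Again we have $Y = X\beta + \epsilon$.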
Jargon: X is called a ``design matrix''.
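The design matrix for the one way layout can be generated from the group sizes alone. This is a minimal sketch using the sizes given in the example (4, 6, 6 and 8 rats); the variable names are my own and no coagulation times are used.

```python
import numpy as np

sizes = [4, 6, 6, 8]                     # rats per diet, as in the example
diet = np.repeat(np.arange(4), sizes)    # 0-based diet index of each stacked observation
X = np.eye(4)[diet]                      # 24 x 4 matrix of 0/1 indicators, one column per diet
```

Each row of `X` has a single 1, in the column of the diet that rat was fed, and the column sums recover the group sizes.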
The one way layout as a linear model
The sum of squares decomposition in one example
The data consist of blood coagulation times for 24 animals
fed one of 4 different diets. Here are the data with the 4 diets
being the 4 columns.
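The usual ANOVA model equation is
$$
Y_{ij} = \mu_i + \epsilon_{ij}
$$
which we can write in matrix form by stacking up the
observations into a column:
$$
Y = \left[\begin{array}{cccc}
1_4 & 0 & 0 & 0 \\
0 & 1_6 & 0 & 0 \\
0 & 0 & 1_6 & 0 \\
0 & 0 & 0 & 1_8
\end{array}\right]
\left[\begin{array}{c} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \end{array}\right]
+ \epsilon ,
$$
where $1_k$ denotes a column of $k$ ones and each 0 is a column of zeros of matching length.
Let $X$ denote the
design matrix in this formula.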
Usually we reparametrize the model in the form
$$
Y_{ij} = \mu + \alpha_i + \epsilon_{ij}
$$
which would lead to a design matrix which looks like $X$ above with an
extra column on the left all of whose entries are equal to 1. The parameter vector
would now be
$$
(\mu, \alpha_1, \alpha_2, \alpha_3, \alpha_4)^T .
$$
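It will turn out that with this parametrization the parameters
are not identifiable, that is, they cannot be separately estimated, because making
$\mu$ bigger by a certain amount and each $\alpha_i$
smaller by the same amount leaves every mean $\mu + \alpha_i$, and so the
distribution of the data, unchanged. We usually solve this problem by defining
$$
\mu = \frac{1}{4}\sum_{i=1}^4 \mu_i
$$
and
$$
\alpha_i = \mu_i - \mu ,
$$
where $\mu_i$ is the mean for diet $i$.
Now it is automatic
that $\sum_{i=1}^4 \alpha_i = 0$
so we usually eliminate $\alpha_4$
by replacing it in the
model equation by the quantity $-(\alpha_1 + \alpha_2 + \alpha_3)$.
This leads to
$$
X = \left[\begin{array}{rrrr}
1_4 & 1_4 & 0 & 0 \\
1_6 & 0 & 1_6 & 0 \\
1_6 & 0 & 0 & 1_6 \\
1_8 & -1_8 & -1_8 & -1_8
\end{array}\right]
$$
(with $1_k$ a column of $k$ ones and each 0 a column of zeros of matching length)
and
$$
\beta = (\mu, \alpha_1, \alpha_2, \alpha_3)^T .
$$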
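The identifiability problem can be seen numerically from the rank of the design matrix. The sketch below writes the overparametrized mean of an observation on diet i as an overall level plus a diet effect (called mu and alpha_i in the comments, which is assumed notation), uses the group sizes 4, 6, 6, 8 from the example, and takes made-up parameter values. It checks that adding a column of ones to the four indicator columns gives only rank 4, that shifting the overall level up and every diet effect down by the same amount leaves all means unchanged, and that the sum-to-zero coding restores full column rank.

```python
import numpy as np

sizes = [4, 6, 6, 8]
diet = np.repeat(np.arange(4), sizes)
X = np.eye(4)[diet]                               # cell-means design, 24 x 4

# Overparametrized design: a column of ones plus the four indicator columns.
X_over = np.column_stack([np.ones(len(diet)), X])
rank_over = np.linalg.matrix_rank(X_over)         # 4, not 5: the ones column is the sum of the others

# Shifting mu up by c and every alpha_i down by c leaves all the means unchanged.
theta = np.array([60.0, 1.0, 6.0, 8.0, 1.0])      # hypothetical (mu, alpha_1, ..., alpha_4)
c = 2.5
theta_shifted = theta + np.array([c, -c, -c, -c, -c])
same_means = np.allclose(X_over @ theta, X_over @ theta_shifted)

# Sum-to-zero coding: alpha_4 is replaced by -(alpha_1 + alpha_2 + alpha_3).
A = np.vstack([np.eye(3), -np.ones((1, 3))])      # maps (alpha_1..alpha_3) to (alpha_1..alpha_4)
X_sum0 = np.column_stack([np.ones(len(diet)), X @ A])
rank_sum0 = np.linalg.matrix_rank(X_sum0)         # equals the number of columns, 4
```

Rows of `X_sum0` for the last diet are (1, -1, -1, -1), which is how the eliminated effect enters the model.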
Richard Lockhart
1999-01-12