
STAT 350 Lecture 2

Reading: Chapter 5 sections 1-4.

Matrix form of a linear model

Stack $Y_i$, $\mu_i$ and $\epsilon_i$ into vectors:

\begin{displaymath}\begin{array}{ccc}
Y = \left[ \begin{array}{c} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{array} \right]
&
\mu = \left[ \begin{array}{c} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{array} \right]
&
\epsilon = \left[ \begin{array}{c} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{array} \right]
\end{array}\end{displaymath}

Define

\begin{displaymath}\begin{array}{cc}
\beta = \left[ \begin{array}{c} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{array} \right]_{p\times 1}
&
X = \left[ \begin{array}{ccc}
x_{1,1} & \cdots & x_{1,p} \\
x_{2,1} & \cdots & x_{2,p} \\
\vdots & & \vdots \\
x_{n,1} & \cdots & x_{n,p} \end{array} \right]_{n\times p}
\end{array}\end{displaymath}

Note

\begin{displaymath}X\beta = \left[ \begin{array}{c}
x_{1,1} \beta_1 + \cdots + x_{1,p} \beta_p \\
\vdots \\
x_{n,1} \beta_1 + \cdots + x_{n,p} \beta_p
\end{array} \right]
=\mu
\end{displaymath}

so

\begin{displaymath}\mu=X\beta
\end{displaymath}

Finally

\begin{displaymath}Y=X\beta + \epsilon
\end{displaymath}

is our original set of $n$ model equations written in vector-matrix form.
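Here is a minimal numerical sketch of this vector-matrix form; the sample size, design matrix, parameter values, and the use of numpy are invented purely for illustration.

\begin{verbatim}
# Minimal numerical sketch of Y = X beta + epsilon (all numbers invented).
import numpy as np

rng = np.random.default_rng(0)

n, p = 5, 2
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])  # example n x p design matrix
beta = np.array([1.0, 0.5])                               # example parameter vector
epsilon = rng.normal(size=n)                              # errors with E(epsilon_i) = 0

mu = X @ beta       # mean vector: mu = X beta
Y = mu + epsilon    # responses:   Y = mu + epsilon
\end{verbatim}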

Assumptions so far:
\begin{align*}& {\rm E}(\epsilon_i) = 0
\\
& Y=\mu+\epsilon
\\
& \mu=X\beta
\end{align*}

Still to come: independence, homoscedasticity, normality.

Examples: the main point to take away is that this framework covers a very large class of models.

1.
One sample problem: $\mu_1 = \cdots = \mu_n = \beta_1$; here $p=1$, $Y_i = \beta_1 + \epsilon_i$, and $X$ is a single column of $n$ ones.

2.
Two sample problem: $n=r+s$

\begin{displaymath}\mu_1 = \cdots = \mu_r = \beta_1 \qquad \mu_{r+1} = \cdots = \mu_{r+s} = \beta_2
\end{displaymath}

For $i\le r$

\begin{displaymath}Y_i = \beta_1 + \epsilon_i \qquad {\rm E}(Y_i) = \beta_1
\end{displaymath}

For $r < i \le r+s$

\begin{displaymath}Y_i = \beta_2 + \epsilon_i \qquad {\rm E}(Y_i) = \beta_2
\end{displaymath}

In matrix form

\begin{displaymath}Y = \left[ \begin{array}{cc}
1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\
0 & 1 \\ \vdots & \vdots \\ 0 & 1
\end{array} \right]
\left[ \begin{array}{c} \beta_1 \\ \beta_2 \end{array}\right] + \epsilon
\end{displaymath}

Sometimes it is convenient to write:

\begin{displaymath}X^T = \left[ \overbrace{
\begin{array}{ccc} 1 & \cdots & 1 \\ 0 & \cdots & 0 \end{array}}^{r \mathrm{\ cols}}
\;
\overbrace{
\begin{array}{ccc} 0 & \cdots & 0 \\ 1 & \cdots & 1 \end{array}}^{s \mathrm{\ cols}}\right]
\end{displaymath}

which is a partitioned matrix; here I have written out the transpose of $X$ rather than $X$ itself.
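A small sketch of building this design matrix; the group sizes $r$ and $s$ below are arbitrary choices made for illustration.

\begin{verbatim}
# Sketch of the two-sample design matrix with arbitrary group sizes.
import numpy as np

r, s = 3, 4
X = np.zeros((r + s, 2))
X[:r, 0] = 1    # column 1 indicates the first sample (rows 1..r)
X[r:, 1] = 1    # column 2 indicates the second sample (rows r+1..r+s)

print(X.T)      # compare with the partitioned form of X^T displayed above
\end{verbatim}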

3.
Simple linear regression:

$Y_i$ = TL reading for sample $i$ (the data of Lecture 1)

$D_i$ = Dose given to sample $i$

The model

\begin{displaymath}Y_i = \beta_1 + \beta_2 D_i + \epsilon_i
\end{displaymath}

gives

\begin{displaymath}\beta=\left[\begin{array}{c} \beta_1 \\ \beta_2 \end{array}\right]
\qquad
X = \left[\begin{array}{cc}
1 & D_1 \\
\vdots & \vdots \\
1 & D_n
\end{array}\right]
\end{displaymath}
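A small sketch of this design matrix, using invented dose values:

\begin{verbatim}
# Sketch of the simple linear regression design matrix with made-up doses.
import numpy as np

D = np.array([0.0, 1.0, 2.0, 4.0, 8.0])    # hypothetical dose values
X = np.column_stack([np.ones_like(D), D])  # columns: intercept, dose
\end{verbatim}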

4.
Polynomial models: ``polynomial regression''. In Lecture 1 we had the quadratic model:

\begin{displaymath}Y_i = \beta_1 + D_i \beta_2 + D_i^2 \beta_3 + \epsilon_i
\end{displaymath}

for which

\begin{displaymath}\beta=\left[\begin{array}{c} \beta_1 \\ \beta_2 \\ \beta_3 \end{array}\right]
\qquad
X^T = \left[\begin{array}{cccc}
1 & 1 & \cdots & 1 \\
D_1 & D_2 & \cdots & D_n \\
D_1^2 & D_2^2 & \cdots & D_n^2 \end{array}\right]
\end{displaymath}

In general we might fit a polynomial of degree p-1 to get

\begin{displaymath}Y_i = \beta_1 + D_i \beta_2 + \cdots + D_i^{p-1} \beta_p + \epsilon_i
\end{displaymath}

for which

\begin{displaymath}\beta=\left[\begin{array}{c} \beta_1 \\ \vdots \\ \beta_p \end{array}\right]
\qquad
X^T = \left[\begin{array}{cccc}
1 & 1 & \cdots & 1 \\
D_1 & D_2 & \cdots & D_n \\
\vdots & \vdots & & \vdots \\
D_1^{p-1} & D_2^{p-1} & \cdots & D_n^{p-1} \end{array}\right]
\end{displaymath}
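A small sketch of this design matrix, again with invented doses, using numpy's Vandermonde helper:

\begin{verbatim}
# Sketch of the degree p-1 polynomial design matrix for made-up doses.
import numpy as np

D = np.array([0.0, 1.0, 2.0, 4.0, 8.0])  # hypothetical dose values
p = 4                                    # number of parameters, so degree p-1 = 3
X = np.vander(D, N=p, increasing=True)   # row i is (1, D_i, D_i^2, ..., D_i^{p-1})
\end{verbatim}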

5.
Analysis of Covariance: fitting two straight lines

Consider the TL data again, but now suppose that samples 1 to r were ``bleached'' (left in the sun for several hours before analysis) and samples r+1 to r+s were ``unbleached''. We combine the two-sample problem with the straight-line problem:
\begin{align*}\mu_i &= \beta_1 + \beta_2 D_i \qquad i=1,\ldots,r
\\
\mu_i &= \beta_3 + \beta_4 D_i \qquad i=r+1,\ldots,r+s
\end{align*}

\begin{displaymath}\beta = \left[\begin{array}{c} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4
\end{array}\right]
\end{displaymath}


\begin{displaymath}X^T = \left[ \begin{array}{cccccc}
1 & \cdots & 1 & 0 & \cdots & 0 \\
D_1 & \cdots & D_r & 0 & \cdots & 0 \\
0 & \cdots & 0 & 1 & \cdots & 1 \\
0 & \cdots & 0 & D_{r+1} & \cdots & D_{r+s}
\end{array} \right]
\end{displaymath}

Special case: ``No interaction'' of Bleach and Dose: the effect of dose is the same for bleached and unbleached samples. That is:

\begin{displaymath}\beta_2 = \beta_4
\end{displaymath}


\begin{displaymath}\beta = \left[\begin{array}{c} \beta_1 \\ \beta_2 \\ \beta_3
\end{array}\right]
\end{displaymath}


\begin{displaymath}X^T = \left[ \begin{array}{cccccc}
1 & \cdots & 1 & 0 & \cdots & 0 \\
D_1 & \cdots & D_r & D_{r+1} & \cdots & D_{r+s} \\
0 & \cdots & 0 & 1 & \cdots & 1
\end{array} \right]
\end{displaymath}

Note: we usually re-order the parameters in this case to get

\begin{displaymath}\beta = \left[\begin{array}{c} \beta_1 \\ \beta_3 \\ \beta_2
\end{array}\right]
\end{displaymath}


\begin{displaymath}X^T = \left[ \begin{array}{cccccc}
1 & \cdots & 1 & 0 & \cdots & 0 \\
0 & \cdots & 0 & 1 & \cdots & 1 \\
D_1 & \cdots & D_r & D_{r+1} & \cdots & D_{r+s}
\end{array} \right]
\end{displaymath}
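A small sketch of both versions of the design matrix; the doses and group sizes below are invented for illustration.

\begin{verbatim}
# Sketch of the analysis of covariance design matrices (hypothetical doses).
import numpy as np

D1 = np.array([1.0, 2.0, 3.0])        # doses for the r bleached samples
D2 = np.array([1.5, 2.5, 3.5, 4.5])   # doses for the s unbleached samples
r, s = len(D1), len(D2)

# Two separate lines: beta = (beta1, beta2, beta3, beta4)
X_full = np.zeros((r + s, 4))
X_full[:r, 0] = 1; X_full[:r, 1] = D1   # group 1 rows: (1, D_i, 0, 0)
X_full[r:, 2] = 1; X_full[r:, 3] = D2   # group 2 rows: (0, 0, 1, D_i)

# No interaction (common slope), parameters re-ordered as (beta1, beta3, beta2)
X_common = np.zeros((r + s, 3))
X_common[:r, 0] = 1                        # intercept for group 1
X_common[r:, 1] = 1                        # intercept for group 2
X_common[:, 2] = np.concatenate([D1, D2])  # common dose column
\end{verbatim}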

6.
Weighing designs (a simple example, mostly for illustration). Idea: weigh two objects with (true) weights $\beta_1$ and $\beta_2$.
\begin{align*}Y_1 & = \mbox{measured weight of Object 1} \\
Y_2 & = \mbox{measured weight of Object 2} \\
Y_3 & = \mbox{measured weight of Objects 1 and 2 together}
\end{align*}
Now we have

\begin{displaymath}\mu_1 = \beta_1 \qquad \mu_2 = \beta_2 \qquad \mu_3 = \beta_1+\beta_2
\end{displaymath}

and get

\begin{displaymath}\left[ \begin{array}{c} \mu_1 \\ \mu_2 \\ \mu_3 \end{array} \right]
= \left[ \begin{array}{cc}
1 & 0 \\
0 & 1 \\
1 & 1
\end{array} \right]
\left[ \begin{array}{c}
\beta_1 \\ \beta_2 \end{array} \right]
\end{displaymath}

so that

\begin{displaymath}X = \left[ \begin{array}{cc}
1 & 0
\\
0 & 1
\\
1 & 1
\end{array} \right]
\end{displaymath}

Notice that ${\rm E}(Y_i)= \mbox{ true weight}$ is a physically meaningful and important assumption. This sort of assumption may well be wrong.
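A small sketch checking that $\mu = X\beta$ reproduces the three expected measurements; the true weights below are invented for illustration.

\begin{verbatim}
# Sketch: verify mu = X beta for the weighing design, with invented true weights.
import numpy as np

X = np.array([[1, 0],
              [0, 1],
              [1, 1]])
beta = np.array([2.0, 5.0])  # hypothetical true weights beta1, beta2

mu = X @ beta                # expect (2.0, 5.0, 7.0): object 1, object 2, both together
print(mu)
\end{verbatim}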

7.
One way layout (ANOVA). The example has data $Y_{ij}$, the blood coagulation time for rat number $j$ fed diet number $i$, for $i=1,2,3,4$. There were 4 rats for diet 1, 6 each for diets 2 and 3, and 8 rats fed diet 4. We use $\mu_{ij}$ as notation for ${\rm E}(Y_{ij})$. The idea is that all the rats fed diet 1 have the same mean coagulation time $\beta_1$, so $\mu_{11}=\mu_{12} =\mu_{13} = \mu_{14} = \beta_1$. (In fact it is pretty common notation to use $\mu_1$ for $\beta_1$, but this would conflict, for the time being, with my notation for the mean of the first $Y$.) If we stack up the $Y_{ij}$ we get

\begin{displaymath}Y =
\left[
\begin{array}{c}
Y_{11} \\ Y_{12} \\ Y_{13} \\ Y_{14} \\ Y_{21} \\ \vdots \\ Y_{26} \\ Y_{31} \\ \vdots \\ Y_{36} \\ Y_{41} \\ \vdots \\ Y_{48}
\end{array}
\right]
\qquad
X =
\left[
\begin{array}{cccc}
1 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 1
\end{array}
\right]
\qquad
\beta =
\left[
\begin{array}{c}
\beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \beta_{4}
\end{array} \right]
\end{displaymath}

Again we have $\mu = X \beta$.

Jargon: X is called a ``design matrix''.

The one way layout as a linear model

The sum of squares decomposition in one example

The data consist of blood coagulation times for 24 animals fed one of 4 different diets. Here are the data with the 4 diets being the 4 columns.

\begin{displaymath}\left[
\begin{array}{rrrr}
62 & 63 & 68 & 56 \\
60 & 67 & 66 & 62 \\
63 & 71 & 71 & 60 \\
59 & 64 & 67 & 61 \\
   & 65 & 68 & 63 \\
   & 66 & 68 & 64 \\
   &    &    & 63 \\
   &    &    & 59
\end{array}\right]
\end{displaymath}

The usual ANOVA model equation is

\begin{displaymath}Y_{ij} = \mu_i +\epsilon_{ij}
\end{displaymath}

which we can write in matrix form by stacking up the observations into a column.

\begin{displaymath}\left[\begin{array}{r}
62 \\
60 \\
63 \\
59 \\
63 \\
67 \\
\vdots \\
63 \\
59
\end{array}\right]
=
\left[\begin{array}{cccc}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1
\end{array}\right]
\left[\begin{array}{c}
\mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4
\end{array}\right]
+
\left[\begin{array}{c}
\epsilon_{11} \\
\epsilon_{12} \\
\epsilon_{13} \\
\epsilon_{14} \\
\epsilon_{21} \\
\epsilon_{22} \\
\vdots \\
\epsilon_{47} \\
\epsilon_{48}
\end{array}\right]
\end{displaymath}

Let X denote the $24\times 4$ design matrix in this formula. Usually we reparametrize the model in the form

\begin{displaymath}Y_{ij}=\mu+\alpha_i +\epsilon_{ij}
\end{displaymath}

which would lead to a design matrix that looks like X above but with an extra column on the left, all of whose entries are equal to 1. The parameter vector $\beta$ would now be

\begin{displaymath}\beta^{T} = \left[\mu \quad \alpha_1 \quad \alpha_2 \quad \alpha_3 \quad \alpha_4 \quad\right]
\end{displaymath}

It will turn out that with this parametrization the parameters are not identifiable, that is, they cannot be separately estimated: making $\mu$ larger by some amount and each $\alpha_i$ smaller by the same amount leaves every mean $\mu+\alpha_i$, and hence the distribution of the data, unchanged. We usually solve this problem by defining

\begin{displaymath}\mu = (n_1\mu_1 + \cdots
+ n_k \mu_k)/(n_1+ \cdots +n_k)
\end{displaymath}

and

\begin{displaymath}\alpha_i = \mu_i-\mu\, .\end{displaymath}
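A quick check, using only the definitions just given, shows which constraint they impose:

\begin{displaymath}\sum_i n_i\alpha_i = \sum_i n_i(\mu_i-\mu) = \sum_i n_i\mu_i - \mu\sum_i n_i = 0 .
\end{displaymath}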

So with this weighted definition it is automatic that $\sum n_i\alpha_i=0$; if instead $\mu$ is taken to be the unweighted average $(\mu_1+\cdots+\mu_k)/k$, the automatic constraint is $\sum\alpha_i=0$. Using the latter constraint we usually eliminate $\alpha_4$ by replacing it in the model equation by the quantity

\begin{displaymath}-(\alpha_1+\alpha_2+\alpha_3).\end{displaymath}

This leads to

\begin{displaymath}\beta^{T} = \left[\mu \quad \alpha_1 \quad \alpha_2 \quad \alpha_3 \quad\right]
\end{displaymath}

and

\begin{displaymath}X =
\left[
\begin{array}{rrrr}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 0 & 0 & 1 \\
1 & -1 & -1 & -1 \\
\vdots & \vdots & \vdots & \vdots \\
1 & -1 & -1 & -1
\end{array}\right]
\end{displaymath}
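A small sketch of building this design matrix for the 4, 6, 6, 8 layout; the diet-4 rows get $-1$'s because $\alpha_4$ has been replaced by $-(\alpha_1+\alpha_2+\alpha_3)$.

\begin{verbatim}
# Sketch: 24 x 4 design matrix for the reparametrized one-way layout
# (intercept column plus indicators, last diet coded as -1's).
import numpy as np

sizes = [4, 6, 6, 8]              # rats per diet
k = len(sizes)
rows = []
for i, n_i in enumerate(sizes):
    row = np.zeros(k)             # columns: (mu, alpha_1, alpha_2, alpha_3)
    row[0] = 1
    if i < k - 1:
        row[i + 1] = 1            # indicator for diet i+1
    else:
        row[1:] = -1              # diet 4: alpha_4 = -(alpha_1+alpha_2+alpha_3)
    rows.extend([row] * n_i)
X = np.vstack(rows)               # 24 x 4, as displayed above
\end{verbatim}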

Further analysis of this data.





Richard Lockhart
1999-01-12