Fitting the I part is easy: we simply difference $d$ times. The same observation applies to the seasonal multiplicative models. Thus, to fit an ARIMA$(p,d,q)$ model to $X$, you compute $Y = (I-B)^d X$ (shortening your data set by $d$ observations) and then fit an ARMA$(p,q)$ model to $Y$. So from now on we assume $d=0$.
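The differencing step can be sketched in a few lines of numpy; the series below is an illustrative assumption, and `np.diff` with `n=d` applies $(I-B)^d$ directly.

```python
import numpy as np

# Hypothetical example series; any 1-D array of observations works.
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])

d = 1
# (I - B)^d X: each difference shortens the series by one observation.
y = np.diff(x, n=d)

print(y)
print(len(x) - len(y))  # exactly d observations are lost
```

An ARMA$(p,q)$ model would then be fitted to `y` rather than to `x`.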
Simplest case: fitting the AR(1) model
Our basic strategy will be:
Generally the full likelihood is rather complicated; we will use conditional likelihoods and ad hoc estimates of some parameters to simplify the situation.
If the errors $\epsilon_t$ are normal then so is the series $X$. In general the vector $X = (X_1,\ldots,X_T)^t$ has a $MVN_T(\mu 1, \Sigma)$ distribution, where $\Sigma_{ij} = C_X(i-j)$ and $1$ is a vector all of whose entries are 1. The joint density of $X$ is
$$
f(x) = (2\pi)^{-T/2} \det(\Sigma)^{-1/2} \exp\left\{ -\frac{1}{2}(x-\mu 1)^t \Sigma^{-1} (x - \mu 1) \right\} .
$$
It is possible to carry out full maximum likelihood by maximizing this log likelihood numerically. In general, however, this is hard.
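As a sketch, here is a direct evaluation of the exact Gaussian log likelihood, filling in $\Sigma_{ij} = C_X(i-j)$ for the AR(1) autocovariance $C(h) = \sigma^2 \rho^{|h|}/(1-\rho^2)$; the data and parameter values are assumptions for illustration.

```python
import numpy as np

def exact_loglik(x, mu, rho, sigma):
    """Exact Gaussian log likelihood for an AR(1) series (sketch)."""
    T = len(x)
    h = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    Sigma = sigma**2 * rho**h / (1.0 - rho**2)   # Sigma_ij = C(i - j)
    resid = x - mu
    sign, logdet = np.linalg.slogdet(Sigma)      # stable log det(Sigma)
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -0.5 * (T * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
x = rng.normal(size=50)                          # placeholder data
print(exact_loglik(x, mu=0.0, rho=0.5, sigma=1.0))
```

Each evaluation costs a $T \times T$ solve, which is one reason direct maximization is unattractive for long series.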
Here I indicate some standard tactics. In your homework I will be asking you to carry through this analysis for one particular model.
Consider the model
$$
X_t - \mu = \rho(X_{t-1}-\mu) + \epsilon_t .
$$
Conditional on $X_0$ the log likelihood is, up to constants,
$$
\ell(\mu,\rho,\sigma) = -T\log\sigma - \sum_{t=1}^T \frac{\left(X_t-\mu-\rho(X_{t-1}-\mu)\right)^2}{2\sigma^2} .
$$
Now compute $\partial\ell/\partial\mu$ and $\partial\ell/\partial\sigma$, set them to 0, and solve to get
$$
\hat\mu(\rho) = \frac{\sum_{t=1}^T X_t - \rho\sum_{t=1}^T X_{t-1}}{T(1-\rho)},
\qquad
\hat\sigma^2(\rho) = \frac{1}{T}\sum_{t=1}^T \left(X_t - \hat\mu(\rho) - \rho(X_{t-1}-\hat\mu(\rho))\right)^2 .
$$
To find $\hat\rho$ you now plug $\hat\mu(\rho)$ and $\hat\sigma(\rho)$ into $\ell$ (getting the so-called profile likelihood) and maximize over $\rho$. Having thus found $\hat\rho$, the mles of $\mu$ and $\sigma$ are simply $\hat\mu(\hat\rho)$ and $\hat\sigma(\hat\rho)$.
It is worth observing that the fitted residuals can then be calculated:
$$
\hat\epsilon_t = X_t - \hat\mu - \hat\rho(X_{t-1} - \hat\mu) .
$$
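The profile-likelihood recipe can be sketched numerically: for each $\rho$ on a grid, compute $\hat\mu(\rho)$ and $\hat\sigma^2(\rho)$ in closed form, then pick the $\rho$ maximizing the profile. The simulated series, the true parameter values, and the grid are assumptions.

```python
import numpy as np

def fit_ar1_profile(x, rhos=np.linspace(-0.99, 0.99, 397)):
    y, ylag = x[1:], x[:-1]          # X_t and X_{t-1}, conditioning on x[0]
    T = len(y)
    best = None
    for rho in rhos:
        mu = (y.sum() - rho * ylag.sum()) / (T * (1.0 - rho))
        resid = y - mu - rho * (ylag - mu)
        sigma2 = np.mean(resid**2)
        ell = -0.5 * T * np.log(sigma2) - T / 2.0   # profile log likelihood
        if best is None or ell > best[0]:
            best = (ell, rho, mu, np.sqrt(sigma2))
    _, rho_hat, mu_hat, sigma_hat = best
    resid = y - mu_hat - rho_hat * (ylag - mu_hat)  # fitted residuals
    return rho_hat, mu_hat, sigma_hat, resid

# Simulate an AR(1) series with rho = 0.6, mu = 0, sigma = 1 (assumed values).
rng = np.random.default_rng(1)
x = np.empty(500); x[0] = 0.0
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.normal()

rho_hat, mu_hat, sigma_hat, resid = fit_ar1_profile(x)
print(rho_hat, mu_hat, sigma_hat)
```

A grid search is used here only for transparency; any 1-D optimizer over $\rho$ would do.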
In general, we simplify the maximum likelihood problem in several ways. One basic device is to partition the data as $X = (Z, Y)$, where $Z$ collects the first few observations, factor the density as $f(X) = f(Y \mid Z)\, f(Z)$, and simply drop the factor $f(Z)$, maximizing the conditional likelihood $f(Y \mid Z)$.
In the AR(1) case $Y$ is just $(X_1,\ldots,X_T)$ while $Z$ is $X_0$. We take our conditional log-likelihood to be
$$
\ell(\mu,\rho,\sigma) = \log f(Y \mid Z) = -T\log\sigma - \sum_{t=1}^T \frac{\left(X_t-\mu-\rho(X_{t-1}-\mu)\right)^2}{2\sigma^2} - \frac{T}{2}\log(2\pi) .
$$
Notice that we have made a great many suggestions for simplifications and adjustments. This is typical of statistical research - many ideas, only slightly different from each other, are suggested and compared. In practice it seems likely that there is very little difference between all the methods. I am asking you in a homework problem to investigate the differences between several of these methods on a single data set.
For the model
$$
X_t - \mu = \rho(X_{t-1}-\mu) + \epsilon_t
$$
an alternative to estimating $\mu$ by $\hat\mu(\hat\rho)$ is to define $\hat\mu = \bar X$ and then recognize that maximizing the conditional likelihood over $\rho$ is just a least squares problem. Notice that if we put $Y_t = X_t - \bar X$ then
$$
\hat\rho = \frac{\sum_t Y_t Y_{t-1}}{\sum_t Y_{t-1}^2} ,
$$
which is essentially the lag 1 sample autocorrelation.
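A minimal sketch of this moment-style shortcut: center the series at the sample mean and take the lag-1 regression coefficient as the estimate of $\rho$. The simulated data and the true value $\rho = 0.5$ are assumptions.

```python
import numpy as np

# Simulate an AR(1) series with rho = 0.5, mu = 0, sigma = 1 (assumed).
rng = np.random.default_rng(2)
x = np.empty(400); x[0] = 0.0
for t in range(1, 400):
    x[t] = 0.5 * x[t - 1] + rng.normal()

ycent = x - x.mean()                 # Y_t = X_t - Xbar
# lag-1 regression coefficient ~ lag-1 sample autocorrelation
rho_hat = np.sum(ycent[1:] * ycent[:-1]) / np.sum(ycent[:-1] ** 2)
print(rho_hat)
```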
To compute a full mle of the parameters you generally begin by finding preliminary estimates, say by one of the conditional likelihood methods above, and then iterate via Newton-Raphson or some other scheme for numerical maximization of the log-likelihood.
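The polish-by-iteration step can be sketched with a quasi-Newton routine in place of hand-coded Newton-Raphson; here the conditional log likelihood is maximized numerically from crude starting values. The simulated data, starting values, and the choice of BFGS are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def negloglik(theta, x):
    """Negative AR(1) conditional log likelihood; log sigma keeps sigma > 0."""
    mu, rho, logsig = theta
    sigma2 = np.exp(2 * logsig)
    resid = x[1:] - mu - rho * (x[:-1] - mu)
    T = len(resid)
    return 0.5 * T * np.log(2 * np.pi * sigma2) + np.sum(resid**2) / (2 * sigma2)

# Simulate an AR(1) series with mu = 2, rho = 0.4, sigma = 1 (assumed values).
rng = np.random.default_rng(3)
x = np.empty(300); x[0] = 2.0
for t in range(1, 300):
    x[t] = 2.0 + 0.4 * (x[t - 1] - 2.0) + rng.normal()

# Crude preliminary estimates, e.g. from a conditional-likelihood method.
theta0 = np.array([x.mean(), 0.0, 0.0])
fit = minimize(negloglik, theta0, args=(x,), method="BFGS")
mu_hat, rho_hat, sig_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(mu_hat, rho_hat, sig_hat)
```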
Fitting the MA(q) model
Here we consider the model with known mean (generally this will mean we estimate $\mu$ by $\bar X$ and subtract the mean from all the observations):
$$
X_t = \sum_{j=0}^q b_j \epsilon_{t-j}, \qquad b_0 = 1 .
$$
In general $X$ has a $MVN_T(0,\Sigma)$ distribution and, letting $b$ denote the vector of $b_j$'s, we find
$$
\ell(b,\sigma) = -\frac{1}{2}\log\det\Sigma - \frac{X^t \Sigma^{-1} X}{2} - \frac{T}{2}\log(2\pi) .
$$
Notice that
$$
\Sigma_{st} = C_X(s-t) = \sigma^2 \sum_{j} b_j b_{j+|s-t|}
$$
(with $b_j = 0$ for $j < 0$ or $j > q$), so $\Sigma$ depends on both $b$ and $\sigma$ and evaluating $\ell$ requires working with a full $T \times T$ matrix.
Now imagine that the data were actually $\epsilon_{1-q},\ldots,\epsilon_0, X_1,\ldots,X_T$. Then we could solve for the errors recursively:
$$
\epsilon_t = X_t - \sum_{j=1}^q b_j \epsilon_{t-j}, \qquad t = 1,\ldots,T .
$$
Method A: Put $\epsilon_0 = \cdots = \epsilon_{1-q} = 0$, since 0 is the most probable value, and maximize
$$
-T\log\sigma - \sum_{t=1}^T \frac{\epsilon_t^2(b)}{2\sigma^2} .
$$
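Method A can be sketched for the simplest case $q = 1$: set the pre-sample error to 0, recover $\epsilon_t = X_t - b\,\epsilon_{t-1}$ recursively, and maximize the resulting conditional likelihood over a grid of $b$ values. The simulated series and the grid are assumptions.

```python
import numpy as np

def conditional_loglik_ma1(x, b):
    """Method A conditional log likelihood for MA(1), profiled over sigma."""
    eps = np.empty(len(x))
    prev = 0.0                        # eps_0 = 0: its most probable value
    for t, xt in enumerate(x):
        eps[t] = xt - b * prev        # recursion eps_t = X_t - b eps_{t-1}
        prev = eps[t]
    T = len(x)
    sigma2 = np.mean(eps**2)          # mle of sigma^2 given b
    return -0.5 * T * np.log(sigma2) - T / 2.0

# Simulate an MA(1) series with b = 0.5, sigma = 1 (assumed values).
rng = np.random.default_rng(4)
e = rng.normal(size=501)
x = e[1:] + 0.5 * e[:-1]

bs = np.linspace(-0.95, 0.95, 381)
b_hat = bs[np.argmax([conditional_loglik_ma1(x, b) for b in bs])]
print(b_hat)
```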
Method B: Backcasting is the process of guessing (predicting) $\epsilon_{1-q},\ldots,\epsilon_0$ on the basis of the data; we replace $\epsilon_{1-q},\ldots,\epsilon_0$ in the log likelihood by
$$
E(\epsilon_{1-q} \mid X), \ldots, E(\epsilon_0 \mid X) .
$$
We will use the EM algorithm to solve this problem.
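The key computation in the backcasting step, which the E-step of EM also needs, is the conditional expectation of an unobserved error given the data. Under joint normality $E(\epsilon_0 \mid X) = c^t \Sigma^{-1} X$, where $c_t = \mathrm{Cov}(\epsilon_0, X_t)$. A minimal sketch for MA(1), where only $X_1 = \epsilon_1 + b\,\epsilon_0$ involves $\epsilon_0$ (the parameter values and data are assumptions):

```python
import numpy as np

def backcast_eps0_ma1(x, b, sigma2=1.0):
    """E(eps_0 | X) for an MA(1) with coefficient b (sketch)."""
    T = len(x)
    # Covariance matrix of X: Var X_t = sigma^2 (1 + b^2),
    # Cov(X_t, X_{t+1}) = sigma^2 b, zero beyond lag 1.
    Sigma = np.zeros((T, T))
    idx = np.arange(T)
    Sigma[idx, idx] = sigma2 * (1 + b**2)
    Sigma[idx[:-1], idx[:-1] + 1] = sigma2 * b
    Sigma[idx[:-1] + 1, idx[:-1]] = sigma2 * b
    c = np.zeros(T)
    c[0] = sigma2 * b                 # only X_1 involves eps_0
    return c @ np.linalg.solve(Sigma, x)

# Simulate an MA(1) series with b = 0.5; e[0] plays the role of eps_0.
rng = np.random.default_rng(5)
e = rng.normal(size=201)
x = e[1:] + 0.5 * e[:-1]
print(backcast_eps0_ma1(x, b=0.5), e[0])  # backcast vs. the true eps_0
```

For $T = 1$ this reduces to the familiar regression formula $E(\epsilon_0 \mid X_1) = b X_1 / (1 + b^2)$.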