
STAT 870 Lecture 4

Goals of Today's Lecture:

Expected Value

Undergraduate definition of E: integral for absolutely continuous X, sum for discrete X. But $\exists$ rvs which are neither absolutely continuous nor discrete: e.g., a waiting time which is 0 with probability 1/2 and has an exponential distribution with probability 1/2.

General definition of E.

A random variable X is simple if we can write

\begin{displaymath}X(\omega)= \sum_1^n a_i 1(\omega\in A_i)
\end{displaymath}

for some constants $a_1,\ldots,a_n$ and events $A_i$.

Def'n: For a simple rv X we define

\begin{displaymath}E(X) = \sum a_i P(A_i)
\end{displaymath}
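(As with the Lebesgue integral below, one must check that this value does not depend on which representation of X is used.) Example: for any event A the indicator $1(\omega \in A)$ is simple and

\begin{displaymath}E[1(\omega \in A)] = P(A) \, ;
\end{displaymath}

more generally if X takes the value a on A and b on $A^c$ then $E(X) = aP(A) + bP(A^c)$.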

For positive random variables which are not simple we extend our definition by approximation:

Def'n: If $X \ge 0$ (almost surely, $P(X\ge 0) = 1$) then

\begin{displaymath}E(X) = \sup\{E(Y): 0 \le Y \le X, Y \mbox{ simple}\}
\end{displaymath}
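Example: one concrete sequence of approximating simple rvs (used again in the proof of the product theorem below) rounds X down to the nearest multiple of $2^{-n}$ and truncates at n:

\begin{displaymath}X_n = \min\left( n, 2^{-n}\lfloor 2^n X \rfloor \right) \, .
\end{displaymath}

Each $X_n$ is simple, $0 \le X_1 \le X_2 \le \cdots \le X$ and $X_n \to X$, so $E(X) = \lim E(X_n)$ by monotone convergence (below).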

Def'n: We call X integrable if

\begin{displaymath}E(\vert X\vert) < \infty \, .
\end{displaymath}

In this case we define

\begin{displaymath}E(X) = E(\max(X,0)) -E(\max(-X,0))
\end{displaymath}
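Since $\vert X\vert = \max(X,0) + \max(-X,0)$, integrability guarantees that both terms on the right hand side have finite expectation, so the difference is well defined.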

Facts: E is a linear, monotone, positive operator:

1.
Linear: E(aX+bY) = aE(X)+bE(Y) provided X and Y are integrable.

2.
Positive: $P(X\ge 0) = 1$ implies $E(X) \ge 0$.

3.
Monotone: $P(X \ge Y)=1$ and X, Y integrable implies $E(X) \ge E(Y)$.

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ a.s. and $X= \lim X_n$ (which exists a.s.) then

\begin{displaymath}E(X) = \lim_{n\to \infty} E(X_n)
\end{displaymath}

Dominated Convergence: If $\vert X_n\vert \le Y_n$ a.s. and $\exists$ rv X s.t. $X_n \to X$ a.s. and rv Y s.t. $Y_n \to Y$ a.s. with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

Often used with all $Y_n$ equal to the same rv Y.

Fatou's Lemma: If $X_n \ge 0$ then

\begin{displaymath}E(\liminf X_n) \le \liminf E(X_n)
\end{displaymath}
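Example: take $\Omega = (0,1)$ with P uniform and $X_n(\omega) = n 1(\omega < 1/n)$. Then $X_n \to 0$ a.s. while $E(X_n) = 1$ for every n, so

\begin{displaymath}E(\liminf X_n) = 0 < 1 = \liminf E(X_n) \, ;
\end{displaymath}

the inequality in Fatou's Lemma can be strict, and the domination hypothesis in dominated convergence cannot simply be dropped.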

Theorem: With this definition of E if X has density f(x) (even in ${\Bbb R}^p$ say) and Y=g(X) then

\begin{displaymath}E(Y) = \int g(x) f(x) dx \, .
\end{displaymath}

(This could be a multiple integral.) If X has pmf f then

\begin{displaymath}E(Y) =\sum_x g(x) f(x) \, .
\end{displaymath}

Works even if X has density but Y doesn't.
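Example: if X is uniform on (0,1), so that $f(x) = 1(0 < x < 1)$, and $Y = g(X) = 1(X \le 1/2)$, then Y is discrete even though X has a density, and

\begin{displaymath}E(Y) = \int 1(x \le 1/2) 1(0 < x < 1) dx = 1/2 \, .
\end{displaymath}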

Def'n: $r^{\rm th}$ moment (about origin) of a real rv X is $\mu_r^\prime=E(X^r)$ (provided it exists). Generally use $\mu$ for E(X). The $r^{\rm th}$ central moment is

\begin{displaymath}\mu_r = E[(X-\mu)^r]
\end{displaymath}

Call $\sigma^2 = \mu_2$ the variance.
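Expanding $(X-\mu)^2$ and using linearity gives the usual computing formula

\begin{displaymath}\sigma^2 = E(X^2) - \mu^2 = \mu_2^\prime - \mu^2 \, .
\end{displaymath}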

Def'n: For an ${\Bbb R}^p$ valued rv X, $\mu_X = E(X)$ is the vector whose $i^{\rm th}$ entry is $E(X_i)$ (provided all entries exist).

Def'n: The ($p \times p$) variance-covariance matrix of X is

\begin{displaymath}Var(X) = E\left[ (X-\mu_X)(X-\mu_X)^T \right]
\end{displaymath}

which exists provided each component $X_i$ has a finite second moment. More generally if $X \in {\Bbb R}^p$ and $Y \in {\Bbb R}^q$ both have all components with finite second moments then

\begin{displaymath}\text{Cov}(X,Y) = \text{E}\left[(X - \mu_X) (Y - \mu_Y)^T\right]
\end{displaymath}

We have

\begin{displaymath}\text{Cov}(AX+b,CY+d) = A\text{Cov}(X,Y) C^T
\end{displaymath}

for general (conforming) matrices A, C and vectors b and d.
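In particular, taking Y = X, C = A and d = b, and noting that $Var(X) = \text{Cov}(X,X)$,

\begin{displaymath}Var(AX+b) = A Var(X) A^T \, .
\end{displaymath}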

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems. Here is one version of Markov's inequality (one case is Chebyshev's inequality):
\begin{align*}P(\vert X-\mu\vert \ge t ) &= E[1(\vert X-\mu\vert \ge t)]
\\
&\le E\left[\frac{\vert X-\mu\vert^r}{t^r} 1(\vert X-\mu\vert \ge t)\right]
\\
& \le \frac{E[\vert X-\mu\vert^r]}{t^r}
\end{align*}
The intuition is that if moments are small then large deviations from average are unlikely.
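Taking r = 2 gives Chebyshev's inequality:

\begin{displaymath}P(\vert X-\mu\vert \ge t ) \le \frac{\sigma^2}{t^2} \, .
\end{displaymath}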

Moments and independence

Theorem: If $X_1,\ldots,X_p$ are independent and each $X_i$ is integrable then $X=X_1\cdots X_p$ is integrable and

\begin{displaymath}E(X_1\cdots X_p) = E(X_1) \cdots E(X_p)
\end{displaymath}

Proof: Usual order: simple Xs first, then positive, then integrable.

Suppose each $X_i$ is simple:

\begin{displaymath}X_i = \sum_j x_{ij} 1(X_i = x_{ij})
\end{displaymath}

where the $x_{ij}$ are the possible values of $X_i$. Then
\begin{align*}E(X_1\cdots X_p)
&= \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} P(X_1 = x_{1j_1},\ldots,X_p = x_{pj_p})
\\
&= \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} P(X_1 = x_{1j_1}) \cdots P(X_p = x_{pj_p})
\\
&= \left[\sum_{j_1} x_{1j_1} P(X_1 = x_{1j_1})\right] \cdots \left[\sum_{j_p} x_{pj_p} P(X_p = x_{pj_p})\right]
\\
&= \prod E(X_i)
\end{align*}

General $X_i \ge 0$: let $X_{i,n}$ be $X_i$ rounded down to the nearest multiple of $2^{-n}$ (to a maximum of n). Each $X_{i,n}$ is simple and $X_{1,n},\ldots,X_{p,n}$ are independent. Thus

\begin{displaymath}{\rm E}(\prod X_{j,n}) = \prod {\rm E}(X_{j,n})
\end{displaymath}

for each n. If

\begin{displaymath}X_n^* = \prod X_{j,n}
\end{displaymath}

then

\begin{displaymath}0 \le X_1^* \le X_2^* \le \cdots
\end{displaymath}

and $X_n^*$ converges to $X^* = \prod X_i$ so that

\begin{displaymath}{\rm E} ( X^*) = \lim {\rm E}(X_n^*)
\end{displaymath}

by monotone convergence. Also by monotone convergence

\begin{displaymath}\lim \prod {\rm E}(X_{j,n}) = \prod {\rm E}(X_j) < \infty
\end{displaymath}

This shows both that $X^*$ is integrable and that

\begin{displaymath}E(\prod X_j) = \prod {\rm E} (X_j)
\end{displaymath}

The general case uses the fact that we can write each Xi as the difference of its positive and negative parts:

\begin{displaymath}X_i = \max(X_i,0) -\max(-X_i,0)
\end{displaymath}

Just expand out the product and use the previous case.
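For p = 2, writing $X_i^+ = \max(X_i,0)$ and $X_i^- = \max(-X_i,0)$ and noting that $X_1^+, X_1^-$ are independent of $X_2^+, X_2^-$,
\begin{align*}E(X_1 X_2) &= E(X_1^+X_2^+) - E(X_1^+X_2^-) - E(X_1^-X_2^+) + E(X_1^-X_2^-)
\\
&= E(X_1^+)E(X_2^+) - E(X_1^+)E(X_2^-) - E(X_1^-)E(X_2^+) + E(X_1^-)E(X_2^-)
\\
&= \left[E(X_1^+) - E(X_1^-)\right]\left[E(X_2^+) - E(X_2^-)\right] = E(X_1)E(X_2) \, .
\end{align*}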

Lebesgue Integration

Lebesgue integral defined much the same way as E.

Borel function f simple if

\begin{displaymath}f(x) = \sum_1^n a_i 1(x \in A_i)
\end{displaymath}

for almost all $x\in {\Bbb R}^p$ and some constants $a_i$ and Borel sets $A_i$ with $\lambda(A_i) < \infty$. For such an f we define

\begin{displaymath}\int f(x) dx = \sum a_i \lambda(A_i)
\end{displaymath}

Again if

\begin{displaymath}\sum a_i 1_{A_i} = \sum b_j 1_{B_j}
\end{displaymath}

almost everywhere and all $A_i$ and $B_j$ have finite Lebesgue measure, you must check that

\begin{displaymath}\sum a_i \lambda(A_i) = \sum b_j \lambda(B_j)
\end{displaymath}
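Example: $f(x) = 1(0 \le x \le 1) + 2 \cdot 1(1 < x \le 3)$ is simple and

\begin{displaymath}\int f(x) dx = 1\cdot\lambda([0,1]) + 2\cdot\lambda((1,3]) = 1 + 4 = 5 \, .
\end{displaymath}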

If $f \ge 0$ almost everywhere and f is Borel define

\begin{displaymath}\int f(x) dx = \sup\{ \int g(y) dy\}
\end{displaymath}

where the sup ranges over all simple functions g such that $0 \le g(x) \le f(x)$ for almost all x. Call $f \ge 0$ integrable if $\int f(x) dx < \infty$.

Call a general f integrable if |f| is integrable and define for integrable f
\begin{multline*}\int f(x) dx = \int \max(f(x),0) dx
\\
-
\int \max(-f(x),0) dx
\end{multline*}

Remark: Again you must check that you have not changed the value of $\int f(x) dx$ for f in either of the previous two categories (simple, or nonnegative).

Facts: $\int$ is a linear, monotone, positive operator:

1.
Linear: provided f and g are integrable

\begin{displaymath}\int [af(x)+bg(x)] dx = a\int f(x) dx
+b\int g(x) dx
\end{displaymath}

2.
Positive: If $f(x) \ge 0$ almost everywhere then $\int f(x) dx
\ge 0$.

3.
Monotone: If $f(x) \ge g(x)$ almost everywhere and f and g are integrable then

\begin{displaymath}\int f(x) dx \ge \int g(x) dx.
\end{displaymath}

Each of these facts is proved first for simple functions then for positive functions then for general integrable functions.

Major technical theorems:

Monotone Convergence: If $ 0 \le f_1 \le f_2 \le \cdots$ almost everywhere and $f= \lim f_n$ (which has to exist almost everywhere) then

\begin{displaymath}\int f(x) dx = \lim_{n\to \infty} \int f_n(x) dx
\end{displaymath}

Dominated Convergence: If $\vert f_n\vert \le g_n$ almost everywhere and there is a Borel function f such that $f_n(x) \to f(x)$ for almost all x and a Borel function g such that $g_n(x) \to g(x)$ for almost all x with $\int g_n(x) dx \to \int g(x) dx < \infty$ then f is integrable and

\begin{displaymath}\int f_n(x) dx \to \int f(x) dx
\end{displaymath}

Fatou's Lemma: If $f_n \ge 0$ almost everywhere then

\begin{displaymath}\int \liminf f_n(x) dx \le \liminf \int f_n(x) dx
\end{displaymath}

Notice the frequent use of almost all or almost everywhere in the hypotheses. In our definition of E, wherever we require a property of the function $X(\omega)$ we can require it to hold only for a set of $\omega$ whose complement has probability 0. In this case we say the property holds almost surely. For instance, the dominated convergence theorem is usually written:

Dominated Convergence: If $\vert X_n\vert \le Y_n$ almost surely (often abbreviated to a.s.) and there is a random variable X such that $X_n \to X$ a.s. and a random variable Y such that $Y_n \to Y$ almost surely with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

The hypothesis of almost sure convergence can be weakened; I hope to discuss this later in the course.

Multiple Integration: Lebesgue integrals over ${\Bbb R}^p$ defined using Lebesgue measure on ${\Bbb R}^p$. Iterated integrals wrt Lebesgue measure on ${\Bbb R}^1$ give the same answer.

Theorem[Tonelli]: If $f: {\Bbb R}^{p+q} \mapsto {\Bbb R}$ is Borel and $f \ge 0$ almost everywhere then for almost every $x\in {\Bbb R}^p$ the integral

\begin{displaymath}g(x) \equiv \int f(x,y) dy
\end{displaymath}

exists and

\begin{displaymath}\int g(x) dx = \int f(x,y) dx dy
\end{displaymath}

The RHS denotes the (p+q)-dimensional integral defined previously.
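Example: if $X \ge 0$ has density f then applying Tonelli to the nonnegative function $(x,t) \mapsto 1(0 < t < x) f(x)$ gives

\begin{displaymath}\int_0^\infty P(X > t) dt = \int\!\!\int 1(0 < t < x) f(x) dx \, dt = \int x f(x) dx = E(X) \, .
\end{displaymath}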

Theorem[Fubini]: If $f: {\Bbb R}^{p+q} \mapsto {\Bbb R}$ is Borel and integrable then for almost every $x\in {\Bbb R}^p$ the integral

\begin{displaymath}g(x) \equiv \int f(x,y) dy
\end{displaymath}

exists and is finite. Moreover g is integrable and

\begin{displaymath}\int g(x) dx = \int
f(x,y) dx dy \, .
\end{displaymath}
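Example (showing the hypotheses matter): $f(x,y) = (x^2-y^2)/(x^2+y^2)^2$ on $(0,1)^2$ is neither nonnegative nor integrable, and the two iterated integrals disagree:

\begin{displaymath}\int_0^1\!\!\int_0^1 \frac{x^2-y^2}{(x^2+y^2)^2} dy \, dx = \int_0^1 \frac{dx}{1+x^2} = \frac{\pi}{4}
\end{displaymath}

while integrating first over x and then over y gives $-\pi/4$.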


Richard Lockhart
2000-09-26