Administrative info
HW6 out
Review
Recall that a random variable is a function from the sample space
Ω to the real numbers R. It assigns a real number to each
sample point.
For each value a that a random variable X can take on, X = a is an
event that consists of all the outcomes ω for which X(ω)
= a. This event has probability
Pr[X = a] = ∑_{ω : X(ω)=a} Pr[ω].
The events X = a partition the sample space, so they are disjoint
for distinct a and their union is the sample space.
The distribution of a random variable X is the set of probabilities
Pr[X = a] for every possible value of a. The probabilities must add
up to 1, since the X = a partition the sample space.
Recall the coin flipping game from last time. We flip a biased coin
that has probability p of heads n times, and for each heads we win
$1 and we lose $1 for each tails. We are interested in how much
total money we win.
Let's call W our winnings. If we flip the coin three times, then
W(hhh) =  3   W(hth) =  1   W(thh) =  1   W(tth) = -1
W(hht) =  1   W(htt) = -1   W(tht) = -1   W(ttt) = -3
In general, we determined that if we flip the coin n times, then
W(ω) = 2 H(ω) - n, where H(ω) is the number of
heads in ω. We abbreviate this statement as W = 2H - n.
We then demonstrated that the distribution of H is
Pr[H = i] = C(n, i) p^i (1-p)^(n-i)
for integer i, 0 <= i <= n. This is a binomial distribution with
parameters n and p, denoted by H ~ Bin(n, p).
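To make this concrete, here is a minimal Python sketch (the function
name binomial_pmf and the parameter choices are ours, purely for
illustration) that computes the distribution from the formula and
checks that the probabilities sum to 1:

    from math import comb

    def binomial_pmf(n, p):
        # Pr[H = i] = C(n, i) p^i (1-p)^(n-i) for i = 0, ..., n
        return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

    pmf = binomial_pmf(3, 0.5)   # three flips of a fair coin
    print(pmf)                   # [0.125, 0.375, 0.375, 0.125]
    print(sum(pmf))              # 1.0, since the events H = i partition Ω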
As another example, suppose we throw n balls into m bins, uniformly
at random, with each ball thrown independently of the others. Let
B_1 be the number of balls in bin 1. What is the distribution of
B_1?
Each ball has probability 1/m of going into the first bin, and there
are n balls. Let's make an analogy with coin flipping. Let's refer
to throwing a ball as flipping a coin, and a ball going into bin 1
as a heads. Then this is the same as flipping a coin n times with a
bias of p = 1/m, so the distribution of B_1 is B_1 ~ Bin(n, 1/m).
Anytime we have an experiment with n mutually independent trials,
each of which has probability p of success, then the number of
successes X has distribution X ~ Bin(n, p).
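If we want to convince ourselves of this correspondence
experimentally, the following sketch (with arbitrary illustrative
choices of n, m, and the number of trials) simulates throwing balls
into bins and compares the empirical distribution of B_1 against
Bin(n, 1/m):

    import random
    from math import comb

    n, m, trials = 10, 4, 100_000        # illustrative choices
    counts = [0] * (n + 1)
    for _ in range(trials):
        # each ball lands in bin 1 with probability 1/m, independently
        b1 = sum(1 for _ in range(n) if random.randrange(m) == 0)
        counts[b1] += 1

    for i in range(n + 1):
        exact = comb(n, i) * (1 / m) ** i * (1 - 1 / m) ** (n - i)
        print(i, counts[i] / trials, exact)   # empirical vs. Bin(n, 1/m)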
Once we computed the distribution of H, we computed the distribution
of W = 2H - n. We have
Pr[W = j] = Pr[2H - n = j]
= Pr[H = (j+n)/2]
= C(n, (j+n)/2) p^[(j+n)/2] (1-p)^[n-(j+n)/2]
for integer (j+n)/2, 0 <= (j+n)/2 <= n. Solving for j, we get -n
<= j <= n as the range of values W can take on, with the caveat
that W takes on only even values if n is even and only odd values
if n is odd, so that (j+n)/2 is an integer.
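In code, the change of variables j = 2i - n makes this parity
constraint visible; a minimal sketch (the function name winnings_pmf
is ours):

    from math import comb

    def winnings_pmf(n, p):
        # W = 2H - n, so each i = 0, ..., n contributes the value j = 2i - n
        return {2 * i - n: comb(n, i) * p**i * (1 - p)**(n - i)
                for i in range(n + 1)}

    print(winnings_pmf(3, 0.5))
    # {-3: 0.125, -1: 0.375, 1: 0.375, 3: 0.125}: only odd j, since n = 3 is odd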
This demonstrated the value of writing one random variable in terms
of others. Computing the distribution of W directly would have been
more difficult than computing it from H.
Another example of writing a random variable in terms of others is
with indicator random variables. Recall the exam example from last
time. We had n students, each of whom receives a random exam back. We
were interested in how many students get their own exam back.
Calling this random variable X, we then defined X_i to be an
indicator random variable that is 1 if the ith student gets his or
her own exam back and 0 otherwise. Then X = X_1 + ... + X_n.
In the case of n = 3, we had
(1,2,3) (1,3,2) (2,1,3) (2,3,1) (3,1,2) (3,2,1),
and the values of the X_i were
   1       1       0       0       0       0     X_1
   1       0       0       0       0       1     X_2
   1       0       1       0       0       0     X_3,
so the values of X were
   3       1       1       0       0       1     X.
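This table is small enough to generate by brute force; a short sketch
that enumerates all 3! orderings and tabulates the X_i and X:

    from itertools import permutations

    n = 3
    for perm in permutations(range(1, n + 1)):
        # X_i = 1 exactly when student i gets exam i back
        x = [1 if perm[i - 1] == i else 0 for i in range(1, n + 1)]
        print(perm, x, sum(x))   # the last column is X = X_1 + ... + X_n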
Expectation
Sometimes we are interested in less information than the entire
distribution of a random variable. In particular, we may be
interested in an "average" value. For example, if I play the coin
flipping game many times, I want to know how much I'd win on average
in each game.
Let's review the distribution of W_f, the amount won, when the game
has 3 flips and uses a fair coin.
Pr[W_f=3] = 1/8
Pr[W_f=1] = 3/8
Pr[W_f=-1] = 3/8
Pr[W_f=-3] = 1/8
What is the "average" value? It looks like it should be 0.
What if we were using a coin with bias p? Then the amount won W_p
has distribution
Pr[W_p=3] = p^3
Pr[W_p=1] = 3 p^2 (1-p)
Pr[W_p=-1] = 3 p (1-p)^2
Pr[W_p=-3] = (1-p)^3
Now what is the "average" value? This time it's not obvious what it
should be.
Formally, we define the "expected value" of a random variable X to
be
E(X) = ∑_{ω ∈ Ω} X(ω) Pr[ω].
Each sample point ω contributes its value of the random
variable X(ω) to the expected value according to how likely
the outcome is.
The expected value is also known as the "expectation," "average," or
"mean."
Since X(ω) = a for every outcome ω in the event X = a, since
Pr[X = a] = ∑_{ω : X(ω)=a} Pr[ω], and since the events X = a
partition the sample space, we can equivalently write
E(X) = ∑_{a ∈ A} a * Pr[X = a],
where A is the set of all values that X can take on.
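The two formulas are easy to check against each other numerically.
Here is a sketch that computes E(W) for three fair coin flips both
ways, once over sample points and once over values:

    from itertools import product
    from collections import defaultdict

    outcomes = list(product('ht', repeat=3))   # sample space; each ω has Pr[ω] = 1/8
    W = lambda w: 2 * w.count('h') - 3         # W = 2H - n with n = 3

    # First definition: sum over sample points ω.
    e1 = sum(W(w) / 8 for w in outcomes)

    # Second definition: sum over values a of a * Pr[W = a].
    dist = defaultdict(float)
    for w in outcomes:
        dist[W(w)] += 1 / 8
    e2 = sum(a * pr for a, pr in dist.items())

    print(e1, e2)   # both 0.0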
Now that we have a formal definition for expectation, let us determine
the expectation of W_f and W_p. For W_f, we have
E(W_f) = 3 Pr[W_f=3] + 1 Pr[W_f=1] - 1 Pr[W_f=-1] - 3 Pr[W_f=-3]
= 3 * 1/8 + 1 * 3/8 - 1 * 3/8 - 3 * 1/8
= 0,
as expected. For W_p, we have
E(W_p) = 3 Pr[W_p=3] + 1 Pr[W_p=1] - 1 Pr[W_p=-1] - 3 Pr[W_p=-3]
= 3 * p^3 + 1 * 3 p^2 (1-p) - 1 * 3 p (1-p)^2 - 3 * (1-p)^3
= 3p^3 + 3p^2 - 3p^3 - 3p + 6p^2 - 3p^3 - 3 + 9p - 9p^2 + 3p^3
= 6p - 3.
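As a sanity check on the algebra, we can evaluate both the four-term
sum and 6p - 3 at a few test values of p (the test points below are
arbitrary):

    def expected_winnings(p):
        # E(W_p) computed directly from the four-point distribution
        return (3 * p**3 + 3 * p**2 * (1 - p)
                - 3 * p * (1 - p)**2 - 3 * (1 - p)**3)

    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, expected_winnings(p), 6 * p - 3)   # the last two columns agree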
In the example of passing back exams, we had
(1,2,3) (1,3,2) (2,1,3) (2,3,1) (3,1,2) (3,2,1)
   3       1       1       0       0       1     X.
What is E(X)? Using the first definition, we get
E(X) = 3 Pr[(1,2,3)] + 1 Pr[(1,3,2)] + ...
= 3 * 1/6 + 1 * 1/6 + 1 * 1/6 + 0 * 1/6 + 0 * 1/6 + 1 * 1/6
= 1.
We can also compute this using the distribution:
Pr[X=3] = 1/6
Pr[X=1] = 1/2
Pr[X=0] = 1/3
E(X) = 3 Pr[X=3] + 1 Pr[X=1] + 0 Pr[X=0]
= 3 * 1/6 + 1 * 1/2 + 0 * 1/3
= 1.
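Both calculations can be reproduced by brute force; the sketch below
(the helper name expected_matches is ours) averages X over all n!
equally likely orderings:

    from itertools import permutations

    def expected_matches(n):
        # E(X) by enumerating all n! equally likely ways to hand back exams
        perms = list(permutations(range(n)))
        return sum(sum(1 for i in range(n) if p[i] == i) for p in perms) / len(perms)

    print(expected_matches(3))   # 1.0, matching the two calculations above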
Suppose we play a game of roulette. There are 38 slots on a roulette
wheel, 18 of which are black, 18 of which are red, and two of which
are green. If we bet $1 on black (which we always do), how much do
we expect to win in a single game?
Let W be our winnings. If black comes up, we win $1, otherwise we
lose $1. So then
Pr[W=1] = 18/38 = 9/19
Pr[W=-1] = 20/38 = 10/19
E(W) = 1 Pr[W=1] - 1 Pr[W=-1]
= 9/19 - 10/19
= -1/19.
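So we lose about 5.3 cents per game on average. A quick simulation
sketch (the number of trials is an arbitrary choice) bears this out:

    import random

    trials = 1_000_000
    # 38 slots: 18 black (win $1), 20 red or green (lose $1)
    total = sum(1 if random.randrange(38) < 18 else -1 for _ in range(trials))
    print(total / trials, -1 / 19)   # empirical average vs. E(W) ≈ -0.0526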
Suppose we roll a fair die. What is the expected value of the
resulting number? Let N be a random variable corresponding to the
number. Then
Pr[N=i] = 1/6 for 1 <= i <= 6, and
E(N) = 1 Pr[N=1] + 2 Pr[N=2] + ... + 6 Pr[N=6]
= 1 * 1/6 + 2 * 1/6 + ... + 6 * 1/6
= 1/6 (1 + 2 + ... + 6)
= 1/6 * 6 * 7 / 2
= 7/2.
Note that this isn't actually a value that N can take on, but it is
the expectation or average.
Suppose we roll two fair dice. Let S be the sum of the values on the
two dice. What is E(S)? We can compute the distribution
Pr[S=2] = 1/36
Pr[S=3] = 1/18
Pr[S=4] = 1/12
...
Pr[S=12] = 1/36.
This is quite tedious, and at the end, we get E(S) = 7 = 2 E(N). It
seems like there should be a simpler way to arrive at this.
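We can at least confirm the numbers by enumeration; a short sketch
using exact rational arithmetic over all 36 equally likely rolls:

    from itertools import product
    from fractions import Fraction

    # E(N) for a single fair die.
    E_N = sum(i * Fraction(1, 6) for i in range(1, 7))

    # Distribution and expectation of S over all 36 equally likely rolls.
    dist = {}
    for a, b in product(range(1, 7), repeat=2):
        dist[a + b] = dist.get(a + b, Fraction(0)) + Fraction(1, 36)
    E_S = sum(s * pr for s, pr in dist.items())

    print(E_N, E_S)   # 7/2 and 7, so E(S) = 2 E(N)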
As a final example, suppose we pick 100 Californians uniformly at
random. How many Democrats do we expect out of this group, given
that 44.5% of Californians are Democrats? Intuitively, we'd expect
44.5, but how can we arrive at that without computing a large
distribution?
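One way to probe this is by simulation. If we treat each pick as an
independent trial with success probability 0.445 (a reasonable
approximation when sampling 100 people from a very large population),
the count of Democrats is Bin(100, 0.445), and its average comes out
near 44.5:

    import random

    p, n, trials = 0.445, 100, 100_000
    # each of the n picks is (approximately) an independent Bernoulli(p) trial
    avg = sum(sum(1 for _ in range(n) if random.random() < p)
              for _ in range(trials)) / trials
    print(avg)   # close to 44.5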