
Random Variables

[See this for intro stuff](/docs/math/Probability/Expected Values).

Expected Value

  • $E[X + Y] = E[X] + E[Y]$
  • $E[aX + b] = aE[X] + b$
    Note how the Expected Value is shifted by the amount $b$
  • $E[E[Y \mid X]] = E[Y]$
    This one’s a doozy!

Note that when you’re computing $E[XY]$, you’re looking at $\sum_{x,y} xy \, f_{X,Y}(x, y)$, i.e. the joint PDF or PMF of Random Variables $X$ and $Y$. If these are independent, the joint distribution factors and $E[XY] = E[X]E[Y]$.
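A quick sketch of both facts with two fair dice, using exact `Fraction` arithmetic over the joint PMF (the dice setup is just an illustrative assumption):

```python
from itertools import product
from fractions import Fraction

# Two independent fair dice: the joint PMF factors, f(x, y) = f_X(x) * f_Y(y).
p = Fraction(1, 6)
joint = {(x, y): p * p for x, y in product(range(1, 7), repeat=2)}

E_X = sum(x * q for (x, _), q in joint.items())
E_Y = sum(y * q for (_, y), q in joint.items())
E_sum = sum((x + y) * q for (x, y), q in joint.items())
E_prod = sum(x * y * q for (x, y), q in joint.items())

assert E_sum == E_X + E_Y    # linearity: true with or without independence
assert E_prod == E_X * E_Y   # product rule: true here because the PMF factors
```

Linearity would survive any joint PMF with those marginals; the product rule is the one that leans on independence.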

Variance

  • $Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$
    The “spread” about the mean.
  • $Var(aX + b) = a^2 Var(X)$
    Note that $b$ does not shift the Variance (unlike with Expected Value!)
  • $Var(X \pm Y) = Var(X) + Var(Y)$
    If $X$ and $Y$ are independent. Always a plus sign because $(-1)^2 = 1$
  • $Var(X \pm Y) = Var(X) + Var(Y) \pm 2Cov(X, Y)$
    If they’re dependent
  • $Var(\sum X_i) = \sum Var(X_i)$
    If the $X_i$ are independent
  • $Var(\sum (a_i X_i + b_i)) = \sum a_i^2 Var(X_i)$
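The rules above can be checked exactly on a fair die (an illustrative choice) with `Fraction` arithmetic:

```python
from fractions import Fraction

# Fair-die PMF; Fractions keep everything exact.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def E(f):
    return sum(f(x) * p for x, p in pmf.items())

mu = E(lambda x: x)
var = E(lambda x: (x - mu) ** 2)
assert var == E(lambda x: x ** 2) - mu ** 2   # the computational shortcut

# Var(aX + b) = a^2 Var(X): the shift b drops out entirely.
a, b = 3, 10
mu2 = E(lambda x: a * x + b)
assert E(lambda x: (a * x + b - mu2) ** 2) == a ** 2 * var

# Var(X + Y) = Var(X) + Var(Y) for two independent dice.
joint = {(x, y): px * py for x, px in pmf.items() for y, py in pmf.items()}
mu_s = sum((x + y) * p for (x, y), p in joint.items())
var_s = sum((x + y - mu_s) ** 2 * p for (x, y), p in joint.items())
assert var_s == 2 * var
```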

Why Always Add Variances?

We buy some cereal. The box says “16 ounces.” We know that’s not precisely the weight of the cereal in the box, just close. After all, one corn flake more or less would change the weight ever so slightly. Weights of such boxes of cereal vary somewhat, and our uncertainty about the exact weight is expressed by the variance (or standard deviation) of those weights.

Next we get out a bowl that holds 3 ounces of cereal and pour it full. Our pouring skill is not very precise, so the bowl now contains about 3 ounces with some variability (uncertainty).

How much cereal is left in the box? Well, we assume about 13 ounces. But notice that we’re less certain about this remaining weight than we were about the weight before we poured out the bowlful. The variability of the weight in the box has increased even though we subtracted cereal.

Moral: Every time something happens at random, whether it adds to the pile or subtracts from it, uncertainty (read “variance”) increases.
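The cereal story in two lines of arithmetic, with made-up spread figures (the 0.2 oz and 0.3 oz standard deviations are assumptions, not data from any real box), and assuming the fill and the pour are independent:

```python
# Hypothetical spreads: the filling machine has sd 0.2 oz, our pour has
# sd 0.3 oz. The mean drops by 3 oz, but because variances of independent
# sources ADD even under subtraction, Var(Box - Pour) = Var(Box) + Var(Pour).
sd_box, sd_pour = 0.2, 0.3
var_box, var_pour = sd_box ** 2, sd_pour ** 2
var_remaining = var_box + var_pour

assert var_remaining > var_box   # we're less certain than before pouring
```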

Source

Covariance

  • $Cov(X, X) = Var(X)$
  • $Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$
    Just expand it out and you’ll see.
  • $Cov(X + a, Y + b) = Cov(X, Y)$
  • $Cov(aX, bY) = ab\,Cov(X, Y)$
  • $Cov(aX + bY, cP + dQ) = ac\,Cov(X, P) + ad\,Cov(X, Q) + bc\,Cov(Y, P) + bd\,Cov(Y, Q)$
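A sketch of the shift and scale rules on a small made-up paired dataset (equal weights, population covariance), kept exact with `Fraction`:

```python
from fractions import Fraction

# Population covariance of equally weighted paired data, computed exactly.
xs = [Fraction(v) for v in (1, 2, 4, 7)]
ys = [Fraction(v) for v in (2, 1, 5, 9)]
n = len(xs)

def cov(u, v):
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

c = cov(xs, ys)
assert cov(xs, xs) == sum((x - sum(xs) / n) ** 2 for x in xs) / n  # Cov(X, X) = Var(X)
assert cov([x + 5 for x in xs], [y - 4 for y in ys]) == c          # shifts vanish
a, b = Fraction(3), Fraction(-2)
assert cov([a * x for x in xs], [b * y for y in ys]) == a * b * c  # scales multiply
```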

Correlation

Between two variables $X$ and $Y$, it’s the Covariance of $(X, Y)$ normalized by the square root of the product of their variances1.

$Corr(X, Y) = \rho = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}, \quad -1 \leq \rho \leq 1$

It measures the linearity of the relationship between Random Variables (or Vectors!)

$\rho(X, Y) = 1 \implies Y = aX + b,\ a = \sigma_y/\sigma_x$
$\rho(X, Y) = -1 \implies Y = aX + b,\ a = -\sigma_y/\sigma_x$
$\rho(X, Y) = 0 \implies$ no linear relationship
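A sketch with perfectly linear (made-up) data: $\rho$ comes out as $1$ and the slope matches $\sigma_y/\sigma_x$, up to float rounding:

```python
import math

# Y = a*X + b with a > 0, so rho should be 1 and a should equal sigma_y/sigma_x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = 2.5, -1.0
ys = [a * x + b for x in xs]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
rho = cov / (sx * sy)

assert abs(rho - 1.0) < 1e-12
assert abs(a - sy / sx) < 1e-12   # the slope is the ratio of the spreads
```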
warning

A correlation of zero does not necessarily mean that $X$ and $Y$ are independent!
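The classic counterexample, computed exactly: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. $Y$ is a deterministic function of $X$, yet their covariance (and hence correlation) is zero.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X^2: maximally dependent, yet Cov(X, Y) = 0.
pmf = {x: Fraction(1, 3) for x in (-1, 0, 1)}

E_X = sum(x * p for x, p in pmf.items())            # 0 by symmetry
E_Y = sum(x ** 2 * p for x, p in pmf.items())       # 2/3
E_XY = sum(x * x ** 2 * p for x, p in pmf.items())  # E[X^3] = 0 by symmetry
cov = E_XY - E_X * E_Y

assert cov == 0
# ...but P(Y = 1 | X = 1) = 1, while P(Y = 1) = 2/3: NOT independent.
```

Correlation only sees the *linear* part of a relationship; the quadratic dependence here is invisible to it.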

Disjointness and Independence

Disjoint Events

Putting all that stuff up there together. Disjointness is also called “Mutual Exclusivity”: the events have nothing in common in the Sample and Event Spaces. If one happens, the other cannot, so (treating $X$ and $Y$ as the events’ indicator variables) the product $XY$ is identically zero. So,

$E[XY] = 0 \implies Cov(X, Y) = E[XY] - E[X]E[Y] = -E[X]E[Y]$

Now you can look at the PDFs or PMFs of $X$ and $Y$ and expand out that equation2. But what it’s saying with that negative sign is: if $X$ happens, $Y$ cannot.
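A concrete sketch with indicator variables for two disjoint events on a fair-die roll (the events $A$ and $B$ are illustrative choices):

```python
from fractions import Fraction

# Indicators of two disjoint events on one fair-die roll:
# A = "roll is 1 or 2", B = "roll is 5". Both can never happen at once.
sample = {w: Fraction(1, 6) for w in range(1, 7)}

def X(w):  # indicator of A
    return 1 if w in (1, 2) else 0

def Y(w):  # indicator of B
    return 1 if w == 5 else 0

def E(f):
    return sum(f(w) * p for w, p in sample.items())

E_X, E_Y = E(X), E(Y)                 # P(A) = 1/3, P(B) = 1/6
E_XY = E(lambda w: X(w) * Y(w))       # the product X*Y is identically zero

assert E_XY == 0
cov = E_XY - E_X * E_Y
assert cov == -E_X * E_Y              # = -1/18, strictly negative
```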

Independent Events

For two independent events $X$ and $Y$,

$E[XY] = E[X]E[Y] \implies Cov(X, Y) = E[XY] - E[X]E[Y] = E[X]E[Y] - E[X]E[Y] = 0$

Summary

| Property | Disjoint (AKA Mutually Exclusive) | Independent |
| --- | --- | --- |
| Probability: intersection | $P(X \cap Y) = 0$ | $P(X \cap Y) = P(X)P(Y)$ |
| Probability: conditional | If both have positive probability: $P(X \mid Y) = 0$, $P(Y \mid X) = 0$ (knowing one excludes the other) | $P(X \mid Y) = P(X)$, $P(Y \mid X) = P(Y)$ (knowing one gives no info) |
| Expectation: additivity | $E[X + Y] = E[X] + E[Y]$ (always true) | Same (always true) |
| Expectation: product | $E[XY] = 0$ | $E[XY] = E[X]E[Y]$ |
| Variance of sum | $Var(X \pm Y) = Var(X) + Var(Y) \pm 2Cov(X, Y)$ | $Var(X + Y) = Var(X) + Var(Y)$ |
| Covariance | $Cov(X, Y) = -E[X]E[Y]$ | $Cov(X, Y) = 0$ |
| Independence implied? | ❌ Never (unless trivial zero-probability case) | ✅ By definition |
| Disjointness implied? | ✅ By definition | ❌ Never (unless trivial zero-probability case) |

Higher Order Moments

| Order | Name | Formula |
| --- | --- | --- |
| First | Mean | $\frac{\sum x}{n}$ |
| Second | Variance | $\frac{\sum x^2}{n} \to \frac{\sum (x - \mu)^2}{n}$ |
| Third | Skewness | $\frac{\sum x^3}{n} \to \frac{\sum (x - \mu)^3}{n} \to \frac{1}{n}\frac{\sum (x - \mu)^3}{\sigma^3}$ |
| Fourth | Kurtosis | $\frac{\sum x^4}{n} \to \frac{\sum (x - \mu)^4}{n} \to \frac{1}{n}\frac{\sum (x - \mu)^4}{\sigma^4}$ |
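The standardized third and fourth moments from the table, computed by hand on a small made-up sample chosen so the population sd is exactly 2 (which keeps the float arithmetic exact):

```python
import math

# Standardized third and fourth central moments, computed by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mu = sum(data) / n                                        # 5.0
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)   # 2.0

skew = sum((x - mu) ** 3 for x in data) / n / sigma ** 3
kurt = sum((x - mu) ** 4 for x in data) / n / sigma ** 4

assert skew == 0.65625   # positive: the long tail is on the right
assert kurt == 2.78125   # a normal distribution would give exactly 3
```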

Footnotes

  1. This means we’re making the result unitless and clamping it to $[-1, 1]$

  2. One trick for demonstration purposes is to use the “Indicator Random Variable” $1_A$: $X = 1$ if $A$ happens, else $X = 0$. Then $E[X]$ just becomes the probability of $A$.