
Random Variables

[See this for intro stuff](/docs/math/Probability/Expected Values).

Expected Value

  • $E[X + Y] = E[X] + E[Y]$
  • $E[aX + b] = aE[X] + b$
    Note how the Expected Value is shifted by the amount $b$
  • $E[E[Y \mid X]] = E[Y]$
    This one’s a doozy!

Note that when you’re computing $E[XY]$, you’re looking at $\sum_{x,y} xy \, f_{X,Y}(x, y)$, i.e. the joint PDF or PMF of Random Variables $X$ and $Y$. If these are independent, the joint distribution factors and $E[XY] = E[X]E[Y]$.
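A quick sketch of both facts with two fair dice, using exact `Fraction` arithmetic over the joint PMF (the dice setup is just an illustrative assumption):

```python
from itertools import product
from fractions import Fraction

# Two independent fair dice: the joint PMF factors, f(x, y) = f_X(x) * f_Y(y).
p = Fraction(1, 6)
joint = {(x, y): p * p for x, y in product(range(1, 7), repeat=2)}

E_X = sum(x * q for (x, _), q in joint.items())
E_Y = sum(y * q for (_, y), q in joint.items())
E_sum = sum((x + y) * q for (x, y), q in joint.items())
E_prod = sum(x * y * q for (x, y), q in joint.items())

assert E_sum == E_X + E_Y    # linearity: true with or without independence
assert E_prod == E_X * E_Y   # product rule: true here because the PMF factors
```

Linearity would survive any joint PMF with those marginals; the product rule is the one that leans on independence.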

Variance

  • $Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$
    The “spread” about the mean.
  • $Var(aX + b) = a^2 Var(X)$
    Note that $b$ does not shift the Variance (unlike with Expected Value!)
  • $Var(X \pm Y) = Var(X) + Var(Y)$
    If $X$ and $Y$ are independent. Always a plus sign because $(-1)^2 = 1$
  • $Var(X \pm Y) = Var(X) + Var(Y) \pm 2Cov(X, Y)$
    If they’re dependent
  • $Var(\sum X_i) = \sum Var(X_i)$
    If the $X_i$ are independent
  • $Var(\sum (a_i X_i + b_i)) = \sum a_i^2 Var(X_i)$
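The rules above can be checked exactly on a fair die (an illustrative choice) with `Fraction` arithmetic:

```python
from fractions import Fraction

# Fair-die PMF; Fractions keep everything exact.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def E(f):
    return sum(f(x) * p for x, p in pmf.items())

mu = E(lambda x: x)
var = E(lambda x: (x - mu) ** 2)
assert var == E(lambda x: x ** 2) - mu ** 2   # the computational shortcut

# Var(aX + b) = a^2 Var(X): the shift b drops out entirely.
a, b = 3, 10
mu2 = E(lambda x: a * x + b)
assert E(lambda x: (a * x + b - mu2) ** 2) == a ** 2 * var

# Var(X + Y) = Var(X) + Var(Y) for two independent dice.
joint = {(x, y): px * py for x, px in pmf.items() for y, py in pmf.items()}
mu_s = sum((x + y) * p for (x, y), p in joint.items())
var_s = sum((x + y - mu_s) ** 2 * p for (x, y), p in joint.items())
assert var_s == 2 * var
```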

Why Always Add Variances?

We buy some cereal. The box says “16 ounces.” We know that’s not precisely the weight of the cereal in the box, just close. After all, one corn flake more or less would change the weight ever so slightly. Weights of such boxes of cereal vary somewhat, and our uncertainty about the exact weight is expressed by the variance (or standard deviation) of those weights.

Next we get out a bowl that holds 3 ounces of cereal and pour it full. Our pouring skill is not very precise, so the bowl now contains about 3 ounces with some variability (uncertainty).

How much cereal is left in the box? Well, we assume about 13 ounces. But notice that we’re less certain about this remaining weight than we were about the weight before we poured out the bowlful. The variability of the weight in the box has increased even though we subtracted cereal.

Moral: Every time something happens at random, whether it adds to the pile or subtracts from it, uncertainty (read “variance”) increases.
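The cereal story in two lines of arithmetic, with made-up spread figures (the 0.2 oz and 0.3 oz standard deviations are assumptions, not data from any real box), and assuming the fill and the pour are independent:

```python
# Hypothetical spreads: the filling machine has sd 0.2 oz, our pour has
# sd 0.3 oz. The mean drops by 3 oz, but because variances of independent
# sources ADD even under subtraction, Var(Box - Pour) = Var(Box) + Var(Pour).
sd_box, sd_pour = 0.2, 0.3
var_box, var_pour = sd_box ** 2, sd_pour ** 2
var_remaining = var_box + var_pour

assert var_remaining > var_box   # we're less certain than before pouring
```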

Source

Covariance

  • $Cov(X, X) = Var(X)$
  • $Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$
    Just expand it out and you’ll see.
  • $Cov(X + a, Y + b) = Cov(X, Y)$
  • $Cov(aX, bY) = ab\,Cov(X, Y)$
  • $Cov(aX + bY, cP + dQ) = ac\,Cov(X, P) + ad\,Cov(X, Q) + bc\,Cov(Y, P) + bd\,Cov(Y, Q)$
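A sketch of the shift and scale rules on a small made-up paired dataset (equal weights, population covariance), kept exact with `Fraction`:

```python
from fractions import Fraction

# Population covariance of equally weighted paired data, computed exactly.
xs = [Fraction(v) for v in (1, 2, 4, 7)]
ys = [Fraction(v) for v in (2, 1, 5, 9)]
n = len(xs)

def cov(u, v):
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

c = cov(xs, ys)
assert cov(xs, xs) == sum((x - sum(xs) / n) ** 2 for x in xs) / n  # Cov(X, X) = Var(X)
assert cov([x + 5 for x in xs], [y - 4 for y in ys]) == c          # shifts vanish
a, b = Fraction(3), Fraction(-2)
assert cov([a * x for x in xs], [b * y for y in ys]) == a * b * c  # scales multiply
```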

Correlation

Between two variables $X$ and $Y$, it’s the Covariance of $(X, Y)$ normalized by the square root of the product of their variances1.

$Corr(X, Y) = \rho = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}, \quad -1 \leq \rho \leq 1$

It measures the linearity of the relationship between Random Variables (or Vectors!)

$\rho(X, Y) = 1 \implies Y = aX + b,\ a = \sigma_y/\sigma_x$
$\rho(X, Y) = -1 \implies Y = aX + b,\ a = -\sigma_y/\sigma_x$
$\rho(X, Y) = 0 \implies$ no linear relationship
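A sketch with perfectly linear (made-up) data: $\rho$ comes out as $1$ and the slope matches $\sigma_y/\sigma_x$, up to float rounding:

```python
import math

# Y = a*X + b with a > 0, so rho should be 1 and a should equal sigma_y/sigma_x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = 2.5, -1.0
ys = [a * x + b for x in xs]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
rho = cov / (sx * sy)

assert abs(rho - 1.0) < 1e-12
assert abs(a - sy / sx) < 1e-12   # the slope is the ratio of the spreads
```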
warning

A correlation of zero does not necessarily mean that $X$ and $Y$ are independent!
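The classic counterexample, computed exactly: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. $Y$ is a deterministic function of $X$, yet their covariance (and hence correlation) is zero.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X^2: maximally dependent, yet Cov(X, Y) = 0.
pmf = {x: Fraction(1, 3) for x in (-1, 0, 1)}

E_X = sum(x * p for x, p in pmf.items())            # 0 by symmetry
E_Y = sum(x ** 2 * p for x, p in pmf.items())       # 2/3
E_XY = sum(x * x ** 2 * p for x, p in pmf.items())  # E[X^3] = 0 by symmetry
cov = E_XY - E_X * E_Y

assert cov == 0
# ...but P(Y = 1 | X = 1) = 1, while P(Y = 1) = 2/3: NOT independent.
```

Correlation only sees the *linear* part of a relationship; the quadratic dependence here is invisible to it.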

Disjointness and Independence

Disjoint Events

Putting all that stuff up there together. Disjointness is also called “Mutual Exclusivity”: the events have nothing in common in the Sample and Event Spaces. If one happens, the other cannot, so (treating $X$ and $Y$ as the events’ indicator variables) the product $XY$ is identically zero. So,

$E[XY] = 0 \implies Cov(X, Y) = E[XY] - E[X]E[Y] = -E[X]E[Y]$

Now you can look at the PDFs or PMFs of $X$ and $Y$ and expand out that equation2. But what it’s saying with that negative sign is: if $X$ happens, $Y$ cannot.
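A concrete sketch with indicator variables for two disjoint events on a fair-die roll (the events $A$ and $B$ are illustrative choices):

```python
from fractions import Fraction

# Indicators of two disjoint events on one fair-die roll:
# A = "roll is 1 or 2", B = "roll is 5". Both can never happen at once.
sample = {w: Fraction(1, 6) for w in range(1, 7)}

def X(w):  # indicator of A
    return 1 if w in (1, 2) else 0

def Y(w):  # indicator of B
    return 1 if w == 5 else 0

def E(f):
    return sum(f(w) * p for w, p in sample.items())

E_X, E_Y = E(X), E(Y)                 # P(A) = 1/3, P(B) = 1/6
E_XY = E(lambda w: X(w) * Y(w))       # the product X*Y is identically zero

assert E_XY == 0
cov = E_XY - E_X * E_Y
assert cov == -E_X * E_Y              # = -1/18, strictly negative
```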

Independent Events

For two independent events $X$ and $Y$,

$E[XY] = E[X]E[Y] \implies Cov(X, Y) = E[XY] - E[X]E[Y] = E[X]E[Y] - E[X]E[Y] = 0$

Summary

| Property | Disjoint (AKA Mutually Exclusive) | Independent |
| --- | --- | --- |
| Probability: intersection | $P(X \cap Y) = 0$ | $P(X \cap Y) = P(X)P(Y)$ |
| Probability: conditional | If both have positive probability: $P(X \mid Y) = 0$, $P(Y \mid X) = 0$ (knowing one excludes the other) | $P(X \mid Y) = P(X)$, $P(Y \mid X) = P(Y)$ (knowing one gives no info) |
| Expectation: additivity | $E[X + Y] = E[X] + E[Y]$ (always true) | Same (always true) |
| Expectation: product | $E[XY] = 0$ | $E[XY] = E[X]E[Y]$ |
| Variance of sum | $Var(X \pm Y) = Var(X) + Var(Y) \pm 2Cov(X, Y)$ | $Var(X + Y) = Var(X) + Var(Y)$ |
| Covariance | $Cov(X, Y) = -E[X]E[Y]$ | $Cov(X, Y) = 0$ |
| Independence implied? | ❌ Never (unless trivial zero-probability case) | ✅ By definition |
| Disjointness implied? | ✅ By definition | ❌ Never (unless trivial zero-probability case) |

Higher Order Moments

| Order | Name | Formula |
| --- | --- | --- |
| First | Mean | $\frac{\sum x}{n}$ |
| Second | Variance | $\frac{\sum x^2}{n} \to \frac{\sum (x - \mu)^2}{n}$ |
| Third | Skewness | $\frac{\sum x^3}{n} \to \frac{\sum (x - \mu)^3}{n} \to \frac{1}{n}\frac{\sum (x - \mu)^3}{\sigma^3}$ |
| Fourth | Kurtosis | $\frac{\sum x^4}{n} \to \frac{\sum (x - \mu)^4}{n} \to \frac{1}{n}\frac{\sum (x - \mu)^4}{\sigma^4}$ |
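The standardized third and fourth moments from the table, computed by hand on a small made-up sample chosen so the population sd is exactly 2 (which keeps the float arithmetic exact):

```python
import math

# Standardized third and fourth central moments, computed by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mu = sum(data) / n                                        # 5.0
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)   # 2.0

skew = sum((x - mu) ** 3 for x in data) / n / sigma ** 3
kurt = sum((x - mu) ** 4 for x in data) / n / sigma ** 4

assert skew == 0.65625   # positive: the long tail is on the right
assert kurt == 2.78125   # a normal distribution would give exactly 3
```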

Footnotes

  1. This means we’re making the result unitless and clamping it to $[-1, 1]$

  2. One trick for demonstration purposes is to use the “Indicator Random Variable” $1_A$: $X = 1$ if $A$ happens, else $X = 0$. Then $E[X]$ just becomes the probability of $A$.