Sample Spaces, Events, Probability, & Random Variables

The World is a Random Place and Uncertainty Prevails. We would like to study this uncertainty and use our learnings to maximize Shareholder Value™ which is, or ought to be, the Primary Goal of our short lives.

Sample Spaces

In the World, you perform an Experiment $\mathcal{E}$ which leads to Outcomes $\xi$. The space of all possible outcomes of your experiment[1] is its Sample Space $\Omega$.

Events and Event Spaces

An Event $A$ is just a set of outcomes in the Sample Space. That's really it.

But an Event Space $\mathcal{F}$ is a bit subtle. In the simplest case, it is the set of all sets of outcomes in the sample space[2]. That is, it's the set of all possible Events.

Formally, to say that $\mathcal{F}$ is an event space,

  1. $\mathcal{F}$ must contain $\phi$ and $\Omega$ (nothing and the entire kit-n-kaboodle)
  2. If some event $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$ ("closed under complement")
  3. If events $A, B \in \mathcal{F}$ then $A \cup B \in \mathcal{F}$ ("closed under union")
  4. If 2 and 3 are true, you can use both to conclude that things are also "closed under intersection": $A \cap B = (A^c \cup B^c)^c$

So with this definition, and within a legit event space, an event can be no outcome[3], a single outcome, or a collection of outcomes (including all outcomes) you're interested in. The closure rules mean that if you pick an event $A$, it 'generates' its own complement $A^c$ in the event space[4]. This is nice because you can now use all manner of logical tricks to compute probabilities!
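The closure rules can be checked by brute force when the sample space is small. The sketch below assumes a six-sided die as the sample space; `power_set` and `is_event_space` are hypothetical helper names made up for illustration, and the check only covers finite families of sets.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return every subset of omega as a frozenset."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_event_space(family, omega):
    """Check rules 1-3 for a finite family of subsets of omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # Rule 1: contains the empty set and the whole sample space.
    if frozenset() not in family or omega not in family:
        return False
    # Rule 2: closed under complement.
    if any(omega - a not in family for a in family):
        return False
    # Rule 3: closed under (pairwise) union.
    return all(a | b in family for a in family for b in family)

omega = {1, 2, 3, 4, 5, 6}          # assumed example: one roll of a die
evens = frozenset({2, 4, 6})
odds = frozenset(omega) - evens

print(is_event_space(power_set(omega), omega))             # True
print(is_event_space([set(), evens, odds, omega], omega))  # True
print(is_event_space([set(), evens, omega], omega))        # False: odds is missing
```

The second call shows that a much smaller family than the full powerset can still be a legit event space, as long as picking `evens` drags its complement `odds` in with it.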

Sizes of Event Spaces

If you define an Event Space as the set of all subsets of a given set, you're just talking about the Powerset of the Sample Space in the discrete case. That is, if the Sample Space has $n = |\Omega|$ outcomes,

$$|\mathcal{F}| = 2^n$$

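Say your experiment has 100 possible outcomes, like drawing one ticket from a box of 100 numbered tickets (i.e. $|\Omega| = 100$). But we're talking about the Event Space, which is the set of all sets of outcomes in the sample space. That thing's size is $2^{100}$, an unimaginably large number. (Tossing a coin 100 times is even wilder: the Sample Space alone already has $2^{100}$ outcomes, one per sequence of heads and tails, so the Event Space has $2^{2^{100}}$ events.)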
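To see the $2^n$ growth concretely, here is a small sketch that counts events by enumerating subsets for a few small sample spaces and then looks at how long $2^{100}$ is when written out. `count_events` is a hypothetical helper name, and the brute-force count is only feasible for small `n`.

```python
from itertools import combinations

def count_events(n):
    """Count the subsets of an n-outcome sample space by enumeration."""
    outcomes = range(n)
    return sum(1 for r in range(n + 1) for _ in combinations(outcomes, r))

for n in (1, 2, 6, 10):
    # The enumerated count agrees with the formula 2**n.
    print(n, count_events(n), 2 ** n)

# Python integers are arbitrary precision, so 2**100 can be inspected directly.
print(len(str(2 ** 100)))  # 31 decimal digits
```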

When it comes to continuous outcomes, things get pretty complicated in that you're looking at both Sample and Event Spaces that are uncountably infinite in size ("cardinality"). There are a few things Math people do about this that involve Measure Theory to tame these Beasts of Infinity. That's all I gotta say about that for now.

Probability and Probability Measures

Easy-peasy with the definitions of Events and Event Spaces. Let's go in reverse tho. A Probability Measure $P$ is a kind of 'rulebook' that looks at an Event Space and assigns numbers ("sizes") to all Events within it[4]. That is, $P$ is a function that maps an event $A$ to a number in $[0, 1] \subset \mathbb{R}$. This number corresponding to a single event of interest, $P(A)$, is the Probability of the event.

The Probability Measure must satisfy certain rules which we assume to be true ("axioms"):

  1. If event $A \in \mathcal{F}$ then $0 \leq P(A) \leq 1$
    The probability of every event is a real number between 0 and 1

  2. $P(\Omega) = 1$ and $P(\phi) = 0$
    "Something will happen": probability that at least one of the elementary events (those with just one outcome) will occur is 1.

  3. If events $A$ and $B$ are disjoint[5] ($A \cap B = \phi$) then $P(A \cup B) = P(A) + P(B)$
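As a sanity check on these axioms, here is a minimal sketch of one possible 'rulebook': a fair six-sided die, where every outcome is assumed equally likely. The die, the name `OMEGA`, and the events `A`, `B`, `C` are all made up for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

OMEGA = frozenset({1, 2, 3, 4, 5, 6})  # assumed sample space: one die roll

def P(event):
    """Probability of an event under the assumed fair-die measure."""
    event = frozenset(event)
    assert event <= OMEGA, "an event must be a subset of the sample space"
    return Fraction(len(event), len(OMEGA))

A = {1, 2}
B = {5}      # disjoint from A
C = {2, 3}   # overlaps A on the outcome 2

print(P(OMEGA), P(set()))         # 1 0
print(P(A | B) == P(A) + P(B))    # True: additivity holds for disjoint events
print(P(A | C), P(A) + P(C))      # 1/2 2/3: it fails when events overlap
```

The last line is why axiom 3 insists on disjoint events: adding $P(A)$ and $P(C)$ counts the shared outcome twice.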

So what determines the rules of this 'rulebook'?

Random Variables

For non-Mathematicians, this is 'simply' a function that takes an outcome $\xi$ and maps it to a number $X(\xi)$ that is in the Reals ($\mathbb{R}$).

Let's say you're measuring the heights of 1,237 people. The height is the Random Variable. It is denoted $X$, which is a function that will map Alice (Outcome $\xi_{12}$ of 1,237 outcomes) to her height of 143.12cm and Bob (Outcome $\xi_{56}$) to his height of 112.01cm. That is,

$$X(\xi_{12}) = 143.12 \qquad X(\xi_{56}) = 112.01$$

And so on. Whence the 'random' part of a Random Variable? Because you cannot predict which outcome, and hence which number in $\mathbb{R}$, you will get before you conduct the experiment ("The World is a Random Place and Uncertainty Prevails"). And if you collect heights again with a different sample, $X$ will hand you different numbers.
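A rough way to picture this in code: the random variable is an ordinary, fixed function, and the randomness lives in which outcome the experiment hands you. The two heights below come from the text; the dictionary, the seeded generator, and the choice of people as outcomes are assumptions for illustration.

```python
import random

# Outcomes are people; these two heights (in cm) are the ones from the text.
heights_cm = {"Alice": 143.12, "Bob": 112.01}

def X(outcome):
    """The random variable: a fixed rule mapping an outcome to a real number."""
    return heights_cm[outcome]

rng = random.Random(0)                    # seeded so a rerun repeats the draw
outcome = rng.choice(sorted(heights_cm))  # the experiment picks an outcome
print(outcome, X(outcome))                # X itself never changes, the outcome does
```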

Random variables can be discrete/categorical or continuous.

Discrete - The Probability Mass Function (PMF)

This is denoted with a lowercase $f_X$ and is defined as the probability that $X = x$

$$f_X(x) = P[X = x]$$
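For a concrete PMF, the sketch below assumes an experiment of rolling two fair dice, with $X$ being the sum of the two faces. It builds $f_X$ by counting how many equally likely outcomes map to each value; the dice and the name `pmf` are illustrative choices, not something from the text above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair dice, so 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: the sum of the two faces."""
    return sum(outcome)

counts = Counter(X(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

print(pmf[7])             # 1/6
print(pmf[2])             # 1/36
print(sum(pmf.values()))  # 1, the masses add up to the whole sample space
```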

Continuous - The Probability Density Function (PDF)

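This is also denoted with a lowercase $f_X$, but $f_X(x)$ is not itself a probability (for a continuous $X$, $P[X = x] = 0$). Instead, you integrate it over an interval to get the probability that $X$ lands in that interval

$$P[a \leq X \leq b] = \int_a^b f_X(x)\,dx$$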
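With a density, probabilities come from the area under $f_X$ over an interval rather than from the value of $f_X$ at a point. The sketch below assumes an exponential density as the example and approximates the area with a midpoint Riemann sum; `prob_between` is a hypothetical helper, and the closed form used for comparison is specific to the exponential case.

```python
import math

def f_X(x, rate=1.0):
    """Density of an exponential random variable (an assumed example)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def prob_between(a, b, steps=100_000):
    """Approximate P[a <= X <= b] with a midpoint Riemann sum of f_X."""
    width = (b - a) / steps
    return sum(f_X(a + (i + 0.5) * width) for i in range(steps)) * width

approx = prob_between(0.5, 2.0)
exact = math.exp(-0.5) - math.exp(-2.0)  # closed form for rate = 1

print(abs(approx - exact) < 1e-6)  # True
print(f_X(0.0, rate=2.0))          # 2.0: a density value can exceed 1
```

The last line is a reminder that a density is not a probability: only its integral over an interval is.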

Summing Things - The Cumulative Distribution Function (CDF)

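This is denoted $F_X$ and is defined as the probability that $X \leq x$, i.e. $F_X(x) = P[X \leq x]$. You get it by summing the PMF (discrete) or integrating the PDF (continuous) over everything up to $x$.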
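In the discrete case, the 'summing' is a running total of the PMF. A minimal sketch, assuming a single fair die as the example (the names `pmf` and `cdf` are illustrative):

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]                  # assumed example: one fair die
pmf = {x: Fraction(1, 6) for x in values}

# F_X(x) = P[X <= x] is the running sum of the PMF over values up to x.
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

print(cdf[3])  # 1/2
print(cdf[6])  # 1, every outcome is at most 6
```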

TODO: Finish this.

Media

"The Big Picture of Statistics" by Very Normal (a Columbia Biostats grad!)

Footnotes

  1. Even those you haven't seen!

  2. A lovely combinatorial explosion!

  3. Being interested in no outcomes might be weird but consider being interested in "rolling a 16" when you have a six-sided die. $\{16\} \not\subseteq \{1,2,3,4,5,6\}$, where $\{1,2,3,4,5,6\}$ is the sample space.

  4. This is some $\sigma$-algebra and Measure Theory stuff. Take it to be true.

  5. We're talking mutually exclusive events, not independent ones. That's totally different. E.g. rolling 3 and 5 on a single die cannot happen at once (Mutually Exclusive, $P(A \cap B) = 0$) but can on two dice (Independent, $P(A \cap B) = P(A) \cdot P(B)$).