Sample Spaces, Events, Probability, & Random Variables

Axiom: The World is a Random Place and Uncertainty prevails. We would like to study this uncertainty and use our learnings to sell our Startup to the highest bidder and/or maximize Shareholder Value™, things that ought to be the Primary Goal of our wild, short, and precious lives.

Sample Spaces

In the World, you perform an Experiment $\mathcal{E}$ which leads to Outcomes. The set of all possible outcomes of your experiment, even those you haven't observed (this is of supreme importance!), is its Sample Space $\Omega$.

Events and Event Spaces

An Event $A$ is just a set of outcomes in the Sample Space. That's really it.

Hark!

These may be words but the concepts are very important. Outcomes and Events are often 'collapsed' and this causes all sorts of thought problems.

Consider this: "I am going to draw a card from a deck and I see that it is a Queen of Hearts."

What is the Experiment? What is the Event?

Drawing a card from the deck is the Experiment. There are 52 outcomes (leave the Jokers out) that make up the sample space.

Out of these, "I draw a Queen of Hearts" is an Event. An apple (outcome) and a basket containing just one apple (event) are not the same thing: Change the Event to "I draw a Red card" and the basket now has 26 apples. Some single random-ass red card and a basket containing all possible red cards are not the same thing.

The Event Space $\mathcal{F}$ is a bit bigger. It is the set of all sets of outcomes in the sample space¹. That is, it's the set of all possible Events.
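To make the apple-versus-basket distinction concrete, here is a minimal Python sketch that models the deck as a set of outcomes and each event as a subset of it. The names `sample_space`, `one_queen`, and `red_card` are made up for illustration; they are not standard notation.

```python
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]

# The sample space: every card that could be drawn (no Jokers).
sample_space = set(product(RANKS, SUITS))

# An outcome is a single card; an event is a *set* of cards.
outcome = ("Q", "Hearts")
one_queen = {outcome}  # a basket holding one apple
red_card = {card for card in sample_space
            if card[1] in ("Hearts", "Diamonds")}  # a basket holding 26 apples

print(len(sample_space))      # 52
print(len(one_queen))         # 1
print(len(red_card))          # 26
print(outcome == one_queen)   # False: an apple is not a basket
print(one_queen <= red_card)  # True: one event can sit inside another
```

Both `one_queen` and `red_card` are members of the event space, while `outcome` is a member of the sample space.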

Formally, for $\mathcal{F}$ to be an event space:

1. $\mathcal{F}$ must contain $\phi$ and $\Omega$ (nothing and the entire kit-n-kaboodle)
2. If some event $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$ ("closed under complement")
3. If events $A, B \in \mathcal{F}$ then $A \cup B \in \mathcal{F}$ ("closed under union")

If 2 and 3 are true, you can use both to conclude that things are also "closed under intersection": $A \cap B = (A^c \cup B^c)^c$. You can draw a small little figure for this and verify it.
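If drawing the figure feels like work, the closure rules can also be checked by brute force. This is a sketch on a tiny, hypothetical three-outcome sample space, taking the power set as the candidate event space; `power_set` is a helper written for this example.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return every subset of omega, each as a frozenset."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

omega = frozenset({1, 2, 3})  # tiny sample space, chosen for illustration
F = power_set(omega)          # candidate event space

contains_empty_and_omega = frozenset() in F and omega in F
closed_under_complement = all((omega - A) in F for A in F)
closed_under_union = all((A | B) in F for A in F for B in F)

# De Morgan: A ∩ B equals the complement of (A^c ∪ B^c) for every pair.
de_morgan_holds = all(
    (A & B) == omega - ((omega - A) | (omega - B)) for A in F for B in F
)
closed_under_intersection = all((A & B) in F for A in F for B in F)

print(contains_empty_and_omega, closed_under_complement, closed_under_union)
print(de_morgan_holds, closed_under_intersection)
```

Every check comes out `True` here, which is expected: the power set of a finite sample space always satisfies the three rules.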

So with this definition, and within a legit event space, an event can be no outcome², a single outcome, or a collection of outcomes (including all outcomes) you're interested in. The closure rules mean that when you pick an event $A$, it 'generates' its own complement $A^c$ in the event space³. This is nice because you can now use all manner of logical tricks to compute probabilities!

The importance of that last part cannot be overestimated when it comes to probabilities and 'marginalizing' over them to do things. TODO: Explain why w/Law of Total Probability.

Sizes of Event Spaces

If you define an Event Space as the set of all subsets of a given set, you're just talking about the Powerset of the Sample Space in the discrete case. That is, for a Sample Space with $n = |\Omega|$ outcomes,

$$|\mathcal{F}| = 2^n$$

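Say your experiment has 100 possible outcomes, e.g. drawing one ticket from a hat holding tickets numbered 1 to 100 (i.e. $|\Omega| = 100$). But we're talking about the Event Space, which is the set of all sets of outcomes in the sample space. That thing's size is $2^{100} \approx 1.27 \times 10^{30}$, an unimaginably large number. (Tossing a coin 100 times is even worse: each outcome is a whole sequence of heads and tails, so $|\Omega| = 2^{100}$ and $|\mathcal{F}| = 2^{2^{100}}$.)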
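The $2^n$ count is easy to sanity-check by enumeration for small $n$. A rough sketch, where `count_events` is a made-up helper that counts subsets the slow way:

```python
from itertools import combinations

def count_events(n):
    """Count every subset of an n-outcome sample space by brute force."""
    outcomes = range(n)
    return sum(1 for r in range(n + 1) for _ in combinations(outcomes, r))

for n in range(6):
    # Brute-force count agrees with the 2^n formula.
    assert count_events(n) == 2 ** n
    print(n, count_events(n))

# Far too big to enumerate, but Python integers can still write it down.
print(2 ** 100)  # 1267650600228229401496703205376
```

Enumeration stops being practical almost immediately, which is the point: you reason about event spaces, you do not list them.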

When it comes to continuous outcomes, things get pretty complicated in that you're looking at both Sample and Event Spaces that are uncountably infinite in size ("cardinality"). There are a few things Math people do about this that involve Measure Theory to tame these Beasts of Infinity. That's all I gotta say about that for now.

Probability and Probability Measures

Easy-peasy with the definitions of Events and Event Spaces. Let's go in reverse tho. A Probability Measure $P$ is a kind of 'rulebook' that looks at an Event Space and assigns numbers ("sizes") to all Events within it³. That is, $P$ is a function that maps an event $A$ to a number in $[0, 1] \subset \Reals$. This number, $P(A)$, corresponding to a single event of interest, is the Probability of the event.

The Probability Measure must satisfy certain rules which we assume to be true ("axioms"):

1. If event $A \in \mathcal{F}$ then $0 \leq P(A) \leq 1$
   The probability of every event is a non-negative real number no greater than 1.
2. $P(\Omega) = 1$ and $P(\phi) = 0$
   "Something will happen": the probability that at least one of the elementary events (those with just one outcome) will occur is 1.
3. If events $A$ and $B$ are disjoint⁴ ( $A \cap B = \phi$ ) then $P(A\cup B) = P(A) + P(B)$
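As one concrete 'rulebook', here is a sketch of the uniform measure on a six-sided die, with the three rules checked on a couple of events. The fair die and the events `even` and `low_odd` are assumptions made for the example; exact fractions are used so the equalities are not blurred by floating point.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # sample space of a fair six-sided die

def P(event):
    """Uniform probability measure: favourable outcomes over all outcomes."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(omega))

even = {2, 4, 6}
low_odd = {1, 3}

assert 0 <= P(even) <= 1                          # rule 1
assert P(omega) == 1 and P(set()) == 0            # rule 2
assert even.isdisjoint(low_odd)                   # disjoint events...
assert P(even | low_odd) == P(even) + P(low_odd)  # ...so rule 3 applies

print(P(even), P(low_odd), P(even | low_odd))  # 1/2 1/3 5/6
```

A loaded die would use the same sample space and event space with a different `P`, which is why the measure is kept separate from both.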

So what determines the rules of this 'rulebook'?

Random Variables

For non-Mathematicians, this is 'simply' a function that takes an outcome $\xi$ and maps it to a number $X(\xi)$ that is in the Reals ( $\Reals$ ).

Let's say you're measuring the heights of 1,237 people. The height is the Random Variable. It is denoted $X$, which is a function that will map Alice (Outcome $\xi_{12}$ of 1,237 outcomes) to her height of 143.12cm and Bob (Outcome $\xi_{56}$) to his height of 112.01cm. That is,

$$X(\xi_{12}) = 143.12 \qquad X(\xi_{56}) = 112.01$$

And so on. Whence the 'random' part of a Random Variable? Because you cannot predict which outcome, and hence which number in $\Reals$, you will get before you conduct the experiment ("The World is a Random Place and Uncertainty prevails"). And if you collect heights again with a different sample, you will see different values of $X$.
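In code, a random variable really is just a function from outcomes to numbers. A minimal sketch using the two people from the example; the dictionary `heights_cm`, the outcome labels, and the fixed seed are assumptions made for illustration.

```python
import random

# Outcomes are the people you might measure; only two of the 1,237 are shown.
heights_cm = {"xi_12": 143.12, "xi_56": 112.01}  # Alice and Bob

def X(outcome):
    """Random variable: map an outcome to a real number (height in cm)."""
    return heights_cm[outcome]

# The function X is fixed; which outcome the experiment hands you is not.
rng = random.Random(0)  # seeded only to make reruns repeatable
drawn = rng.choice(sorted(heights_cm))
print(drawn, X(drawn))
```

Note where the randomness lives: `X` never changes, while `drawn` does each time the experiment is rerun with a different seed.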

Random variables can be discrete/categorical or continuous. Thing to understand here is that Mass is not the same as Density, just like you'd expect in real life. You'll hear professors talk about "distributing a probability mass across" something.

Discrete - The Probability Mass Function (PMF)

This is denoted with a lowercase $f_X$ and is defined as the probability that $X = x$:

$$f_X(x) = P[X = x]$$
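For a worked PMF, take two fair dice and let $X$ be their sum. This sketch counts outcomes to build $f_X$; the two-dice experiment is an assumed example, not something from the text above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely (first die, second die) pairs.
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: the sum of the two dice."""
    return sum(outcome)

# f_X(x) = P[X = x] = (number of outcomes mapped to x) / (number of outcomes)
counts = Counter(X(o) for o in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}

print(pmf[2])             # 1/36
print(pmf[7])             # 1/6
print(sum(pmf.values()))  # 1
```

The probabilities summing to 1 is the "distributing a probability mass" picture: one unit of mass spread over the values 2 through 12.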

Continuous - The Probability Density Function (PDF)

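This is also denoted with a lowercase $f_X$, but its value at a point is not a probability: for a continuous $X$, $P[X = x] = 0$ for every single $x$. Instead, $f_X$ is a density that you integrate over an interval to get the probability that $X$ lands in it:

$$P[a \leq X \leq b] = \int_a^b f_X(x)\,dx$$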
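The mass-versus-density distinction shows up numerically: a density can exceed 1 at a point, while the probability of any interval, obtained by integrating the density, cannot. A sketch with a toy uniform density on $[0, 0.5]$ and a hand-rolled midpoint rule (both chosen just for this example):

```python
def f_X(x):
    """Density of a uniform random variable on [0, 0.5] (a toy choice)."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

print(f_X(0.25))                           # 2.0, a density, not a probability
print(round(integrate(f_X, 0.1, 0.3), 6))  # 0.4, this one is a probability
print(round(integrate(f_X, 0.0, 0.5), 6))  # 1.0, total mass
```

The height of the density is 2, yet no interval ever collects more than 1 unit of probability, because the density is only nonzero over a width of 0.5.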

Summing Things - The Cumulative Distribution Function (CDF)

This is denoted with an uppercase $F_X$ and is defined as the probability that $X \leq x$, i.e. $F_X(x) = P[X \leq x]$.

TODO: Finish this.

Media

"The Big Picture of Statistics" by Very Normal (a Columbia Biostats grad!)

¹ A lovely combinatorial explosion!
² Being interested in no outcomes might be weird but consider being interested in "rolling a 16" when you have a six-sided die. $\{16\} \not\subseteq \{1,2,3,4,5,6\}$ where $\{1,2,3,4,5,6\}$ is the sample space.
³ This is some $\sigma$-algebra and Measure Theory stuff. Take it to be true.
⁴ We're talking mutually exclusive events, not independent ones. That's totally different. E.g. rolling 3 and 5 on a single die cannot happen at once (Mutually Exclusive, $P(A \cap B) = 0$ ) but can on two dice (Independent, $P(A \cap B) = P(A) \cdot P(B)$ ).
