
Estimation

An estimator is just another statistic (a function of your data, in this case the sample data) that you use to make a 'good guess' (ideally the best possible one) of an unknown parameter of the population.

Note: $\argmax$ stands for "Arguments of the Maxima". If you have a simple $y = f(x)$, you're asking "What is the value of $x$ at which $y$, i.e. $f(x)$, is at its maximum?" This can be a bunch of values too.
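As a tiny illustration (a minimal numpy sketch, not part of the original notes), here's argmax over a grid of candidate $x$ values:

```python
import numpy as np

# f(x) = -(x - 2)^2 has its maximum at x = 2
xs = np.linspace(-5, 5, 1001)   # candidate x values
ys = -(xs - 2) ** 2             # f(x) evaluated on the grid

best = np.argmax(ys)            # index where f(x) is largest
print(xs[best])                 # ~2.0, the argmax
```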

Unbiasedness

An estimator is unbiased if, averaged over repeated samples from the population, it hits the true value exactly. That is, if $T(X)$ is some function of the sample data and $\theta$ is the true parameter it's trying to estimate, $T(X)$ is an unbiased estimator if and only if $E[T(X)] = \theta$ (for any sample size $n$, not just as $n \to \infty$... and this is a provable, exact equality!)

Now you don't really know the true value (or assume you don't), so you have to prove that this is the case with ✨Math Proofs✨.

Sample Mean and Variance as Estimators

The sample mean is an unbiased estimator of the true population mean. The same holds for the sample variance, but with a small degrees-of-freedom catch (note the $(n-1)$). You can prove both statements easily.

$$E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \mu$$

$$E[S^2] = E\left[\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2\right] = \sigma^2$$
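The first proof is just linearity of expectation: $E[\bar{X}] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\cdot n\mu = \mu$; the $S^2$ one takes a bit more algebra. If you'd rather see it empirically, here's a quick simulation sketch (assuming a known population, here $N(5, 2^2)$, purely for illustration): averaging the estimators over many repeated samples lands on the true values, while dividing by $n$ instead of $n-1$ visibly undershoots $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))

x_bar = samples.mean(axis=1)              # sample mean per replication
s2 = samples.var(axis=1, ddof=1)          # divide by n-1 (unbiased)
s2_biased = samples.var(axis=1, ddof=0)   # divide by n (biased)

print(x_bar.mean())       # ~5.0 : E[X̄] = mu
print(s2.mean())          # ~4.0 : E[S²] = sigma²
print(s2_biased.mean())   # ~3.6 : undershoots, E = (n-1)/n * sigma²
```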
Don't be hasty

You can't just use the equations above to establish unbiasedness willy-nilly! For example, it's tempting to think that $S$ is an unbiased estimator of $\sigma$ if you just take square roots. You'd be wrong: expectation doesn't commute with nonlinear functions like the square root (Jensen's inequality), so in fact $E[S] < \sigma$.

Another example is the Exponential Distribution $Exp(\lambda)$, where we're trying to estimate $\lambda$. It's true that $\bar{X}$ is an unbiased estimator of $\frac{1}{\lambda}$, but you can't just flip that around: $\frac{1}{\bar{X}}$ is not an unbiased estimator of $\lambda$ (and saying "$\lambda$ is an unbiased estimator of $\frac{1}{\bar{X}}$" is nonsensical lol).
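Here's a simulation sketch (using a small $n = 5$ and made-up parameter values so the bias is easy to see) demonstrating both gotchas: $E[S] < \sigma$, and $1/\bar{X}$ systematically overshoots $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 200_000

# Gotcha 1: S is a biased estimator of sigma (even though S² is unbiased for sigma²)
sigma = 2.0
normal_samples = rng.normal(0.0, sigma, size=(reps, n))
s = normal_samples.std(axis=1, ddof=1)        # S = sqrt(S²)
print(s.mean())                                # ~1.88, not 2.0

# Gotcha 2: 1/X̄ is a biased estimator of lambda for Exp(lambda) data
lam = 2.0
exp_samples = rng.exponential(1.0 / lam, size=(reps, n))  # numpy takes the scale 1/lambda
lambda_hat = 1.0 / exp_samples.mean(axis=1)
print(lambda_hat.mean())                       # ~2.5, not 2.0 (the bias shrinks as n grows)
```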

Techniques

There are three popular ones, with one being super popular 🤩

Mean-Squared Error / Least-Squares Error

The name has nothing to do with the statistic you're trying to estimate. If $T$ is the sample statistic and $\theta$ is the 'true' population statistic,

$$MSE := E[(T - \theta)^2]$$

After some Math, this becomes

$$MSE(T) = Var(T) + (E[T] - \theta)^2$$
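The "some Math" is just adding and subtracting $E[T]$ inside the square; the cross term vanishes because $E[T - E[T]] = 0$:

$$\begin{aligned} E[(T - \theta)^2] &= E[(T - E[T] + E[T] - \theta)^2] \\ &= E[(T - E[T])^2] + 2(E[T] - \theta)E[T - E[T]] + (E[T] - \theta)^2 \\ &= Var(T) + (E[T] - \theta)^2 \end{aligned}$$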

That latter term is the bias (squared)! As a general rule, lower MSE is better, even if achieving it means accepting a little bias. However, $MSE(T) = Var(T)$ if and only if $T$ is an unbiased estimator.
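To see that tradeoff in action, here's a simulation sketch (assuming normal data, purely for illustration) comparing the unbiased $S^2$ against the divide-by-$n$ version: the biased one has lower variance, and its MSE actually comes out smaller.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n-1 (unbiased)
s2_biased = samples.var(axis=1, ddof=0)     # divide by n (biased, lower variance)

def mse(estimates):
    return np.mean((estimates - sigma2) ** 2)

print(mse(s2_unbiased))   # ~3.6 : zero bias, but more variance
print(mse(s2_biased))     # ~3.0 : a little bias, yet lower MSE
```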

Maximum Likelihood Estimation (MLE) - The Superpopular One 🤩

MSE is a criterion, not a point estimator. This one actually gives you one! This is what runs behind the scenes when you use glm in R, for example, and you see a nice table with intercepts and coefficients. It's really not bad conceptually, but it can be a monster to actually implement.

You assume that your observations $X$ are IID from some PDF with parameter(s) $\Phi$. You're asking "What is the likelihood of seeing the observations $x$ (note the lowercase!) under various values of $\Phi$?" You then pick the value of $\Phi$ with the biggest number/likelihood. A computer does all this for you.

Since the observations are IID,

$$\text{Likelihood:}\quad p(x \mid \Phi) = \prod_{k=1}^{n} p(x_k \mid \Phi)$$

Then your MLE is

$$\Phi_{MLE} = \argmax_{\Phi}\, p(x \mid \Phi)$$

One trick to tame Math Beasts is to use the Logarithm: it increases monotonically (so it doesn't move the $\argmax$), it turns products into sums, and it banishes exponentials like $e$. So the equation above becomes

$$\log L(\Phi) = \log p(x \mid \Phi) = \sum_{k=1}^{n} \log p(x_k \mid \Phi)$$

Nice, eh? Try and think of how useful this is with, for example, the Gaussian or Exponential distributions.
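For example, for $Exp(\lambda)$ with density $p(x_k \mid \lambda) = \lambda e^{-\lambda x_k}$, the log-likelihood collapses to something you can maximize by hand:

$$\log L(\lambda) = \sum_{k=1}^{n} \log\left(\lambda e^{-\lambda x_k}\right) = n\log\lambda - \lambda\sum_{k=1}^{n} x_k$$

Setting the derivative $\frac{n}{\lambda} - \sum_k x_k$ to zero gives $\lambda_{MLE} = n / \sum_k x_k = 1/\bar{x}$ (which, as noted above, is the natural point estimate even though it isn't unbiased). And here's a sketch of "a computer does all this for you", numerically minimizing the negative log-likelihood (using scipy purely as an illustration; glm does something analogous under the hood):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
lam_true = 2.0
x = rng.exponential(1.0 / lam_true, size=1000)   # simulated observations

def neg_log_likelihood(lam):
    # -log L(lambda) = -(n * log(lambda) - lambda * sum(x))
    return -(len(x) * np.log(lam) - lam * x.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(result.x)          # numerical MLE for lambda
print(1.0 / x.mean())    # closed-form MLE, should match (and be near 2.0)
```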

Maximum A Posteriori (MAP)

This is very related to MLE and not too bad to get started with. Instead of just using $\Phi$, you're saying "Instead of assuming that $\Phi$ is fixed, I'm going to assume that it's a Random Variable with its own distribution". That's really about it before all the (Bayesian) Math.

So let's redo some MLE but with $\Phi$ having its own prior distribution. It's flipped!
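The punchline (standard Bayes' theorem, spelled out): with a prior $p(\Phi)$, you maximize the posterior instead of the likelihood, and the marginal $p(x)$ drops out of the $\argmax$ since it doesn't depend on $\Phi$:

$$\Phi_{MAP} = \argmax_{\Phi}\, p(\Phi \mid x) = \argmax_{\Phi}\, \frac{p(x \mid \Phi)\, p(\Phi)}{p(x)} = \argmax_{\Phi}\, p(x \mid \Phi)\, p(\Phi)$$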