Estimation
An estimator is just another statistic (a function of your data, the sample data in this case) that you use to make a 'good guess', ideally the best possible one, of an unknown parameter of the population.
Note: $\arg\max$ stands for "Argument of the Maxima". If you have a simple $\arg\max_x f(x)$, you're asking "What is the value of $x$ at which $f(x)$ is at its maximum?" This can be a bunch of values too.
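A quick example: if $f(x) = -(x-2)^2$, then $\max_x f(x) = 0$ but $\arg\max_x f(x) = 2$; the $\arg\max$ tells you *where* the maximum happens, not what the maximum value is.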
Unbiasedness
An estimator is unbiased if its average value across repeated samplings from the population equals the true value. That is, if $T$ is some function of the sample data $X_1, \dots, X_n$ and $\theta$ is the true parameter it's trying to estimate, $T$ is an unbiased estimator if and only if $E[T] = \theta$ (and this is a provable, exact equality!)
Now you don't really know the true value (or assume you don't) so you have to prove that this is the case with ✨Math Proofs✨.
Sample Mean and Variance as Estimators
The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is an unbiased estimator of the true population mean. The same goes for the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$ as an estimator of the population variance, but with a small degrees-of-freedom catch (note the $n-1$ instead of $n$). You can prove both statements easily.
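As a quick sketch of the first claim: if the $X_i$ are IID with mean $\mu$, then by linearity of expectation

$$E[\bar{X}] = E\!\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\cdot n\mu = \mu.$$

The variance proof is the same idea, just longer; the $n-1$ shows up because $\bar{X}$ was itself computed from the same data.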
You can't just use the equations above to establish unbiasedness willy-nilly! For example, it's tempting to think that $S$ is an unbiased estimator of $\sigma$ if you took square roots of $S^2$ and $\sigma^2$. You'd be wrong.
Another example is the Exponential Distribution where we're trying to estimate the rate $\lambda$. It's true that $\bar{X}$ is an unbiased estimator of $1/\lambda$, but the reverse is nonsensical. It doesn't make sense to say that $1/\lambda$ is an unbiased estimator of $\bar{X}$ lol
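A related trap, in the willy-nilly spirit above: $1/\bar{X}$ is a perfectly sensible estimator of $\lambda$, but it is not unbiased, even though $\bar{X}$ is unbiased for $1/\lambda$. Here's a quick simulation sketch (NumPy, with a made-up rate and sample size) showing the gap:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0        # true rate λ (made up for illustration)
n = 5            # small sample size makes the bias easy to see
reps = 200_000   # number of repeated samplings

# Draw `reps` independent samples of size n from Exponential(rate=lam).
# NumPy's exponential() takes the scale = 1/λ, not the rate.
samples = rng.exponential(scale=1 / lam, size=(reps, n))
xbar = samples.mean(axis=1)

print("average of X̄   :", xbar.mean())        # ≈ 1/λ = 0.5  (unbiased)
print("average of 1/X̄ :", (1 / xbar).mean())  # ≈ nλ/(n-1) = 2.5, not λ = 2 (biased)
```

With $n = 5$ the average of $1/\bar{X}$ comes out near $n\lambda/(n-1) = 2.5$ rather than $2$; the bias only fades as $n$ grows.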
Techniques
There are three popular ones, with one being super popular 🤩
Mean-Squared Error / Least-Squares Error
The name has nothing to do with the statistic you're trying to estimate. If $T$ is the sample statistic and $\theta$ is the 'true' population statistic,

$$\mathrm{MSE}(T) = E\big[(T - \theta)^2\big]$$

After some Math, this becomes

$$\mathrm{MSE}(T) = \mathrm{Var}(T) + \big(E[T] - \theta\big)^2$$

That latter term is the (squared) bias! As a general rule, lower MSE is better even if there is a little bias. However, $\mathrm{MSE}(T) = \mathrm{Var}(T)$ if and only if $T$ is an unbiased estimator.
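In case the "some Math" step isn't obvious, it's the usual add-and-subtract-$E[T]$ trick:

$$E\big[(T-\theta)^2\big] = E\big[(T - E[T] + E[T] - \theta)^2\big] = \underbrace{E\big[(T - E[T])^2\big]}_{\mathrm{Var}(T)} + \big(E[T]-\theta\big)^2,$$

since the cross term $2\big(E[T]-\theta\big)E\big[T - E[T]\big]$ vanishes ($E\big[T - E[T]\big] = 0$).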
Maximum Likelihood Estimation (MLE) - The Superpopular One 🤩
MSE is not a point estimator. This one is! This is what runs behind the scenes when you use glm in R, for example, and you see a nice table with intercepts and coefficients. It's really not bad conceptually but can be a monster to actually implement.
You assume that the data $X_1, \dots, X_n$ are IID from some PDF $f(x \mid \theta)$ with parameter(s) $\theta$. You're asking "What is the likelihood of seeing the observations $x_1, \dots, x_n$ (note lowercase!) for various values of $\theta$?" You then pick the value of $\theta$ with the biggest number/likelihood. A computer does all this for you.
Since $X_1, \dots, X_n$ is IID,

$$L(\theta) = f(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta)$$

Then your MLE is

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta L(\theta) = \arg\max_\theta \prod_{i=1}^n f(x_i \mid \theta)$$
One trick to tame Math Beasts like this is to use the Logarithm function, since it increases monotonically (so it doesn't move the $\arg\max$), it turns products into sums, and it banishes exponents like $e^{-\lambda x}$. So the equation above becomes

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta \log L(\theta) = \arg\max_\theta \sum_{i=1}^n \log f(x_i \mid \theta)$$
Nice, eh? Try and think of how useful this is with, for example, the Gaussian or Exponential distributions.
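As a worked sketch with the Exponential distribution $f(x \mid \lambda) = \lambda e^{-\lambda x}$: the log-likelihood is

$$\log L(\lambda) = \sum_{i=1}^n \log\big(\lambda e^{-\lambda x_i}\big) = n\log\lambda - \lambda\sum_{i=1}^n x_i,$$

and setting the derivative $n/\lambda - \sum_i x_i$ to zero gives

$$\hat{\lambda}_{\mathrm{MLE}} = \frac{n}{\sum_{i=1}^n x_i} = \frac{1}{\bar{x}}.$$

Notice the log banished the $e^{-\lambda x_i}$ exponents before we even had to differentiate, and that this $\hat{\lambda}$ is exactly the (biased!) $1/\bar{X}$ from the unbiasedness section: MLEs are not guaranteed to be unbiased.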
Maximum A Posteriori (MAP)
This is very related to MLE and not too bad to get started with. Instead of just using $f(x \mid \theta)$, you're saying "Instead of assuming that $\theta$ is fixed, I'm going to assume that it's a Random Variable with its own distribution". That's really about it before all the (Bayesian) Math.
So let's redo some MLE but with $\theta$ having its own prior distribution. It's flipped!
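Concretely, writing the (assumed) prior as $g(\theta)$ and using Bayes' rule (the denominator doesn't depend on $\theta$, so it drops out of the $\arg\max$):

$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_\theta f(\theta \mid x_1, \dots, x_n) = \arg\max_\theta \left[\prod_{i=1}^n f(x_i \mid \theta)\right] g(\theta) = \arg\max_\theta \left[\sum_{i=1}^n \log f(x_i \mid \theta) + \log g(\theta)\right].$$

It's just MLE plus one extra $\log g(\theta)$ term, and with a flat prior MAP collapses right back to MLE.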