Blog | Nikhilʼs Notes from Columbia 🦁

First Post!

May 5, 2025

Some notes from a call with the DBMI Graduate Program Manager:

The department is under GSAS but located at Vagelos ("VP&S")
Classes begin September 2nd. In NYC, classes start the day after Labor Day.
Orientation starts on August 25th. The Dean will mail out details. Quite a few things are mandatory. Plan accordingly.
There's a department retreat on Friday September 5th from 8-5, followed by a social. You can meet with potential research advisers on this day!
Summer classes are short and expensive; not recommended.
The first semester has to be 1 Residence Unit (this is a GSAS requirement).
Wait for Dean Mao's email for further instructions.
You will be contacted in July for class registrations.
They're not offering G4000 anymore. You'll take a placement test to see if you need more stats/programming.
There's a capstone project option now (not on website).
The essay can be 1-3 credits.
Classes in other departments are first come, first served. There's a very high likelihood you'll get the BINF classes, though.
MPhils are just for PhD students :/
You don't/can't mix and match tracks.
Finding paid RA and TA-ships is tough, especially given recent events.

A Proof of the Law of the Unconscious Statistician

April 7, 2024

It's simple (and highly useful) enough in practice. If $g(X)$ is some function of a discrete random variable $X$ with probability mass function $f_X$, then

$$E[g(X)] = \sum_x g(x)\, f_X(x)$$

I do not completely understand the proof but here it is for reference.
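
A sketch of the standard argument for the discrete case, in case it helps. It assumes $X$ is discrete with pmf $f_X$ and that the sums converge absolutely; the symbol $Y = g(X)$ is introduced here only for the derivation.

$$
\begin{aligned}
E[g(X)] = E[Y] &= \sum_y y \, P(Y = y) \\
&= \sum_y y \sum_{x:\, g(x) = y} f_X(x) \\
&= \sum_y \sum_{x:\, g(x) = y} g(x)\, f_X(x) \\
&= \sum_x g(x)\, f_X(x)
\end{aligned}
$$

The second line uses $P(Y = y) = \sum_{x:\, g(x) = y} f_X(x)$. The third replaces $y$ with $g(x)$, which is allowed because $g(x) = y$ inside the inner sum. The last line holds because every $x$ falls in exactly one of the groups $\{x : g(x) = y\}$, so summing group by group is the same as summing over all $x$.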

A Note on Precision and Recall

April 6, 2024

Our model is saying “I can predict sick people 96% of the time”. However, it is doing the opposite. It is predicting the people who will not get sick with 96% accuracy while the sick are spreading the virus!

Do you think this is a correct metric for our model given the seriousness of the issue? Shouldn’t we be measuring how many positive cases we can predict correctly to arrest the spread of the contagious virus? Or maybe, out of the cases we predicted as positive, how many are actually positive, to check the reliability of our model?

This is where we come across the dual concept of Precision and Recall.

Precision tells us how many of the cases we predicted as positive actually turned out to be positive.

Recall tells us how many of the actual positive cases we were able to predict correctly with our model.
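
In symbols, using the usual confusion-matrix counts (the abbreviations $TP$, $FP$, and $FN$ for true positives, false positives, and false negatives are standard notation, not taken from the text above):

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
$$

The numerators are the same. Only the denominator changes: everything predicted positive for precision, everything actually positive for recall.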

Precision is a useful metric in cases where false positives are a greater concern than false negatives.

Recall is a useful metric in cases where false negatives are a greater concern than false positives.

Recall is important in medical cases, where raising a false alarm is tolerable but actual positive cases must not go undetected!
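
A small worked example in R may make the gap between accuracy and recall concrete. The counts below are made up for illustration (1,000 people, 40 of them sick); they are chosen only so that accuracy comes out at 96%, and are not from any real dataset.

```r
# Hypothetical confusion-matrix counts (illustrative only).
tp <- 10   # sick, predicted sick
fn <- 30   # sick, predicted healthy
fp <- 10   # healthy, predicted sick
tn <- 950  # healthy, predicted healthy

accuracy  <- (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are right
precision <- tp / (tp + fp)                   # share of predicted positives that are real
recall    <- tp / (tp + fn)                   # share of real positives that were caught

print(c(accuracy = accuracy, precision = precision, recall = recall))
```

With these numbers accuracy is 0.96, precision is 0.5, and recall is only 0.25: the model looks good overall while missing three out of four sick people.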

On Fitting Parameters

April 5, 2024

“When Dyson met Fermi, he quickly put aside the graphs he was being shown indicating agreement between theory and experiment.

His verdict, as Dyson remembered, was “There are two ways of doing calculations in theoretical physics. One way, and this is the way I prefer, is to have a clear physical picture of the process you are calculating. The other way is to have a precise and self-consistent mathematical formalism. You have neither.”

When a stunned Dyson tried to counter by emphasizing the agreement between experiment and the calculations, Fermi asked him how many free parameters he had used to obtain the fit. Smiling after being told “Four,” Fermi remarked, “I remember my old friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” There was little to add.”

-- Segre G., Hoerlin B. The Pope of Physics: Enrico Fermi and the Birth of the Atomic Age

Simple R Code that Illustrates the Inverse Transform Theorem

April 4, 2024

by Michael Kuehn, grad student extraordinaire at Georgia Tech 🐝

set.seed(1)
n <- 1e6

# Exponential: apply the CDF F(x) = 1 - exp(-x) by hand
X <- rexp(n, rate = 1)
hist(X)
FX <- 1 - exp(-X)
hist(FX)  # should look flat, i.e. Uniform(0, 1)

# Using built-in CDF
FX <- pexp(X, rate = 1)
hist(FX)

# Normal
X <- rnorm(n, mean = 0, sd = 1)
hist(X)
FX <- pnorm(X, mean = 0, sd = 1)
hist(FX)

# t
X <- rt(n, df = 10)
hist(X)
FX <- pt(X, df = 10)
hist(FX)

# Cauchy
X <- rcauchy(n, location = 0, scale = 1)
hist(X, freq = FALSE, breaks = "FD", xlim = c(-15, 15))
FX <- pcauchy(X, location = 0, scale = 1)
hist(FX)

# Weibull
X <- rweibull(n, shape = 2, scale = 2)
hist(X)
FX <- pweibull(X, shape = 2, scale = 2)
hist(FX)

# Gamma
X <- rgamma(n, shape = 3)
hist(X)
FX <- pgamma(X, shape = 3)
hist(FX)
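
The histograms above go in one direction: draw X, apply its CDF, and get something that looks Uniform(0, 1). A rough sketch of the reverse direction, which is how the result is typically used for random number generation, might look like the following. This is an addition and not part of Michael's code; the variable names and the smaller n are choices made here, and the Kolmogorov-Smirnov tests are just one possible numerical check to sit alongside the plots.

```r
set.seed(1)
n <- 1e5

# Forward direction (as in the histograms above): F(X) should be Uniform(0, 1).
X <- rexp(n, rate = 1)
U_from_X <- pexp(X, rate = 1)

# Reverse direction: push Uniform(0, 1) draws through the inverse CDF.
# For Exp(rate = 1), F^{-1}(u) = -log(1 - u).
U <- runif(n)
X_from_U <- -log(1 - U)

# qexp() is the built-in inverse CDF; it should agree with the closed form.
max_diff <- max(abs(X_from_U - qexp(U, rate = 1)))

# Kolmogorov-Smirnov tests against the target distributions.
ks_unif <- ks.test(U_from_X, "punif")
ks_exp  <- ks.test(X_from_U, "pexp", rate = 1)

# Uniform(0, 1) has mean 1/2 and variance 1/12; Exp(1) has mean 1 and variance 1.
print(c(mean_U = mean(U_from_X), var_U = var(U_from_X)))
print(c(mean_X = mean(X_from_U), var_X = var(X_from_U)))
print(c(p_unif = ks_unif$p.value, p_exp = ks_exp$p.value, max_diff = max_diff))
```

The same pattern should carry over to the other distributions by swapping in the matching `q*` function (for example `qnorm` or `qweibull`) where no closed-form inverse is handy.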