Phenotyping by George Hripcsak
identification or estimation of clinically useful properties from raw clinical data
E.g., based on EHR data, does the patient have diabetes?
PERRLA
Pupils equal, round, reactive to light and accommodation
Missing
Data are mostly missing
sampled when sick
pertinent negatives by attending vs CC3
Noisy
As low as 50% accuracy
Truth → concept → record → concept
|
V
model
Truth: health status of the patient
Concept: clinician's or patient's conception
Record: EHR/PHR
Concept: second clinician's conception of the patient
Model: computable representation
Complex
what is the right time?
when specimen drawn
when specimen received
when test performed
when result updated
when result received by the patient
when patient told clinician
Complex
narrative text holds much of the useful info
slight increase of pulmonary vascular congestion with new left pleural effusion, question mild congestive changes
s/ LURT 1998 c/b 1A rejection 7/07 back on HD
Natural Language Processing
“slight increase of pulmonary vascular congestion with new left pleural effusion, question mild congestive changes”
-> pulmonary vascular congestion change: increase degree: low
-> pleural effusion: ?
-> ?
Health care process bias
environment → patient state → care team → electronic health record
^ |
| V
— — — therapy
Good News
doctors successfully infer patient state from records
we need to mimic the doctors' reasoning
deconvolve the truth
EHR-derived phenotype
clinically relevant feature derived from EHR
patient has a diagnosis of type II diabetes
recent rash and fever
drug-induced liver injury
then use phenotype in correlation studies
which treatments are associated with the best outcomes
Raw EHR data - (query) → phenotype - (data mining) → correlations
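A minimal sketch of the query → phenotype → correlation pipeline, assuming a toy patient record. The `Patient` fields, the "E11" code prefix, and the 6.5% HbA1c cutoff are illustrative assumptions, not a definition given in the talk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    diagnosis_codes: set = field(default_factory=set)
    max_hba1c: Optional[float] = None  # percent; None means never measured
    on_treatment: bool = False
    good_outcome: bool = False


def has_t2dm(p):
    """Query step: a type 2 diabetes code OR a high HbA1c (assumed rule)."""
    coded = any(c.startswith("E11") for c in p.diagnosis_codes)
    lab = p.max_hba1c is not None and p.max_hba1c >= 6.5
    return coded or lab


def outcome_rate_by_treatment(patients):
    """Data mining step: outcome rate by treatment among phenotype-positive patients."""
    counts = {True: [0, 0], False: [0, 0]}  # treated -> [good outcomes, total]
    for p in patients:
        if has_t2dm(p):
            counts[p.on_treatment][1] += 1
            counts[p.on_treatment][0] += int(p.good_outcome)
    return {k: (g / n if n else None) for k, (g, n) in counts.items()}
```

Any error in the phenotype step carries straight into the correlation step, which is why the evaluation question keeps coming back later in these notes.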
“Physics” of the medical record
- Study EHR as if it were a natural object
- use EHR to learn about EHR
- not studying patient, but recording of patient
- aggregate across units and models
- ? [why this man so fast]
Correlate lab tests and concepts
[this man has places to fucking be]
Timing of cause in disease vs treatment
[this dude is the symbolic AI reading the second group of reading on lagged linear model]
Interpreting time
truth = when they came in
narrative = when the doctor said it happened
Control for health care process effects
health care processes inject signals into physiological effects
Granger to decipher associations
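A rough sketch of the lagged-association idea: correlate one series against a time-shifted copy of another and see which lag fits best. This is plain lagged Pearson correlation, a simpler stand-in for a real Granger test (which compares autoregressive models), and the function names are my own.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag]; a positive lag means x leads y."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if lag < 0:
        return lagged_correlation(y, x, -lag)
    n = len(x) - lag
    if n < 2:
        raise ValueError("lag too large for series length")
    return pearson(x[:n], y[lag:lag + n])


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the largest absolute correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: abs(lagged_correlation(x, y, k)))
```

A strong association at some lag says nothing yet about whether physiology or the care process (e.g. when tests get ordered) produced it. That is the part that needs controlling for.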
Phenotyping
manual review of every case
why is manual review of a chart trusted
need better metric
knowledge engineering
manual authoring of rule
maybe with tools, increasingly automated
manual review for test set to evaluate phenotype
slow and inaccurate
machine learning
supervised learning
manual review for training set and test set
problem: what happens at the edges with so many degrees of freedom
phenotype discovery
unsupervised learning, clustering
different goal: summarize and understand the data set
evaluate
“Next generation” phenotyping
- Manual chart review
- Manually written rules
- Automated, semi-automated, or assisted rule writing
- Improved performance by better understanding the EHR
- Use of language models to incorporate human knowledge
- Evaluate
Overby DILI 2013 hard problem, slow solution
drug-induced liver injury
defined by negation
liver disease not due to anything else
6 months to generate the definition
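A toy sketch of what "defined by negation" looks like as a rule. The dictionary keys, the ALT cutoff, and the list of alternative causes are all hypothetical placeholders, not the published definition.

```python
# Hypothetical list of competing explanations; a real definition would be far longer.
ALTERNATIVE_CAUSES = {"viral_hepatitis", "alcoholic_liver_disease", "biliary_obstruction"}


def possible_dili(p):
    """Liver injury after drug exposure with no recorded alternative cause."""
    injury = p["alt_multiple_of_uln"] >= 5.0  # assumed cutoff, in multiples of upper limit of normal
    exposed = p["recent_new_drug"]
    explained = bool(ALTERNATIVE_CAUSES & set(p["diagnoses"]))
    return injury and exposed and not explained
```

The `not explained` clause is the hard part: with mostly missing data, no recorded alternative cause is not the same as no alternative cause.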
TRALI
transfusion-related acute lung injury
tough FDA contract example
extreme example of negative example
?
Transportability
want phenotypes to work correctly everywhere
Metrics
sensitivity = TP/(TP+FN) → recall
specificity = TN/(TN+FP)
positive predictive value = TP/(TP+FP)
AUROC (area under the receiver operating characteristic curve)
area under the plot of sensitivity vs. (1 - specificity)
probability of picking the right one given a choice of two, with one right and one wrong
0.5 (chance) to 1.00 (perfect)
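The metrics above written out as a small sketch. `auroc` uses the pairwise reading directly: the fraction of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as half. Function names are my own.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity (recall), specificity, and positive predictive value from counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of cases, which is fine for checking intuition on small sets. A rank-based computation would be the usual choice at scale.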
Concept set recommender (PHOEBE)
ATLAS: Cohort building
optimized for observational research
time series: who and when (vs classification)
assume a complex definition -- linearized AND-OR group
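One way to picture an AND-OR definition that answers "who and when" rather than just yes or no. This is my own toy representation, not ATLAS's actual cohort format: a leaf is a concept name, a group is an `("AND", [...])` or `("OR", [...])` tuple, and the result is the earliest date the whole expression is satisfied.

```python
from datetime import date


def first_met(expr, events):
    """Earliest date on which expr is satisfied, or None if it never is.

    events maps a concept name to a list of dates (hypothetical layout).
    """
    if isinstance(expr, str):
        dates = events.get(expr, [])
        return min(dates) if dates else None
    op, children = expr
    met = [first_met(c, events) for c in children]
    if op == "AND":
        # Every child must be met; the group is met once the last one is.
        if met and all(d is not None for d in met):
            return max(met)
        return None
    if op == "OR":
        # Any child suffices; the group is met as soon as the first one is.
        hits = [d for d in met if d is not None]
        return min(hits) if hits else None
    raise ValueError(f"unknown operator: {op}")
```

For example, `("AND", ["t2dm_diagnosis", ("OR", ["metformin", "insulin"])])` gives a cohort entry date for each patient who qualifies and `None` for everyone else.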
KEEPER: Chart review alternative
Swerdel 2019 PheValuator
estimate sensitivity and specificity without chart review
choose very sensitive and very specific cohorts and learn ML model
judge new algorithm’s performance using the ML predictions in place of knowledge of true and false
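A simplified sketch of the idea as I understood it: treat the ML model's predicted probability for each patient as a probabilistic stand-in for truth, then accumulate expected confusion-matrix counts for the algorithm under test. This illustrates the principle only; it is not the PheValuator package's code, and the function names are assumptions.

```python
def expected_confusion(probs, flags):
    """Expected TP, FP, TN, FN when probs replace true labels.

    probs: model-estimated probability that each patient truly has the phenotype.
    flags: whether the phenotype algorithm under test flagged each patient.
    """
    tp = fp = tn = fn = 0.0
    for p, flagged in zip(probs, flags):
        if flagged:
            tp += p
            fp += 1.0 - p
        else:
            fn += p
            tn += 1.0 - p
    return tp, fp, tn, fn


def estimated_performance(probs, flags):
    """Sensitivity, specificity, and PPV estimated without chart review."""
    tp, fp, tn, fn = expected_confusion(probs, flags)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }
```

The estimates are only as good as the model's calibration, so errors in the reference model become errors in the evaluation.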
Cai PheNorm 2017
normalize and denoise features
Cai MAP 2019
Cai SureLDA 2020
Sontag Anchors 2014
domain expert picks imperfect relevant variables
learn other variables from data set
silver standard
not as accurate as a gold standard
Shah silver standard 2016
Bhave LSTM to handle time 2023
Albers PopKLD 2018
how to handle continuous data
Pivovarov Uphenome 2015
unsupervised
improved on LDA for heterogeneous cases
reduces mixture of phenotypes
Shickel ?
CEHR-GPT
Phenotyping
leave the phenotype implicit in the deep learning model?
hard to predict its errors
poor performance on hard tasks and fabrication, but better than poor human performance
it's all in evaluating the phenotype
our methods exceed our ability to evaluate