6.S897/HST.956: Machine Learning for Healthcare

For current class information, see here
Instructors: David Sontag, Peter Szolovits
Teaching Assistants: Willie Boag, Irene Chen (Office Hours: Monday 1pm, 32-G 9th floor lounge)
Graduate level; Units 3-0-9 (counts as an AAGS subject)
Time: Tuesdays & Thursdays, 2:30-4pm
Location: 4-270
Prerequisite: 6.036/6.862 or 6.867 or 9.520/6.860 or 6.806/6.864 or 6.438 or 6.034
Recitations (optional): Fridays at 2pm (4-153)
Contact: Piazza
Stellar page: https://stellar.mit.edu/S/course/HST/sp19/HST.956/

Course description

Introduces students to machine learning in healthcare, including the nature of clinical data and the use of machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, and improving clinical workflows. Topics include causality, interpretability, algorithmic fairness, time-series analysis, graphical models, deep learning and transfer learning. Guest lectures by clinicians from the Boston area and course projects with real clinical data emphasize subtleties of working with clinical data and translating machine learning into clinical practice.

Note that because of high demand, we do not have space for listeners.

Schedule

Schedule is subject to change.

Class	Date	Lecture Materials	Assignments
1	Tues Feb 05	Introduction: What makes healthcare unique? [Slides] Lecture 1 [optional] Using Analytics To Identify And Manage High-Risk And High-Cost Patients	Prerequisite quiz due Pset0 out
2	Thurs Feb 07	Overview of clinical care [Slides] Lecture 2
3	Tues Feb 12	Deep dive into clinical data [Slides] Lecture 3 [required] Biases in electronic health record data due to processes within the healthcare system	Reflection questions Pset0 due Pset1 out
4	Thurs Feb 14	Risk stratification using EHRs and insurance claims (Discussant: Leonard D'Avolio) [Slides] Lecture 4, Recitation 1, Recitation 1 Notebook [required] Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors [required] A Predictive Instrument to Improve Coronary-Care-Unit Admission Practices in Acute Ischemic Heart Disease	Reflection questions
Tues Feb 19 - Presidents' Day, Monday schedule
5	Thurs Feb 21	Survival modeling [Slides] Lecture 5, Recitation 2, Recitation 2 Notebook [required] Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission [required] An Improved Multi-Output Gaussian Process RNN with Real-Time Validation for Early Sepsis Detection [optional] A targeted real-time early warning score (TREWScore) for septic shock [optional] Chapter 7: Survival Models	Reflection questions Pset1 due
6	Tues Feb 26	Physiological time-series [Slides] Lecture 6 [required] Factorial Switching Linear Dynamical Systems Applied to Physiological Condition Monitoring [required] Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network	Reflection questions Pset2 out (on Stellar)
7	Thurs Feb 28	Clinical text part 1 (Discussant: Katherine Liao) [Slides] Lecture 7, Recitation 3 [required] Challenges in Clinical Natural Language Processing for Automated Disorder Normalization [optional] Electronic medical record phenotyping using the anchor and learn framework [optional] Aspiring to Unintended Consequences of Natural Language Processing	Reflection questions
8	Tues Mar 05	Clinical text part 2 [Slides] Lecture 8 [optional] Attention Is All You Need [optional] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	Pset2 due Pset3 out
9	Thurs Mar 07	Translating technology into the clinic (Discussant: Adam Wright) [Slides] Lecture 9, Recitation 4 [optional] Insights From Advanced Analytics At The Veterans Health Administration [optional] Clinical Decision Support for Early Recognition of Sepsis
10	Tues Mar 12	Machine learning for cardiology (Guest lecture: Rahul Deo) [Slides] Lecture 10 [required] Chapter 13 on “Cardiovascular Diseases” from the book "Artificial Intelligence in Medical Imaging" (students can access the e-book for free from MIT) [optional] Fully Automated Echocardiogram Interpretation in Clinical Practice [optional] FastVentricle: Cardiac Segmentation with ENet	Reflection questions Pset3 due Pset4 out
11	Thurs Mar 14	Machine learning for differential diagnosis [Slides] Lecture 11 [required] Learning a Health Knowledge Graph from Electronic Medical Records [optional] Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. I. The probabilistic model and inference algorithms [optional] Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. II. Evaluation of diagnostic performance [optional] Heuristic Methods for Imposing Structure on Ill-Structured Problems: The Structuring of Medical Diagnostics	Reflection questions
12	Tues Mar 19	Machine learning for pathology (Guest lecture: Andy Beck) [Slides] Lecture 12 [required] Deep Learning for Identifying Metastatic Breast Cancer [optional] Exploring the ChestXray14 dataset: problems	Reflection questions Pset4 due
13	Thurs Mar 21	Machine learning for mammography (Guest lecture: Connie Lehman, Adam Yala) [Slides] Lecture 13, part 2 [required] Chapter 14 on “Deep Learning in Breast Cancer Screening” from the book "Artificial Intelligence in Medical Imaging" (students can access the e-book for free from MIT) [optional] Mammographic Breast Density Assessment Using Deep Learning: Clinical Implementation	Reflection questions Project proposals due
Tues Mar 26 & Thurs Mar 28 - Spring vacation
14	Tues Apr 02	Causal inference part 1 [Slides] Lecture 14 [required] Chapter 1 of Causal Inference [optional] Postsurgical prescriptions for opioid naive patients and association with overdose and misuse: retrospective cohort study [optional] Personalized Diabetes Management Using Electronic Medical Records [optional] Counterfactuals	Reflection questions Midsemester feedback Pset5 out
15	Thurs Apr 04	Causal inference part 2 [Slides] Lecture 15, Recitation 6 [optional] From Association to Causation in Observational Studies: The Role of Tests of Strongly Ignorable Treatment Assignment [optional] Confounding-Robust Policy Improvement [optional] Causal Effect Inference with Deep Latent-Variable Models	Midsemester feedback
16	Tues Apr 09	Reinforcement learning part 1 (Guest lecture: Fredrik Johansson) [Slides] Lecture 16 [required] A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units [optional] Statistical Methods for Dynamic Treatment Regimes, Sections 2.1 and 2.2, and Chapter 3 (MIT link) [optional] Guidelines for reinforcement learning in healthcare	Reading questions
17	Thurs Apr 11	Reinforcement learning part 2 (Guest lecture: Barbra Dickerman) [Slides] Lecture 17, Recitation 7 [required] The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care [optional] Does the “Artificial Intelligence Clinician” learn optimal treatment strategies for sepsis in intensive care? [optional] Guideline-Based Physical Activity and Survival Among US Men With Nonmetastatic Prostate Cancer	Reading questions Pset5 due
Tues Apr 16 - Patriots' Day holiday
18	Thurs Apr 18	Disease progression & subtyping part 1 [Slides] Lecture 18 [required] Integrative Analysis using Coupled Latent Variable Models for Individualizing Prognoses	Pset6 out
19	Tues Apr 23	Disease progression & subtyping part 2 [Slides] Lecture 19 [required] Uncovering the heterogeneity and temporal complexity of neurodegenerative diseases with Subtype and Stage Inference [optional] Unsupervised Learning of Disease Progression Models [optional] Inferring Multidimensional Rates of Aging from Cross-Sectional Data [optional] A comparison of single-cell trajectory inference methods [optional] Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference
20	Thurs Apr 25	Precision medicine [Slides] Lecture 20 [required] Type 2 diabetes genetic loci informed by multi-trait associations point to disease mechanisms and subtypes: A soft clustering analysis [optional] PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations	Pset6 due
21	Tues Apr 30	Automating clinical workflows [Slides] Lecture 21 [optional] Paving the COWpath: Learning and visualizing clinical pathways from electronic health record data [optional] The Checklist
22	Thurs May 02	Regulation of ML/AI in the US (Guest lecture: Andy Coravos, Mark Shervey) [Slides] Lecture 22, part 2 [required] US FDA Artificial Intelligence and Machine Learning Discussion Paper [optional] The rise of digital medicine: software and algorithms that measure, diagnose and treat (WIRED) [optional] We should treat algorithms like prescription drugs (Quartz) [optional] Want to create meaningful change in the US healthcare system? Serve a “tour of duty” in the government (Rock Health) [optional] If you want to make government programs work better, submit a public comment (Medium)	Reading questions
23	Tues May 07	Fairness [Slides] Lecture 23 [optional] Demographic classification criteria [optional] The Frontiers of Fairness in Machine Learning
24	Thurs May 09	Robustness to dataset shift [Slides] Lecture 24 [optional] Implications of non-stationarity on predictive modeling using EHRs [optional] Datasheets for Datasets [optional] Domain-Adversarial Training of Neural Networks [optional] Embeddings of Medical Concepts [optional] Enhancing Clinical Concept Extraction with Contextual Embedding
	Tues May 14	No class	Project poster presentations (evening)
25	Thurs May 16	Interpretability [Slides] Lecture 25 [optional] "Why Should I Trust You?": Explaining the Predictions of Any Classifier [optional] Falling Rule Lists	Projects due

Prerequisite quiz

This quiz will not count toward your grade, but will be used by the course staff to check prerequisites (6.036/6.862 or 6.867 or 9.520/6.860 or 6.806/6.864 or 6.438 or 6.034) and to assess students' preparation for this class.

The prerequisite quiz is now closed, but you can view the questions here.

Grading

40% homework (7 problem sets)
40% course project
20% participation (including lecture scribing, MLHC community consulting, and reading responses)

Problem sets

We expect there will be seven problem sets this year.

Problem set 0 [Deadline: Mon Feb 11 at 11:59pm EST]

In order to access sensitive healthcare datasets, you will need to complete several preliminary tasks. Please see Stellar for full instructions and submission details.

Problem set 1 [Deadline: Thurs Feb 21 at 11:59pm EST]
Problem set 2 [Deadline: Tues March 5 at 11:59pm EST]

Please see Stellar for instructions to access the IBM data.

Problem set 3 [Deadline: Tues March 12 at 11:59pm EDT]
Problem set 4 [Deadline: Tues March 19 at 11:59pm EDT]
Problem set 5 [Deadline: Thurs April 11 at 11:59pm EDT]
Problem set 6 [Deadline: Thurs April 25 at 11:59pm EDT]

Lecture scribes

Each student is expected to either “scribe” for one lecture or “consult” for one MLHC community evening session (see below). Each lecture will have 1-2 scribes who are responsible for summarizing what was discussed in class. The first draft of the notes should be submitted to the TAs by 11:59pm on the day after class (i.e. about 32 hours after lecture ends). We will send you suggested revisions, and once the notes are finalized, we will post them on the course website. The goal is to post the notes within one week of the corresponding class.

We expect writing up lecture notes to take no more than 3 hours. If there are two scribes for one lecture, they should collaborate and submit a single write-up. Your notes should cover all the material presented in the lecture, plus full references to the papers containing that material, and should be understandable to someone who did not attend. Write in full sentences where appropriate; point form is often too terse to follow without a soundtrack (though occasionally it is appropriate). Organize the material hierarchically using numbered sections and subsections with meaningful titles. Try to preserve the motivation, difficulties, solution ideas, failed attempts, and partial results presented along the way in the lecture.

Write your notes in LaTeX using our template, either by downloading the template or by using Overleaf (Menu -> Copy project).

MLHC Community Consulting

Each student is expected to either “scribe” for one lecture (see above) or “consult” for one Machine Learning for Healthcare (MLHC) community evening session. Throughout the semester, we will organize four evening sessions to engage with the larger MLHC community. Clinicians and others in the Boston area interested in machine learning for healthcare will come to talk through their problems and ideas.

MLHC Community Consulting sessions this semester will be held on:

Tues Mar 5: 5-7pm in 32-G449
Wed Mar 20: 5-7pm in 32-G882
Thurs Apr 11: 5:30-7:30pm in 32-D463
Mon Apr 29: 5-7pm in 32-D463

Clinicians are welcome to sign up here for more information, or see our poster.

Students who sign up for community consulting are expected to attend the entire session and to submit a write-up of their experiences within one week after the session. We expect one write-up per clinician, so students who talked to the same clinician should coordinate.

Projects

Projects will include a proposal, poster presentation, and final report. We will add more information here shortly.

Due dates:

Project proposals (one per group): Thurs Mar 21 at 11:59pm.
Project poster presentations: Tues May 14, 5-7pm in 34-401.
Project report (one per group): Thurs May 16 at 11:59pm.

Collaboration: Students should work in groups of 3 registered students. Doing something related to your research is fine, but your class project should be distinct, and you should be able to isolate your contributions to the project from those of any collaborators outside of the class.

Relationship to other classes: If you wish to use the same project for our class and another class, you must ask the instructors for permission before submitting your project proposal, and the proposal itself should state this clearly. If one project is used for two classes, you must:

Produce a project that is twice as large in depth and content as would have been required for either class individually
Obtain permission from the instructor of the other class

Moreover, all students in the project should be enrolled in both classes.

Proposal: At most 3 pages, one per group. Submit through Stellar. Clearly state the following:

Problem you wish to tackle
Description of data you plan to use
Proposed approach and methods
Evaluation plan
Timeline
What each student in the group will do

We understand that much of this will be preliminary at this stage, but these details help us ensure that you are on the right track.

Final project poster and writeup: Detailed poster and write-up guidelines can be found here.

Collaboration Policy

Students must write up their problem sets individually. Students should not share their code or solutions (i.e., the write-up) with anyone inside or outside of the class, nor should these be posted publicly to GitHub or any other website. Each problem set asks you to identify your collaborators; if you did not discuss the problem set with anyone, write "Collaborators: none." If you use any external reference (e.g. a paper, Wikipedia, a website) in writing up your solution, acknowledge your source and write up the solution in your own words. It is a violation of this policy to submit a problem solution that you cannot orally explain to a member of the course staff.

Plagiarism and other dishonest behavior cannot be tolerated in any academic environment that prides itself on individual accomplishment. If you have any questions about the collaboration policy, or if you feel that you may have violated the policy, please talk to one of the course staff.

Problem Set Late Policy

(applies from Pset2 onwards)

[2 "slack" days] We understand that circumstances outside one's control sometimes prevent submitting by the deadline. As such, each student is given 2 "slack" days to use throughout the semester without a late penalty (e.g. you could submit two psets one day late each, or one pset two days late). Slack days cannot be split into smaller units: submitting 2 hours late uses up one full slack day, and the remaining 22 hours do not roll over. In your PDF write-up, specify how many slack days you are using (they cannot be applied retroactively).
[10% off per unexcused late day] If you submit a pset 3 days late and use 1 slack day, you have 2 unexcused late days, which translates to 20% off that homework.
[write on homework] In order to use a slack day, students must include it in writing on their submission pdf. Otherwise, TAs will assume no slack days used.

Scenarios:

Sam uses 2 slack days on HW3. This is the first time Sam has used any slack days. Sam now has 0 remaining slack days and receives her homework score with no penalty.
Jamie uses 1 slack day on HW3 but submits 52 hours after the deadline. Therefore Jamie is 3 days late (rounded up) and receives 20% off the graded homework. This is the first time Jamie has used any slack days, so Jamie now has 1 slack day remaining.
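The late-policy arithmetic can be sketched as a small Python helper. This is a hypothetical illustration, not an official grading script: it assumes lateness is measured in hours past the deadline and rounded up to whole days, treats declaring more slack days than needed as an error, and caps the deduction at 100% (the policy states none of these three points explicitly). Sam's submission time is not given in the scenario, so 48 hours late is assumed.

```python
import math


def days_late(hours_late: float) -> int:
    """Round lateness up to whole days: 2 hours late counts as 1 full day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def late_penalty(hours_late: float, slack_declared: int,
                 slack_available: int) -> tuple[float, int]:
    """Return (fraction of the pset score deducted, slack days left afterwards).

    slack_declared is the number of slack days written on the submission PDF;
    pass 0 if none is written, since TAs then assume no slack days were used.
    """
    late = days_late(hours_late)
    if slack_declared < 0 or slack_declared > slack_available:
        raise ValueError("cannot use more slack days than remain")
    if slack_declared > late:
        # Assumption: over-declaring slack days is rejected rather than refunded
        raise ValueError("declared more slack days than days late")
    unexcused = late - slack_declared
    # 10% off per unexcused late day; the 100% cap is an assumption
    penalty = min(1.0, 0.10 * unexcused)
    return penalty, slack_available - slack_declared


# The two scenarios from the policy above
print(late_penalty(48, 2, 2))  # Sam:   (0.0, 0)
print(late_penalty(52, 1, 2))  # Jamie: (0.2, 1)
```

Rounding with `math.ceil` mirrors the rule that slack days do not subdivide: 52 hours becomes 3 days, and subtracting Jamie's 1 slack day leaves the 2 unexcused days that produce the 20% deduction.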