Machine Learning Seminar Series at MIT CSAIL
The ML Tea Talks are a weekly series of informal 30‑minute talks from members of the machine learning community around MIT. Everyone is welcome to attend and hear about some of the cool ML research being done here.
Zoom links for all talks are distributed through our mailing list. If you need a link urgently or would like to join the mailing list, please email one of the organizers listed below.
Please subscribe to the MIT ML mailing list to receive weekly emails containing updates and announcements about seminars.
For Spring '25, we have an exciting lineup of speakers who will be sharing their insights on a variety of topics in machine learning and AI. We look forward to seeing you there!
Venue: Hewlett Seminar Room, 32‑G882
Time: Every Monday at 4 PM (unless otherwise specified)
Theoretical Perspectives on Data Quality and Selection
Wednesday, February 19, 4 pm - 5 pm: Room G449 (Kiva Room)
abhishekshettymit@gmail.com
While it has always been understood that data quality directly affects prediction quality, the large-scale data requirements of modern machine learning have brought to the fore the need for a richer vocabulary for assessing the quality of collected data with respect to the prediction tasks of interest, and the need for algorithms that use collected data most effectively. Although these questions have been studied in various contexts, such as distribution shift, multitask learning, and sequential decision making, there remains a need for techniques that address the problems faced in practice. Toward the aim of starting a dialogue between the practical and theoretical perspectives on these problems, I will survey some recent techniques developed in theoretical computer science (TCS) and statistics that address data quality and selection.
ScoreMix: One-Step Generative Model Training via Score Estimation of Mixture Distributions
Monday, February 24, 4 pm - 5 pm: Room G882 (Hewlett Room)
tejasj@mit.edu
Abstract coming soon.
Learning Generative Models from Corrupted Data
Monday, March 3, 4 pm - 5 pm: Room G882 (Hewlett Room)
gdaras@mit.edu
In scientific applications, generative models are used to regularize solutions to inverse problems. The quality of the models depends on the quality of the data on which they are trained. While natural images are abundant, in scientific applications access to high-quality data is scarce, expensive, or even impossible. For example, in MRI the quality of the scan is proportional to the time spent in the scanner, and in black-hole imaging we can only access lossy measurements. In contrast to high-quality data, noisy samples are generally more accessible. If we had a method to transform noisy points into clean ones, e.g., by sampling from the posterior, we could address these challenges. A standard approach would be to use a pre-trained generative model as a prior. But how can we train these priors in the first place without access to clean data? We show that one can escape this chicken-and-egg problem using diffusion-based algorithms that account for the corruption at training time. We present the first algorithm that provably recovers the distribution given only noisy samples of a fixed variance. We extend our algorithm to account for heterogeneous data where each training sample has a different noise level. The underlying mathematical tools can be generalized to linear measurements, with the potential of accelerating MRI. Our method has deep connections to the literature on learning supervised models from corrupted data, such as SURE and Noise2X. Our framework opens exciting possibilities for generative modeling in data-constrained scientific applications. We are actively working on applying this to denoising proteins, and we present some first results in this direction.
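The abstract mentions connections to Noise2X-style supervised learning from corrupted data. As a toy illustration of that idea (not the speaker's algorithm; the Gaussian setup and variances below are purely illustrative), the following NumPy sketch shows the Noise2Noise principle in its simplest linear form: fitting a denoiser against an independent noisy target recovers the same Wiener shrinkage one would learn from clean targets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(0.0, 1.0, n)        # clean signal, variance 1 (never seen by the "learner")
y1 = x + rng.normal(0.0, 1.0, n)   # noisy input, noise variance 1
y2 = x + rng.normal(0.0, 1.0, n)   # independent noisy target of the same signal

# Least-squares linear denoiser fit using ONLY noisy pairs (y1 -> y2).
# Because the noise in y2 is independent of y1, the fitted slope converges to
# cov(y1, y2) / var(y1) = var_x / (var_x + var_noise) = 0.5,
# which is exactly the Wiener-optimal shrinkage toward the clean signal x.
w = np.dot(y1, y2) / np.dot(y1, y1)

x_hat = w * y1
mse_denoised = np.mean((x_hat - x) ** 2)  # error of the learned denoiser
mse_noisy = np.mean((y1 - x) ** 2)        # error of the raw noisy input
print(w, mse_denoised, mse_noisy)
```

The point of the sketch is that the learner never touches the clean signal `x`, yet `mse_denoised` comes out well below `mse_noisy`; the talk's subject is the much harder unsupervised analogue of this for generative models.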
A Unifying Framework for Representation Learning
Monday, March 10, 4 pm - 5 pm: Room G882 (Hewlett Room)
shaden@mit.edu
Abstract coming soon.
Aggregating fMRI datasets for training brain-optimized models of human vision
Monday, March 17, 4 pm - 5 pm: Room G882 (Hewlett Room)
blahner@mit.edu
Abstract coming soon.
Algorithm Design with Learned Predictions
Monday, March 31, 4 pm - 5 pm: Room G882 (Hewlett Room)
justc@mit.edu
Abstract coming soon.
Activation-Informed Merging of Large Language Models
Monday, April 7, 4 pm - 5 pm: Room G882 (Hewlett Room)
mrz@mit.edu
Abstract coming soon.
From rewards to responses: Leveraging reward circuits to guide generative circuits
Monday, April 14, 4 pm - 5 pm: Room G882 (Hewlett Room)
arunas@mit.edu
Abstract coming soon.
Do Large Language Model Benchmarks Test Reliability?
Wednesday, April 23, 4 pm - 5 pm: Room G449 (Kiva Room)
jvendrow@mit.edu
Abstract coming soon.
Evaluating multiple models using labeled and unlabeled data
Monday, April 28, 4 pm - 5 pm: Room 370
ssadhuka@mit.edu
Abstract coming soon.
DataS^3: Dataset Subset Selection for Specialization
Monday, May 5, 4 pm - 5 pm: Room G882 (Hewlett Room)
nhulkund@mit.edu
Abstract coming soon.
Organizers:
sharut@mit.edu
vbutoi@mit.edu
bzt@mit.edu
thienle@mit.edu
yifei_w@mit.edu
The Hewlett Seminar Room (32‑G882) is located on the 8th floor of the Gates Tower in the Stata Center (Building 32).