ML Tea Talks

Machine Learning Seminar Series at MIT CSAIL

About ML Tea Talks

The ML Tea Talks are a weekly series of informal 30‑minute talks from members of the machine learning community around MIT. Everyone is welcome to attend and hear about some of the cool ML research being done around here.

Zoom links for all talks are distributed through our mailing list. If you need a link urgently or would like to join the list, please email one of the organizers listed below. Subscribe to the MIT ML mailing list to receive weekly updates and announcements about the seminars.

For Spring'25, we have an exciting lineup of speakers who will be sharing their insights on a variety of topics in machine learning and AI. We look forward to seeing you there!

Venue: Hewlett Seminar Room, 32‑G882
Time: Every Monday at 4 PM (unless otherwise specified)

Upcoming Speakers (Spring 2025)

Abhishek Shetty

Theoretical Perspectives on Data Quality and Selection

Wednesday, February 19, 4 pm - 5 pm: Room G449 (Kiva Room)

abhishekshettymit@gmail.com

Although it has always been understood that data quality directly affects the quality of our predictions, the large-scale data requirements of modern machine learning have brought to the fore the need for a richer vocabulary for assessing how well collected data serves the prediction tasks of interest, and for algorithms that use that data most effectively. Although these questions have been studied in contexts such as distribution shift, multitask learning, and sequential decision making, there remains a need for techniques that address the problems faced in practice. Toward the aim of starting a dialogue between the practical and theoretical perspectives on these problems, I will survey some recent techniques from theoretical computer science and statistics addressing data quality and selection.

Tejas Jayashankar

ScoreMix: One-Step Generative Model Training via Score Estimation of Mixture Distributions

Monday, February 24, 4 pm - 5 pm: Room G882 (Hewlett Room)

tejasj@mit.edu

Abstract coming soon.

Giannis Daras

Learning Generative Models from Corrupted Data

Monday, March 3, 4 pm - 5 pm: Room G882 (Hewlett Room)

gdaras@mit.edu

In scientific applications, generative models are used to regularize solutions to inverse problems. The quality of the models depends on the quality of the data on which they are trained. While natural images are abundant, in scientific applications access to high-quality data is scarce, expensive, or even impossible. For example, in MRI the quality of the scan is proportional to the time spent in the scanner, and in black-hole imaging we can only access lossy measurements. In contrast to high-quality data, noisy samples are generally much more accessible. If we had a method to transform noisy points into clean ones, e.g., by sampling from the posterior, we could address these challenges. A standard approach would be to use a pre-trained generative model as a prior. But how can we train these priors in the first place without access to clean data? We show that one can escape this chicken-and-egg problem using diffusion-based algorithms that account for the corruption at training time. We present the first algorithm that provably recovers the distribution given only noisy samples of a fixed variance. We extend our algorithm to handle heterogeneous data where each training sample has a different noise level. The underlying mathematical tools generalize to linear measurements, with the potential of accelerating MRI. Our method has deep connections to the literature on learning supervised models from corrupted data, such as SURE and Noise2X. Our framework opens exciting possibilities for generative modeling in data-constrained scientific applications. We are actively working on applying it to protein denoising and present some first results in this direction.

Shaden Alshammari

A Unifying Framework for Representation Learning

Monday, March 10, 4 pm - 5 pm: Room G882 (Hewlett Room)

shaden@mit.edu

Abstract coming soon.

Benjamin Lahner

Aggregating fMRI datasets for training brain-optimized models of human vision

Monday, March 17, 4 pm - 5 pm: Room G882 (Hewlett Room)

blahner@mit.edu

Abstract coming soon.

Justin Chen

Algorithm Design with Learned Predictions

Monday, March 31, 4 pm - 5 pm: Room G882 (Hewlett Room)

justc@mit.edu

Abstract coming soon.

Kaveh Alimohammadi

Activation-Informed Merging of Large Language Models

Monday, April 7, 4 pm - 5 pm: Room G882 (Hewlett Room)

mrz@mit.edu

Abstract coming soon.

Aruna Sankaranarayanan

From rewards to responses: Leveraging reward circuits to guide generative circuits

Monday, April 14, 4 pm - 5 pm: Room G882 (Hewlett Room)

arunas@mit.edu

Abstract coming soon.

Josh Vendrow

Do Large Language Model Benchmarks Test Reliability?

Wednesday, April 23, 4 pm - 5 pm: Room G449 (Kiva Room)

jvendrow@mit.edu

Abstract coming soon.

Shuvom Sadhuka

Evaluating multiple models using labeled and unlabeled data

Monday, April 28, 4 pm - 5 pm: Room 370

ssadhuka@mit.edu

Abstract coming soon.

Neha Hulkund

DataS^3: Dataset Subset Selection for Specialization

Monday, May 5, 4 pm - 5 pm: Room G882 (Hewlett Room)

nhulkund@mit.edu

Abstract coming soon.

Organizers

Sharut Gupta

sharut@mit.edu

Victor Butoi

vbutoi@mit.edu

Past Organizers

Behrooz Tahmasebi

bzt@mit.edu

Thien Le

thienle@mit.edu

Yifei Wang

yifei_w@mit.edu

Directions

The Hewlett Seminar Room (32‑G882) is located on the 8th floor of the Gates Tower in the Stata Center (Building 32).