Machine Learning Seminar Series at MIT CSAIL
The ML Tea Talks are a weekly series of informal 30‑minute talks by members of the machine learning community around MIT. Everyone is welcome to attend and hear about some of the cool ML research being done around here.
We provide Zoom links for all talks through our mailing list. If you need a link urgently or would like to join the mailing list, please email one of the organizers listed below.
Please subscribe to the MIT ML mailing list to receive weekly emails containing updates and announcements about seminars.
This semester, we have an exciting lineup of speakers who will be sharing their insights on a variety of topics in machine learning and AI. We look forward to seeing you there!
Venue: Hewlett Seminar Room, 32‑G882
Time: Every Monday at 4 PM (unless otherwise specified)
Activation Steering in Generative Settings via Contrastive Causal Mediation Analysis
Monday, September 15, 4PM - 5PM: Room G882 (Hewlett Room)
arunas@mit.edu
Abstract coming soon.
Consensus-Driven Active Model Selection
Monday, September 15, 4PM - 5PM: Room G882 (Hewlett Room)
kayj@mit.edu
Abstract coming soon.
Bridging machine learning and optimization with computational metabolomics
Monday, September 22, 4PM - 5PM: Room G882 (Hewlett Room)
runzhong@mit.edu
Abstract coming soon.
Collapse-Proof Non-Contrastive Self-Supervised Learning
Monday, September 29, 4PM - 5PM: Room G882 (Hewlett Room)
esansone@mit.edu
Abstract coming soon.
Data Attribution in High Dimensions and without Strong Convexity
Monday, September 29, 4PM - 5PM: Room G882 (Hewlett Room)
ittair@mit.edu
Abstract coming soon.
Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions
Monday, October 6, 4PM - 5PM: Room G882 (Hewlett Room)
sourav99@mit.edu
Abstract coming soon.
Natural-Formal Interoperable Programming with Shared Program State
Wednesday, October 15, 4PM - 5PM: Room G449 (Kiva Room)
ellieyhc@mit.edu
Abstract coming soon.
Context-aware sequence-to-function model of human gene regulation
Wednesday, October 15, 4PM - 5PM: Room G449 (Kiva Room)
aksu@mit.edu
Abstract coming soon.
Pandemic-Potential Viruses are a Blind Spot for Frontier Open-Source LLMs
Monday, October 20, 4PM - 5PM: Room G882 (Hewlett Room)
luebbert@broadinstitute.org
Abstract coming soon.
Chain-of-Thought Degrades Abstention in LLMs, Unless Inverted
Monday, October 20, 4PM - 5PM: Room G882 (Hewlett Room)
abinitha@mit.edu
Abstract coming soon.
RL's Razor: Why On-Policy Reinforcement Learning Forgets Less
Monday, October 27, 4PM - 5PM: Room G882 (Hewlett Room)
idanshen@mit.edu
Abstract coming soon.
PDDL-Instruct: Enhancing Symbolic Planning Capabilities in LLMs through Logical Chain-of-Thought Instruction Tuning
Monday, November 3, 4PM - 5PM: Room G882 (Hewlett Room)
pulkitv@mit.edu
Abstract coming soon.
Incentive-Aware Dynamic Pricing for Constrained Resource Allocation with Strategic Agents
Monday, November 3, 4PM - 5PM: Room G882 (Hewlett Room)
yandai20@mit.edu
Abstract coming soon.
Foundational Neuro-symbolic Models for Reasoning and Planning
Wednesday, November 12, 4PM - 5PM: Room G449 (Kiva Room)
ycchen98@mit.edu
Abstract coming soon.
Uncovering Confident Failures: The Complementary Roles of Aleatoric and Epistemic Uncertainty in LLMs
Monday, November 17, 4PM - 5PM: Room G882 (Hewlett Room)
hamidieh@mit.edu
Abstract coming soon.
Blanket unlearning without an erase-set
Monday, November 17, 4PM - 5PM: Room G882 (Hewlett Room)
adrianoh@mit.edu
Abstract coming soon.
Safely Open-Sourcing Foundation Models
Monday, November 24, 4PM - 5PM: Room G882 (Hewlett Room)
vinithms@mit.edu
Abstract coming soon.
Theoretical Perspectives on Data Quality and Selection
Wednesday, February 19, 4PM - 5PM: Room G449 (Kiva Room)
abhishekshettymit@gmail.com
While it has always been understood that data quality directly affects the quality of our predictions, the large-scale data requirements of modern machine learning have brought to the fore the need for a richer vocabulary for describing the quality of collected data with respect to the prediction tasks of interest, and for algorithms that use collected data most effectively. Although these questions have been studied in various contexts, such as distribution shift, multitask learning, and sequential decision making, there remains a need for techniques that address the problems faced in practice. Toward the aim of starting a dialogue between the practical and theoretical perspectives on these problems, I will survey some recent techniques developed in TCS and statistics for data quality and selection.
ScoreMix: One-Step Generative Model Training via Score Estimation of Mixture Distributions
Monday, February 24, 4PM - 5PM: Room G882 (Hewlett Room)
tejasj@mit.edu
We propose Score-of-Mixture Training (SMT), a novel framework for training one-step generative models by minimizing a class of divergences called the α-skew Jensen–Shannon divergence. At its core, SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels. Similar to consistency models, our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call Score-of-Mixture Distillation (SMD). It is simple to implement, requires minimal hyperparameter tuning, and ensures stable training. Experiments on CIFAR-10 and ImageNet 64×64 show that SMT/SMD are competitive with and can even outperform existing methods.
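For readers unfamiliar with the divergence family named above, one common definition of the α-skew Jensen–Shannon divergence is the following (this is the standard skew-JS form from the divergence literature, stated here for orientation rather than taken verbatim from the talk):

```latex
\[
D_{\mathrm{JS}}^{(\alpha)}(P \,\|\, Q)
\;=\;
\alpha \, \mathrm{KL}\!\left(P \,\|\, M_\alpha\right)
\;+\;
(1-\alpha)\, \mathrm{KL}\!\left(Q \,\|\, M_\alpha\right),
\qquad
M_\alpha \;=\; \alpha P + (1-\alpha) Q,
\quad \alpha \in (0,1).
\]
```

Here $M_\alpha$ is exactly the mixture of real and fake distributions whose score SMT estimates; setting $\alpha = \tfrac{1}{2}$ recovers the standard Jensen–Shannon divergence.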
Learning Generative Models from Corrupted Data
Monday, March 3, 4PM - 5PM: Room G882 (Hewlett Room)
gdaras@mit.edu
In scientific applications, generative models are used to regularize solutions to inverse problems. The quality of the models depends on the quality of the data on which they are trained. While natural images are abundant, in scientific applications access to high-quality data is scarce, expensive, or even impossible. For example, in MRI the quality of the scan is proportional to the time spent in the scanner, and in black-hole imaging we can only access lossy measurements. Contrary to high-quality data, noisy samples are generally more accessible. If we had a method to transform noisy points into clean ones, e.g., by sampling from the posterior, we could address these challenges. A standard approach would be to use a pre-trained generative model as a prior. But how can we train these priors in the first place without having access to data? We show that one can escape this chicken-and-egg problem using diffusion-based algorithms that account for the corruption at training time. We present the first algorithm that provably recovers the distribution given only noisy samples of a fixed variance. We extend our algorithm to account for heterogeneous data where each training sample has a different noise level. The underlying mathematical tools can be generalized to linear measurements with the potential of accelerating MRI. Our method has deep connections to the literature on learning supervised models from corrupted data, such as SURE and Noise2X. Our framework opens exciting possibilities for generative modeling in data-constrained scientific applications. We are actively working on applying this to denoise proteins and we present some first results in this direction.
Unsupervised Discovery of Interpretable Structure in Complex Systems
Monday, March 10, 4PM - 5PM: Room G882 (Hewlett Room)
markth@mit.edu
How does the human mind make sense of raw information without being taught how to see or hear? In this talk we will explore how to build algorithms that can uncover interpretable structure from large collections of unsupervised data like images and video. First, I will describe how to classify every pixel of a collection of images without any human annotations (Unsupervised semantic segmentation) by distilling self-supervised vision models. Second, we'll see how this basic idea leads us to a new unifying theory of representation learning, and I will show how 20 different common machine learning methods such as dimensionality reduction, clustering, contrastive learning, and spectral methods emerge from a single unified equation. Finally, we'll use this unified theory to create algorithms that can decode natural language just by watching unlabeled videos of people talking, without any knowledge of text. This work is the first step in our broader effort to translate animals using large scale, unsupervised, and interpretable learners, and the talk will conclude with some of our most recent efforts to analyze the complex vocalizations of Atlantic spotted dolphins.
Aggregating fMRI datasets for training brain-optimized models of human vision
Monday, March 17, 4PM - 5PM: Room G882 (Hewlett Room)
blahner@mit.edu
Large-scale fMRI datasets are revolutionizing our understanding of the neural processes underlying human perception, driving new breakthroughs in neuroscience and computational modeling. Yet individual fMRI data collection efforts remain constrained by practical limitations in scan time, creating an inherent tradeoff between subjects, stimuli, and stimulus repetitions. This tradeoff often compromises stimulus diversity, data quality, and generalizability of findings such that even the largest fMRI datasets cannot fully leverage the power of high-parameter artificial neural network models and high-dimensional feature spaces. To overcome these challenges, we introduce MOSAIC (Meta-Organized Stimuli And fMRI Imaging data for Computational modeling): a scalable framework for aggregating fMRI responses across multiple subjects and datasets. We preprocessed and registered eight event-related fMRI vision datasets (Natural Scenes Dataset, Natural Object Dataset, BOLD Moments Dataset, BOLD5000, Human Actions Dataset, Deeprecon, Generic Object Decoding, and THINGS) to the fsLR32k cortical surface space with fMRIPrep to obtain 430,007 fMRI-stimulus pairs over 93 subjects and 162,839 unique stimuli. We estimated single-trial beta values with GLMsingle (Prince et al., 2022), obtaining parameter estimates of similar or higher quality than the originally published datasets. Critically, we curated the dataset by eliminating stimuli with perceptual similarity above a defined threshold to prevent test-train leakage. This rigorous pipeline resulted in a well-defined stimulus-response dataset with 144,360 training stimuli, 18,145 test stimuli, and 334 synthetic stimuli well-suited for building and evaluating robust models of human vision.
We show preliminary results using MOSAIC to investigate how the internal representations of brain-optimized neural networks differ from those of task-optimized neural networks, and we perform a large-scale decoding analysis that highlights the importance of stimulus set diversity. This framework empowers the vision science community to collaboratively generate a scalable, generalizable foundation for studying human vision.
Activation-Informed Merging of Large Language Models
Monday, April 7, 4PM - 5PM: Room G882 (Hewlett Room)
mrz@mit.edu
Model merging has emerged as an efficient strategy for combining multiple fine-tuned large language models (LLMs) while avoiding the computational overhead of retraining. However, existing methods often overlook the importance of activation-space information in guiding the merging process. In this talk, I will introduce Activation-Informed Merging (AIM), a novel technique that enhances the robustness and performance of merged models by incorporating activation-space insights. AIM is designed as a complementary framework that can be applied to any merging approach, preserving critical weights from the base model through principles drawn from continual learning and model compression. By utilizing a task-agnostic calibration set, AIM selectively prioritizes essential parameters, leading to significant performance improvements across multiple benchmarks, with up to a 40% increase in effectiveness.
Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
Monday, April 14, 4PM - 5PM: Room G882 (Hewlett Room)
tianjin@mit.edu, ellieyhc@mit.edu
Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work has explored parallel decoding by identifying and simultaneously generating semantically independent chunks of LLM responses. However, these techniques rely on hand-crafted heuristics tied to syntactic structures like lists and paragraphs, making them rigid and imprecise. We present PASTA, a learning-based system that teaches LLMs to identify semantic independence and express parallel decoding opportunities in their own responses. At its core are PASTA-LANG and its interpreter: PASTA-LANG is an annotation language that allows LLMs to express semantic independence in their own responses; the language interpreter acts on these annotations to orchestrate parallel decoding on the fly at inference time. Through a two-stage finetuning process, we train LLMs to generate PASTA-LANG annotations that optimize both response quality and decoding speed. Evaluation on AlpacaEval, an instruction following benchmark, shows that our approach Pareto-dominates existing methods in terms of decoding speed and response quality; our results demonstrate geometric mean speedups ranging from 1.21× to 1.93× with corresponding quality changes of +2.2% to -7.1%, measured as length-controlled win rates.
Do Large Language Model Benchmarks Test Reliability?
Wednesday, April 23, 4PM - 5PM: Room G449 (Kiva Room)
jvendrow@mit.edu, evendrow@mit.edu
When deploying large language models (LLMs), it is important to ensure that these models are not only capable, but also reliable. Many benchmarks have been created to track LLMs' growing capabilities; however, there has been no similar focus on measuring their reliability. To understand the potential ramifications of this gap, we investigate how well current benchmarks quantify model reliability. We find that pervasive label errors can compromise these evaluations, obscuring lingering model failures and hiding unreliable behavior. Motivated by this gap in the evaluation of reliability, we then propose the concept of so-called platinum benchmarks, i.e., benchmarks carefully curated to minimize label errors and ambiguity. As a first attempt at constructing such benchmarks, we revise examples from fifteen existing popular benchmarks. We evaluate a wide range of models on these platinum benchmarks and find that, indeed, frontier LLMs still exhibit failures on simple tasks such as elementary-level math word problems. Analyzing these failures further reveals previously unidentified patterns of problems on which frontier models consistently struggle.
Evaluating multiple models using labeled and unlabeled data
Monday, April 28, 4PM - 5PM: Room 370
ssadhuka@mit.edu
Abstract coming soon.
Algorithm Design with Learned Predictions
Monday, May 5, 4PM - 5PM: Room G882 (Hewlett Room)
justc@mit.edu
Abstract coming soon.
Organizers:
sharut@mit.edu
vbutoi@mit.edu
chinglam@mit.edu
bzt@mit.edu
thienle@mit.edu
yifei_w@mit.edu
The Hewlett Seminar Room (32‑G882) is located on the 8th floor of the Gates Tower in the Stata Center (Building 32).