Machine Learning Seminar Series at MIT CSAIL
The ML Tea Talks are a weekly series of informal 30‑minute talks by members of the machine learning community around MIT. Everyone is welcome to attend and hear about some of the cool ML research being done around here.
We provide Zoom links for all talks through our mailing list. If you need a link urgently or would like to join the mailing list, please email one of the organizers listed below.
Please subscribe to the MIT ML mailing list to receive weekly emails containing updates and announcements about seminars.
This semester, we have an exciting lineup of speakers who will be sharing their insights on a variety of topics in machine learning and AI. We look forward to seeing you there!
Venue: Hewlett Seminar Room, 32‑G882
Time: Every Monday at 4 PM (unless otherwise specified)
Activation Steering in Generative Settings via Contrastive Causal Mediation Analysis
Monday, September 15, 4PM - 5PM: Room G882 (Hewlett Room)
arunas@mit.edu
Abstract coming soon.
Consensus-Driven Active Model Selection
Monday, September 15, 4PM - 5PM: Room G882 (Hewlett Room)
kayj@mit.edu
Abstract coming soon.
Bridging machine learning and optimization with computational metabolomics
Monday, September 22, 4PM - 5PM: Room G882 (Hewlett Room)
runzhong@mit.edu
Abstract coming soon.
Collapse-Proof Non-Contrastive Self-Supervised Learning
Monday, September 29, 4PM - 5PM: Room G882 (Hewlett Room)
esansone@mit.edu
Abstract coming soon.
Data Attribution in High Dimensions and without Strong Convexity
Monday, September 29, 4PM - 5PM: Room G882 (Hewlett Room)
ittair@mit.edu
Abstract coming soon.
Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions
Monday, October 6, 4PM - 5PM: Room G882 (Hewlett Room)
sourav99@mit.edu
Abstract coming soon.
Natural-Formal Interoperable Programming with Shared Program State
Wednesday, October 15, 4PM - 5PM: Room G449 (Kiva Room)
ellieyhc@mit.edu
Abstract coming soon.
Context-aware sequence-to-function model of human gene regulation
Wednesday, October 15, 4PM - 5PM: Room G449 (Kiva Room)
aksu@mit.edu
Abstract coming soon.
Pandemic-Potential Viruses are a Blind Spot for Frontier Open-Source LLMs
Monday, October 20, 4PM - 5PM: Room G882 (Hewlett Room)
luebbert@broadinstitute.org
Abstract coming soon.
Chain-of-Thought Degrades Abstention in LLMs, Unless Inverted
Monday, October 20, 4PM - 5PM: Room G882 (Hewlett Room)
abinitha@mit.edu
Abstract coming soon.
RL's Razor: Why On-Policy Reinforcement Learning Forgets Less
Monday, October 27, 4PM - 5PM: Room G882 (Hewlett Room)
idanshen@mit.edu
Abstract coming soon.
PDDL-Instruct: Enhancing Symbolic Planning Capabilities in LLMs through Logical Chain-of-Thought Instruction Tuning
Monday, November 3, 4PM - 5PM: Room G882 (Hewlett Room)
pulkitv@mit.edu
Abstract coming soon.
Incentive-Aware Dynamic Pricing for Constrained Resource Allocation with Strategic Agents
Monday, November 3, 4PM - 5PM: Room G882 (Hewlett Room)
yandai20@mit.edu
Abstract coming soon.
Foundational Neuro-symbolic Models for Reasoning and Planning
Wednesday, November 12, 4PM - 5PM: Room G449 (Kiva Room)
ycchen98@mit.edu
Abstract coming soon.
Uncovering Confident Failures: The Complementary Roles of Aleatoric and Epistemic Uncertainty in LLMs
Monday, November 17, 4PM - 5PM: Room G882 (Hewlett Room)
hamidieh@mit.edu
Abstract coming soon.
Blanket unlearning without an erase-set
Monday, November 17, 4PM - 5PM: Room G882 (Hewlett Room)
adrianoh@mit.edu
Abstract coming soon.
Safely Open-Sourcing Foundation Models
Monday, November 24, 4PM - 5PM: Room G882 (Hewlett Room)
vinithms@mit.edu
Abstract coming soon.
Theoretical Perspectives on Data Quality and Selection
Wednesday, February 19, 4PM - 5PM: Room G449 (Kiva Room)
abhishekshettymit@gmail.com
While it has always been understood that data quality directly affects the quality of our predictions, the large-scale data requirements of modern machine learning have brought to the fore the need for a richer vocabulary for describing the quality of collected data with respect to the prediction tasks of interest, and for algorithms that use collected data most effectively. Although these questions have been studied in various contexts, such as distribution shift, multitask learning, and sequential decision making, there remains a need for techniques that address the problems faced in practice. Toward the aim of starting a dialogue between the practical and theoretical perspectives on these problems, I will survey some recent techniques developed in TCS and statistics for data quality and selection.
ScoreMix: One-Step Generative Model Training via Score Estimation of Mixture Distributions
Monday, February 24, 4PM - 5PM: Room G882 (Hewlett Room)
tejasj@mit.edu
We propose Score-of-Mixture Training (SMT), a novel framework for training one-step generative models by minimizing a class of divergences called the α-skew Jensen–Shannon divergence. At its core, SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels. Similar to consistency models, our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call Score-of-Mixture Distillation (SMD). It is simple to implement, requires minimal hyperparameter tuning, and ensures stable training. Experiments on CIFAR-10 and ImageNet 64×64 show that SMT/SMD are competitive with and can even outperform existing methods.
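For readers unfamiliar with the divergence family named above, one common definition of the α-skew Jensen–Shannon divergence is the following (this is the standard skew-JS form from the divergence literature, stated here for orientation rather than taken verbatim from the talk):

```latex
\[
D_{\mathrm{JS}}^{(\alpha)}(P \,\|\, Q)
\;=\;
\alpha \, \mathrm{KL}\!\left(P \,\|\, M_\alpha\right)
\;+\;
(1-\alpha)\, \mathrm{KL}\!\left(Q \,\|\, M_\alpha\right),
\qquad
M_\alpha \;=\; \alpha P + (1-\alpha) Q,
\quad \alpha \in (0,1).
\]
```

Here $M_\alpha$ is exactly the mixture of real and fake distributions whose score SMT estimates; setting $\alpha = \tfrac{1}{2}$ recovers the standard Jensen–Shannon divergence.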
Learning Generative Models from Corrupted Data
Monday, March 3, 4PM - 5PM: Room G882 (Hewlett Room)
gdaras@mit.edu
In scientific applications, generative models are used to regularize solutions to inverse problems. The quality of the models depends on the quality of the data on which they are trained. While natural images are abundant, in scientific applications access to high-quality data is scarce, expensive, or even impossible. For example, in MRI the quality of the scan is proportional to the time spent in the scanner, and in black-hole imaging we can only access lossy measurements. Contrary to high-quality data, noisy samples are generally more accessible. If we had a method to transform noisy points into clean ones, e.g., by sampling from the posterior, we could address these challenges. A standard approach would be to use a pre-trained generative model as a prior. But how can we train these priors in the first place without having access to data? We show that one can escape this chicken-and-egg problem using diffusion-based algorithms that account for the corruption at training time. We present the first algorithm that provably recovers the distribution given only noisy samples of a fixed variance. We extend our algorithm to account for heterogeneous data where each training sample has a different noise level. The underlying mathematical tools can be generalized to linear measurements with the potential of accelerating MRI. Our method has deep connections to the literature on learning supervised models from corrupted data, such as SURE and Noise2X. Our framework opens exciting possibilities for generative modeling in data-constrained scientific applications. We are actively working on applying this to denoise proteins and we present some first results in this direction.
Unsupervised Discovery of Interpretable Structure in Complex Systems
Monday, March 10, 4PM - 5PM: Room G882 (Hewlett Room)
markth@mit.edu
How does the human mind make sense of raw information without being taught how to see or hear? In this talk we will explore how to build algorithms that can uncover interpretable structure from large collections of unsupervised data like images and video. First, I will describe how to classify every pixel of a collection of images without any human annotations (Unsupervised semantic segmentation) by distilling self-supervised vision models. Second, we'll see how this basic idea leads us to a new unifying theory of representation learning, and I will show how 20 different common machine learning methods such as dimensionality reduction, clustering, contrastive learning, and spectral methods emerge from a single unified equation. Finally, we'll use this unified theory to create algorithms that can decode natural language just by watching unlabeled videos of people talking, without any knowledge of text. This work is the first step in our broader effort to translate animals using large scale, unsupervised, and interpretable learners, and the talk will conclude with some of our most recent efforts to analyze the complex vocalizations of Atlantic spotted dolphins.
Aggregating fMRI datasets for training brain-optimized models of human vision
Monday, March 17, 4PM - 5PM: Room G882 (Hewlett Room)
blahner@mit.edu
Large-scale fMRI datasets are revolutionizing our understanding of the neural processes underlying human perception, driving new breakthroughs in neuroscience and computational modeling. Yet individual fMRI data collection efforts remain constrained by practical limitations in scan time, creating an inherent tradeoff between subjects, stimuli, and stimulus repetitions. This tradeoff often compromises stimulus diversity, data quality, and generalizability of findings such that even the largest fMRI datasets cannot fully leverage the power of high-parameter artificial neural network models and high-dimensional feature spaces. To overcome these challenges, we introduce MOSAIC (Meta-Organized Stimuli And fMRI Imaging data for Computational modeling): a scalable framework for aggregating fMRI responses across multiple subjects and datasets. We preprocessed and registered eight event-related fMRI vision datasets (Natural Scenes Dataset, Natural Object Dataset, BOLD Moments Dataset, BOLD5000, Human Actions Dataset, Deeprecon, Generic Object Decoding, and THINGS) to the fsLR32k cortical surface space with fMRIPrep to obtain 430,007 fMRI-stimulus pairs over 93 subjects and 162,839 unique stimuli. We estimated single-trial beta values with GLMsingle (Prince et al., 2022), obtaining parameter estimates of similar or higher quality than the originally published datasets. Critically, we curated the dataset by eliminating stimuli with perceptual similarity above a defined threshold to prevent test-train leakage. This rigorous pipeline resulted in a well-defined stimulus-response dataset with 144,360 training stimuli, 18,145 test stimuli, and 334 synthetic stimuli well-suited for building and evaluating robust models of human vision.
We show preliminary results using MOSAIC to investigate how the internal representations of brain-optimized neural networks differ from those of task-optimized neural networks, and we perform a large-scale decoding analysis that highlights the importance of stimulus set diversity. This framework empowers the vision science community to collaboratively generate a scalable, generalizable foundation for studying human vision.
Activation-Informed Merging of Large Language Models
Monday, April 7, 4PM - 5PM: Room G882 (Hewlett Room)
mrz@mit.edu
Model merging has emerged as an efficient strategy for combining multiple fine-tuned large language models (LLMs) while avoiding the computational overhead of retraining. However, existing methods often overlook the importance of activation-space information in guiding the merging process. In this talk, I will introduce Activation-Informed Merging (AIM), a novel technique that enhances the robustness and performance of merged models by incorporating activation-space insights. AIM is designed as a complementary framework that can be applied to any merging approach, preserving critical weights from the base model through principles drawn from continual learning and model compression. By utilizing a task-agnostic calibration set, AIM selectively prioritizes essential parameters, leading to significant performance improvements across multiple benchmarks, with up to a 40% increase in effectiveness.
Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
Monday, April 14, 4PM - 5PM: Room G882 (Hewlett Room)
tianjin@mit.edu, ellieyhc@mit.edu
Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work has explored parallel decoding by identifying and simultaneously generating semantically independent chunks of LLM responses. However, these techniques rely on hand-crafted heuristics tied to syntactic structures like lists and paragraphs, making them rigid and imprecise. We present PASTA, a learning-based system that teaches LLMs to identify semantic independence and express parallel decoding opportunities in their own responses. At its core are PASTA-LANG and its interpreter: PASTA-LANG is an annotation language that allows LLMs to express semantic independence in their own responses; the language interpreter acts on these annotations to orchestrate parallel decoding on the fly at inference time. Through a two-stage finetuning process, we train LLMs to generate PASTA-LANG annotations that optimize both response quality and decoding speed. Evaluation on AlpacaEval, an instruction following benchmark, shows that our approach Pareto-dominates existing methods in terms of decoding speed and response quality; our results demonstrate geometric mean speedups ranging from 1.21× to 1.93× with corresponding quality changes of +2.2% to -7.1%, measured as length-controlled win rates.
Do Large Language Model Benchmarks Test Reliability?
Wednesday, April 23, 4PM - 5PM: Room G449 (Kiva Room)
jvendrow@mit.edu, evendrow@mit.edu
When deploying large language models (LLMs), it is important to ensure that these models are not only capable, but also reliable. Many benchmarks have been created to track LLMs' growing capabilities; however, there has been no similar focus on measuring their reliability. To understand the potential ramifications of this gap, we investigate how well current benchmarks quantify model reliability. We find that pervasive label errors can compromise these evaluations, obscuring lingering model failures and hiding unreliable behavior. Motivated by this gap in the evaluation of reliability, we then propose the concept of so-called platinum benchmarks, i.e., benchmarks carefully curated to minimize label errors and ambiguity. As a first attempt at constructing such benchmarks, we revise examples from fifteen existing popular benchmarks. We evaluate a wide range of models on these platinum benchmarks and find that, indeed, frontier LLMs still exhibit failures on simple tasks such as elementary-level math word problems. Analyzing these failures further reveals previously unidentified patterns of problems on which frontier models consistently struggle.
Evaluating multiple models using labeled and unlabeled data
Monday, April 28, 4PM - 5PM: Room 370
ssadhuka@mit.edu
Abstract coming soon.
Algorithm Design with Learned Predictions
Monday, May 5, 4PM - 5PM: Room G882 (Hewlett Room)
justc@mit.edu
Abstract coming soon.
Organizers:
sharut@mit.edu
vbutoi@mit.edu
chinglam@mit.edu
bzt@mit.edu
thienle@mit.edu
yifei_w@mit.edu
The Hewlett Seminar Room (32‑G882) is located on the 8th floor of the Gates Tower in the Stata Center (Building 32).