ML Tea: A seminar series hosted out of MIT CSAIL

The ML Tea Talks are a weekly series of informal, 30-minute talks from members of the machine learning community around MIT. Everyone is welcome to attend and hear about some of the cool ML research being done around here.

For Fall 2024, talks will be held weekly on Mondays, usually at 4 PM. Most talks will take place in the Hewlett Seminar Room (32-G882), though the venue is subject to change. Please follow the announcements on the mailing list for the most up-to-date information.

We share Zoom links for all talks through our mailing list. If you need a link urgently or would like to join the mailing list, please email one of the organizers listed below.

Please subscribe to the MIT ML mailing list to receive weekly emails with updates and announcements about the seminars.

Organizers

Sharut Gupta

MIT CSAIL

sharut@mit.edu

Thien Le

MIT CSAIL

thienle@mit.edu

Behrooz Tahmasebi

MIT CSAIL

bzt@mit.edu

Yifei Wang

MIT CSAIL

yifei_w@mit.edu

Directions to the Room

The Hewlett Seminar room (32-G882) is located on the 8th floor of the Gates Tower within the Stata Center (Building 32). Upon entering Building 32 through the front entrance, proceed straight ahead. You'll find elevators on your right-hand side. Take these elevators up to the 8th floor. Upon exiting the elevators, turn left, and the Hewlett Seminar Room will be directly ahead.

Fall 2024 Speakers

Coming soon!


Spring 2024 Speakers

Decomposing Predictions by Modeling Model Computation

Thursday, May 2, 2024 at 4 PM (Room 32-370)

Speaker: Harshay Shah (MIT CSAIL)

How does the internal computation of a machine learning model transform inputs into predictions? In this work, we introduce a task called component modeling that aims to address this question. The goal of component modeling is to decompose an ML model's prediction in terms of its components: simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model computation. We focus on a special case of this task, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions, and demonstrate its effectiveness across models, datasets, and modalities. Finally, we show that component attributions estimated with COAR directly enable model editing across five tasks: fixing model errors, "forgetting" specific classes, boosting subpopulation robustness, localizing backdoor attacks, and improving robustness to typographic attacks.
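
For a rough sense of what component attribution involves, the sketch below ablates random subsets of "components" in a toy model and fits a linear map from ablation patterns to the change in the model's output. The toy model, the ablation scheme, and all names here are illustrative assumptions, not the COAR implementation.

```python
# Hypothetical sketch of component attribution: ablate random subsets of
# "components" (here, hidden units of a toy two-layer network) and fit a
# linear model from ablation masks to the change in a prediction.
# The setup is an illustrative assumption, not the COAR code.
import numpy as np

rng = np.random.default_rng(0)
n_components, n_trials = 20, 500

W1 = rng.normal(size=(10, n_components))   # toy two-layer model
W2 = rng.normal(size=(n_components,))
x = rng.normal(size=10)

def predict(mask):
    """Model output with hidden units kept where mask == 1 (ablation = zeroing)."""
    h = np.maximum(W1.T @ x, 0) * mask
    return float(W2 @ h)

base = predict(np.ones(n_components))

# Collect (ablation mask, output change) pairs and fit a linear attribution model.
masks = (rng.random((n_trials, n_components)) > 0.25).astype(float)
outputs = np.array([predict(m) for m in masks])
attributions, *_ = np.linalg.lstsq(masks - 1.0, outputs - base, rcond=None)

print("estimated per-component counterfactual impact:", attributions.round(3))
```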

Harshay Shah

Harshay is a PhD student at MIT CSAIL, advised by Aleksander Madry. His research interests are broadly in developing tools to understand and steer model behavior. Recently, he has been working on understanding how training data and learning algorithms collectively shape neural network representations. 

Ablation Based Counterfactuals

Thursday, April 25, 2024 at 4 PM (Room 32-370)

Speaker: Zheng Dai (MIT CSAIL)

The widespread adoption of diffusion models for creative uses such as image, video, and audio synthesis has raised serious questions surrounding the use of training data and its regulation. To arrive at a resolution, it is important to understand how such models are influenced by their training data. Due to the complexity involved in training and sampling from these models, the ultimate impact of the training data is challenging to characterize, confounding regulatory and scientific efforts. In this work we explore the idea of an Ablation Based Counterfactual, which allows us to compute counterfactual scenarios where training data is missing by ablating parts of a model, circumventing the need to retrain. This enables important downstream tasks such as data attribution, and brings us closer to understanding the influence of training data on these models.
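
As a loose illustration of the ablation idea (a toy setup of our own, not the method from the talk): if a model were an ensemble whose members are each fit on one shard of the training data, the counterfactual "model trained without that shard" could be approximated by dropping the corresponding member at inference time, with no retraining.

```python
# Toy illustration (assumed setup, not the talk's method): an ensemble of
# linear regressors, each fit on one shard of the training data. Ablating
# member j approximates the counterfactual "trained without shard j".
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=300)

shards = np.array_split(np.arange(300), 6)
members = [np.linalg.lstsq(X[idx], y[idx], rcond=None)[0] for idx in shards]

def predict(x, keep):
    """Average the predictions of the non-ablated ensemble members."""
    return float(np.mean([x @ members[j] for j in keep]))

x_test = rng.normal(size=5)
full = predict(x_test, keep=range(6))
ablated = predict(x_test, keep=range(1, 6))   # counterfactual: shard 0 "removed"
print("full ensemble:", round(full, 3), "| shard 0 ablated:", round(ablated, 3))
```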

Zheng Dai

Zheng is a fifth-year PhD candidate advised by David Gifford. He is broadly interested in the safe and ethical implementation of AI, and his projects include ML-aided discovery of novel therapeutics, theoretical foundations of robust classifiers, and explainability in generative models.


Improving data efficiency and accessibility for general robotic manipulation

Thursday, April 18, 2024 at 4 PM (Room 32-370)

Speaker: Hao-Shu Fang (MIT CSAIL)


How can data-driven approaches endow robots with diverse manipulative skills and robust performance in unstructured environments? Despite recent progress, many open questions remain in this area, such as: (1) How can we define and model the data distribution for robotic systems? (2) In light of data scarcity, what strategies can algorithms employ to enhance performance? (3) What is the best way to scale up robotic data collection? In this talk, Hao-Shu Fang will share his research on enhancing the efficiency of robot learning algorithms and democratizing access to large-scale robotic manipulation data. He will also discuss several open questions in data-driven robotic manipulation, offering insights into the challenges posed.

Hao-Shu Fang

Hao-Shu Fang is a postdoctoral researcher collaborating with Pulkit Agrawal and Edward Adelson. His research focuses on general robotic manipulation. Recently, he has been investigating how to integrate visual-tactile perception for improved manipulation and how to train a multi-task robotic foundation behavioral model.

Removing Biases from Molecular Representations via Information Maximization

Thursday, April 11, 2024 at 4 PM (Room 32-370)

Speaker: Chenyu Wang (MIT CSAIL)


High-throughput drug screening – using cell imaging or gene expression measurements as readouts of drug effect – is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier, and adaptively reweights samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show that InfoCORE offers a versatile framework for addressing general distribution shifts and data fairness issues by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.
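
To give a flavor of the reweighting step mentioned above (a toy sketch of the general idea, not the InfoCORE objective): given some estimate q(batch | representation), one can weight each sample inversely to the estimated probability of its own batch, pushing the reweighted batch distribution toward uniform.

```python
# Toy sketch of batch reweighting (illustrative only, not the InfoCORE code):
# weight each sample by 1 / q(b_i | z_i), where q is a stand-in for a batch
# classifier's posterior, so batches become more evenly represented afterwards.
import numpy as np

rng = np.random.default_rng(4)
n_batches, n_samples = 3, 8
batch_id = rng.integers(0, n_batches, size=n_samples)

# Stand-in for a batch classifier's predicted probabilities q(b | z_i).
q = rng.dirichlet(np.ones(n_batches), size=n_samples)

weights = 1.0 / q[np.arange(n_samples), batch_id]
weights /= weights.sum()
print("per-sample weights:", weights.round(3))
```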

Chenyu Wang

Chenyu is a second-year PhD student at MIT EECS, advised by Tommi Jaakkola and Caroline Uhler. She is also affiliated with the Eric and Wendy Schmidt Center (EWSC) at the Broad Institute. Her research interests lie broadly in machine learning, representation learning, and AI for science. Recently, her research has focused on multi-modal representation learning and perturbation modelling for drug discovery. Before her PhD, she obtained her Bachelor's degree from Tsinghua University.

Interpolating Item and User Fairness in Multi-Sided Recommendations

Thursday, April 4, 2024 at 4 PM (Room 32-370)

Speaker: Qinyi Chen (MIT ORC)


Today's online platforms rely heavily on algorithmic recommendations to bolster user engagement and drive revenue. However, such algorithmic recommendations can impact diverse stakeholders involved, namely the platform, items (sellers), and users (customers), each with their unique objectives. In such multi-sided platforms, finding an appropriate middle ground becomes a complex operational challenge. Motivated by this, we formulate a novel fair recommendation framework, called Problem (FAIR), that not only maximizes the platform's revenue, but also accommodates varying fairness considerations from the perspectives of items and users. Our framework's distinguishing trait lies in its flexibility: it allows the platform to specify any definitions of item/user fairness that are deemed appropriate, as well as decide the "price of fairness" it is willing to pay to ensure fairness for other stakeholders. We further examine Problem (FAIR) in a dynamic online setting, where the platform needs to learn user data and generate fair recommendations simultaneously in real time, two tasks that are often at odds. In the face of this additional challenge, we devise a low-regret online recommendation algorithm, called FORM, that effectively balances the act of learning and performing fair recommendation. Our theoretical analysis confirms that FORM proficiently maintains the platform's revenue, while ensuring desired levels of fairness for both items and users. Finally, we demonstrate the efficacy of our framework and method via several case studies on real-world data.

Qinyi Chen

Qinyi Chen is a fourth-year PhD student in the Operations Research Center (ORC) at MIT, advised by Prof. Negin Golrezaei. Her research interests span machine learning and optimization, AI/ML fairness, approximation algorithms, game and auction theory, with applications in digital platforms and marketplaces.

When is Agnostic Reinforcement Learning Statistically Tractable?

Thursday, March 21, 2024 at 4 PM (Room 32-370)

Speaker: Zeyu Jia (MIT LIDS)


We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Π, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an ε-suboptimal policy with respect to Π? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set Π and is independent of the MDP dynamics. With a generative model, we show that for any policy class Π, bounded spanning capacity characterizes PAC learnability. However, for online RL, the situation is more subtle. We show there exists a policy class Π with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure, which in conjunction with bounded spanning capacity enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as techniques for reachable-state identification and policy evaluation in reward-free exploration.

Zeyu Jia

Zeyu Jia is a fourth-year PhD student in the Department of EECS at MIT, advised by Alexander Rakhlin and Yury Polyanskiy. He is also affiliated with the Laboratory for Information and Decision Systems (LIDS) at MIT. Prior to joining MIT, he received his bachelor's degree from the School of Mathematical Sciences at Peking University in 2020. His research interests include machine learning theory, especially reinforcement learning theory, as well as statistics and information theory.

What's the Erdős number of an LLM? Mathematical and algorithmic discovery via machine learning

Thursday, March 14, 2024 at 4 PM (Room 32-370)

Speaker: Peter Holderrieth (MIT CSAIL)


We survey methods for discovering novel mathematics and novel algorithms via machine learning (AlphaTensor, FunSearch, AlphaGeometry, AI Feynman, etc.). We will not present our own work but rather survey other people's work; this talk is a review in the form of a presentation.

Peter E. Holderrieth

Human Expertise in Algorithmic Prediction

Thursday, March 7, 2024 at 4 PM (Room 32-370)

Speaker: Rohan Alur (MIT LIDS)


We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which "look the same" to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information, particularly subjective information, which is not encoded in the algorithm's training data. We use this insight to develop a set of principled algorithms for selectively incorporating human feedback only when it improves the performance of any feasible predictor. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can significantly improve algorithmic predictions on specific instances (which can be identified ex ante). In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.
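
As a loose illustration of the framing (made-up data, not the authors' algorithm): group instances by the features available to the algorithm, then check whether human assessments still track the outcome within those groups, i.e., on inputs that "look the same" to any predictor built from those features.

```python
# Toy sketch (synthetic data, not the paper's algorithm): within groups of
# instances that share the same feature value, any remaining correlation
# between the human score and the outcome reflects signal that no
# feature-based predictor could have used.
import numpy as np

rng = np.random.default_rng(3)
n = 4000
features = rng.integers(0, 4, size=n)                # what the algorithm sees
hidden = rng.normal(size=n)                          # what only the human observes
outcome = (features + hidden + rng.normal(scale=0.5, size=n) > 2).astype(float)
human_score = hidden + rng.normal(scale=0.5, size=n)

for g in range(4):
    idx = features == g
    corr = np.corrcoef(human_score[idx], outcome[idx])[0, 1]
    print(f"group {g}: corr(human score, outcome) = {corr:.2f}")
```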

Rohan Alur

Rohan is a second-year PhD student in EECS, where he is advised by Manish Raghavan and Devavrat Shah. His research interests are at the intersection of machine learning and economics, with a particular focus on causal inference, human/AI collaboration, and data-driven decision making.

Context is Environment

Thursday, February 29, 2024 at 5 PM (Patil/Kiva Seminar Room, 32-G449)

Speaker: Sharut Gupta (MIT CSAIL)


Two lines of work are taking center stage in AI research. On the one hand, the community is making increasing efforts to build models that discard spurious correlations and generalize better in novel test environments. Unfortunately, the hard lesson so far is that no proposal convincingly outperforms a simple empirical risk minimization baseline. On the other hand, large language models (LLMs) have erupted as algorithms able to learn in context, generalizing on the fly to eclectic contextual circumstances that users enforce by means of prompting. In this paper, we argue that context is environment, and posit that in-context learning holds the key to better domain generalization. Via extensive theory and experiments, we show that paying attention to context (unlabeled examples as they arrive) allows our proposed In-Context Risk Minimization (ICRM) algorithm to zoom in on the test environment risk minimizer, leading to significant out-of-distribution performance improvements. From all of this, two messages are worth taking home. Researchers in domain generalization should consider environment as context, and harness the adaptive power of in-context learning. Researchers in LLMs should consider context as environment, to better structure data towards generalization.

Sharut Gupta

Sharut Gupta is a second-year Ph.D. student at MIT CSAIL, working with Prof. Stefanie Jegelka. Her research mainly focuses on building robust and generalizable machine learning systems under minimal supervision. She enjoys working on out-of-distribution generalization, self-supervised learning, causal inference, and representation learning.

Ask Your Distribution Shift if Pre-Training is Right for You

Thursday, February 22, 2024 at 5 PM (Patil/Kiva Seminar Room, 32-G449)

Speaker: Benjamin Cohen-Wang (MIT CSAIL)


Pre-training is a widely used approach to develop models that are robust to distribution shifts. However, in practice, its effectiveness varies: fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others (compared to training from scratch). In this work, we seek to characterize the failure modes that pre-training can and cannot address. In particular, we focus on two possible failure modes of models under distribution shift: poor extrapolation (e.g., they cannot generalize to a different domain) and biases in the training data (e.g., they rely on spurious features). Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases. After providing theoretical motivation and empirical evidence for this finding, we explore two of its implications for developing robust models: (1) pre-training and interventions designed to prevent exploiting biases have complementary robustness benefits, and (2) fine-tuning on a (very) small, non-diverse but de-biased dataset can result in significantly more robust models than fine-tuning on a large and diverse but biased dataset.

Benjamin Cohen-Wang

Ben is a second-year PhD student at MIT, where he is advised by Aleksander Madry. He is interested in how we can develop machine learning models that can be safely deployed, with a focus on robustness to distribution shifts. Lately, he has been working on understanding how we can harness large-scale pre-training (e.g., CLIP, GPT) to develop robust task-specific models.

Efficiently Searching for Distributions

Thursday, February 15, 2024 at 4 PM (Patil/Kiva Seminar Room, 32-G449)

Speaker: Sandeep Silwal (MIT CSAIL)


How efficiently can we search distributions? The problem is modeled as follows: we are given knowledge of k discrete distributions v_i for 1 <= i <= k over the domain [n] = {1,...,n}, which we can preprocess. Then we get samples from an unknown discrete distribution p, also over [n]. The goal is to output the closest distribution to p among the v_i's in TV distance (up to some small additive error). State-of-the-art sample-efficient algorithms require Theta(log k) samples and run in near-linear time.

We introduce a fresh perspective on the problem and ask if we can output the closest distribution in *sublinear* time. This question is particularly motivated as it is a generalization of the traditional nearest neighbor search problem: if we take enough samples, we can learn p explicitly up to low TV distance, and then find the closest v_i in o(k) time using standard nearest neighbor search. However, this approach requires Omega(n) samples. 
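
For intuition, here is a minimal sketch of the naive baseline just described (an assumed toy setup, not the sublinear algorithm from the talk): estimate p by its empirical distribution, then linearly scan the known distributions for the smallest TV distance, which costs Omega(n) samples and O(nk) query time.

```python
# Minimal sketch of the naive baseline (toy setup, not the talk's algorithm):
# learn p via its empirical distribution, then do a linear scan over the
# known distributions v_1, ..., v_k for the smallest TV distance.
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 8                                   # domain size and number of known distributions

V = rng.dirichlet(np.ones(n), size=k)          # known distributions v_1, ..., v_k
p = V[3]                                       # here the unknown p happens to equal v_4

samples = rng.choice(n, size=5000, p=p)        # samples from the unknown p
p_hat = np.bincount(samples, minlength=n) / len(samples)

def tv(a, b):
    """Total variation distance between two distributions on [n]."""
    return 0.5 * np.abs(a - b).sum()

closest = min(range(k), key=lambda i: tv(p_hat, V[i]))
print("closest known distribution:", closest + 1, "TV estimate:", round(tv(p_hat, V[closest]), 4))
```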

Thus, it is natural to ask: can we obtain both a sublinear number of samples and sublinear query time? We present some nice progress on this question and uncover a very interesting statistical-computational trade-off.

This is joint work with Anders Aamand, Alex Andoni, Justin Chen, Piotr Indyk, Shyam Narayanan, and Haike Xu.

Sandeep Silwal

Sandeep is a final-year PhD student at MIT, advised by Piotr Indyk. His interests are broadly in fast algorithm design. Recently, he has been working at the intersection of machine learning and classical algorithms, designing provable algorithms in various ML settings, such as efficient algorithms for processing large datasets, as well as using ML to inspire algorithm design.