Interpretable AI: Past, Present and Future

NeurIPS 2024 Workshop


About

Interpretability in machine learning revolves around constructing models that are inherently transparent and insightful for human end users. As the scale of machine learning models increases and the range of applications expands across diverse fields, the need for interpretable models is more crucial than ever. The significance of interpretability becomes particularly evident in scenarios where decisions carry substantial real-world consequences and influence human lives, such as healthcare, criminal justice, and lending, where understanding the machine learning process is essential. Interpretability can aid in auditing, verification, debugging, and bias detection, help ensure safety, and align models more effectively with human intentions. Post-hoc explanations may be unfaithful and therefore unreliable in some applications, which is why it is essential to design inherently interpretable models that provide truthful and complete explanations by default. Motivated by this, researchers have studied interpretability from many angles, resulting in a spectrum of distinct approaches.

On one end of the spectrum, classical interpretability methods designed for small-scale and tabular datasets often use rule-based models (e.g., decision trees, risk scores) and linear models (e.g., sparse linear models, generalized linear models) that are deemed inherently transparent. On the other end, modern interpretability methods for large-scale foundation models incorporate interpretable components into deep neural networks that are not themselves fully interpretable, spawning novel research areas such as mechanistic interpretability.

In this workshop, we aim to connect researchers working on different sub-fields of interpretability, such as rule-based interpretability, attribution-based interpretability, mechanistic interpretability, applied interpretable ML in various domains (e.g., healthcare, earth and material sciences, physics), and AI regulation. We will pose several key questions to foster discussion and insights:

  • What interpretability approaches are best suited for large-scale models and foundation models?
  • How can we incorporate domain knowledge and expertise when designing interpretable models?
  • How can we assess the quality and reliability of interpretable models?
  • How should we choose between different interpretable models?
  • When is it appropriate to use interpretable models versus post-hoc explainability methods?
  • What are the inherent limitations of interpretability, and how can we address them?
  • What are the diverse applications of interpretability across different domains?
  • What will the future landscape of interpretability entail?
  • Is there a legal need for interpretable models, and when should they be enforced?

Dates

Note: All deadlines are 11:59 PM UTC-12:00, Anywhere on Earth (AoE).

Paper Submission

  • Submissions open on OpenReview: August 9, 2024
  • Submission Deadline: August 30, 2024
  • Notification of Acceptance: October 9, 2024
  • Camera-ready Deadline: November 15, 2024

Workshop Event

Date: December 15, 2024

Schedule

To be announced

Speakers

Organizers

Suraj Srinivas

Postdoctoral research fellow at Harvard University, his research focuses on developing the foundations for interpretable machine learning

Michal Moshkovitz

Machine Learning Research Scientist at Bosch Research, her research focuses on developing the foundations of explainable machine learning

Chhavi Yadav

PhD student at UCSD, her interests lie in XAI, secure verification, auditing, and the societal impacts of deep generative models

Lesia Semenova

Postdoctoral researcher at Microsoft Research, her research focuses mainly on interpretable machine learning and AI in healthcare.

Nave Frost

Research Scientist at eBay Research, his research focuses on providing explanations for data science applications

Valentyn Boreiko

PhD student at the University of Tübingen, his research focuses on the development of interpretability techniques for vision classifiers

Vinayak Abrol

Assistant Professor at IIIT Delhi, his research focuses on the design and analysis of numerical algorithms for information-inspired applications

Bitya Neuhof

PhD student at the Hebrew University of Jerusalem, exploring the stability and reliability of explainable AI methods

Dotan Di Castro

Research scientist and lab manager at Bosch Research, his research focuses on reinforcement learning and computer vision

Kamalika Chaudhuri

Associate Professor at UCSD and a Research Scientist at Meta AI, her research interests lie in the foundations of trustworthy machine learning

Hima Lakkaraju

Assistant Professor at Harvard University, her research focuses on the algorithmic and applied aspects of explainability, fairness, robustness, and privacy of machine learning models

Contact information

  • Email: interpretable.ai.neurips.workshop [AT] gmail.com

Sponsors

Organizers’ Institutions