My current research interests lie roughly at the intersection of interpretable/explainable AI, representation learning, and human-in-the-loop AI. More specifically, I am interested in (1) the design of powerful models that can explain their predictions in terms of high-level “concepts” and (2) the broad applications these architectures may have in scenarios where experts can interact with the models at test time (e.g., model steering, test-time feedback, concept interventions).
Below you can find a list of some of my publications, including their respective venues, papers, code, and presentations (when applicable). For a possibly more up-to-date list, however, please refer to my Google Scholar profile.
Conference Publications
International Conference on Learning Representations (ICLR), 2026.
Concept-based Interpretability
Conference on Neural Information Processing Systems (NeurIPS), 2025
Learning to Defer
International Conference on Machine Learning (ICML), 2025.
Concept Interventions
International Conference on Machine Learning (ICML), 2025.
Concept Bottleneck Models
International Conference on Learning Representations (ICLR), 2025.
Causality
Oral and Best Paper Candidate (within top 15 papers out of 8,500+ submissions) at the 18th European Conference on Computer Vision (ECCV), 2024
Fairness & Bias
International Conference on Machine Learning (ICML), 2024.
Limitations of Concept-Based Models
Spotlight paper at the Conference on Neural Information Processing Systems (NeurIPS), 2023
Concept Interventions
International Conference on Machine Learning (ICML), 2023. Also appeared at ICML's Differentiable Almost Everything Workshop, 2023.
Neural-Symbolic AI
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023
Human-AI Collaboration
Oral presentation at the AAAI Conference on Artificial Intelligence (AAAI), 2023
Evaluation Metrics
Conference on Neural Information Processing Systems (NeurIPS), 2022
Concept-based Interpretability
The 24th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2019
Systems Benchmarking
Journal Publications
Transactions on Machine Learning Research (TMLR), 2025. Also appeared at the "XAI in Action: Past, Present, and Future Applications" workshop at NeurIPS 2023.
Limitations of Concept Bottleneck Models
Transactions on Machine Learning Research (TMLR), 2023. Also appeared at ICML's Workshop on Interpretable Machine Learning in Healthcare, 2023.
Tabular Explainability
IEEE Micro, 2020
Systems Benchmarking
Workshop Publications
ICLR 2026 Workshop on Principled Design for Trustworthy AI
Hierarchical Concept Models
1st NeurIPS Workshop on eXplainable AI approaches for debugging and diagnosis (XAI4Debugging@NeurIPS) as a spotlight presentation, 2021
Rule Extraction
Preprints
Preprint, 2026.
Concept-Based Models
Preprint, 2026.
Interpretability Theory
Preprint, 2025.
Interpretability Theory
Preprint, 2025.
Interpretability Theory
Theses
University of Cambridge
PhD Thesis