About Me
Hi! I am a Ph.D. student in Electrical and Computer Engineering at Virginia Tech, advised by Prof. Ruoxi Jia in the Responsible Data Science Lab. I previously completed my M.S. at Virginia Tech and my B.Tech. at the Indian Institute of Information Technology, Una.
My research focuses on AI Safety & Alignment: red-teaming and jailbreak robustness, AI agent safety, and data-efficient training regimes. I am broadly interested in understanding and mitigating failure modes in large language models and agentic systems.
Feel free to reach out at mahavirdabas18@vt.edu.
News
- Jun 2026 Our paper on memory-induced tool-drift in LLM agents was accepted to the ICML 2026 Agents in the Wild: Safety, Security, and Beyond Workshop. [Paper]
- Jan 2026 Adversarial Déjà Vu was accepted to ICLR 2026! [Website]
- May 2025 Defended my Master's thesis; I am continuing on to my Ph.D. at Virginia Tech.
- May 2025 Our research on efficient over-refusal mitigation in aligned LLMs was accepted to ICML 2025. [Paper]
- Oct 2024 Team HokieTokie was selected as one of the top 10 teams for the inaugural Amazon Nova AI Challenge. [Link]
- Aug 2023 Started my Master's at Virginia Tech.
Publications
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
ICLR 2026

Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning
ICML 2025
Academic Service
Reviewer: ICML 2025 · NeurIPS 2025 · EMNLP 2025 · ICLR 2026 · ICML 2026