About Me

Hi! I am a Ph.D. student in Electrical and Computer Engineering at Virginia Tech, advised by Prof. Ruoxi Jia in the Responsible Data Science Lab. I previously completed my M.S. at Virginia Tech and my B.Tech. at the Indian Institute of Information Technology, Una.

My research focuses on AI Safety & Alignment: red-teaming and jailbreak robustness, AI agent safety, and data-efficient training regimes. I am broadly interested in understanding and mitigating failure modes in large language models and agentic systems.

Feel free to reach out at mahavirdabas18@vt.edu.

News

  • Jun 2026 Our paper on memory-induced tool-drift in LLM agents was accepted to the ICML 2026 Agents in the Wild: Safety, Security, and Beyond Workshop. [Paper]
  • Jan 2026 Adversarial Déjà Vu was accepted to ICLR 2026! [Website]
  • May 2025 Defended my Master's thesis and am continuing on to my Ph.D. at Virginia Tech.
  • May 2025 Our research on efficient over-refusal mitigation in aligned LLMs was accepted to ICML 2025. [Paper]
  • Oct 2024 Team HokieTokie was selected as one of the top 10 teams for the inaugural Amazon Nova AI Challenge. [Link]
  • Aug 2023 Started my Master's at Virginia Tech.

Publications

Memory-Induced Tool-Drift in LLM Agents

Mahavir Dabas, J. Jeong, M. Jin, R. Jia

ICML 2026 AIWILD Workshop

Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks

Mahavir Dabas, T. Huynh, N. R. Billa, J. T. Wang, P. Gao, C. Peris, Y. Ma, R. Gupta, M. Jin, P. Mittal, R. Jia

ICLR 2026

Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning

Mahavir Dabas, S. Chen, C. Fleming, M. Jin, R. Jia

ICML 2025

Characterizing Model-Native Skills

F. Kang, Mahavir Dabas, M. Ko, R. Jia

Preprint, Apr 2026

Can Generalist Agents Automate Data Curation?

F. Kang, H. Li, A. Nguyen, Mahavir Dabas, J. W. Ma, F. Sala, D. Song, R. Jia

Preprint, Jun 2026

Academic Service

Reviewer: ICML 2025 · NeurIPS 2025 · EMNLP 2025 · ICLR 2026 · ICML 2026