About Me

I am a PhD student in Electronics and Computer Engineering at the University of Cagliari. My research focuses on adversarial machine learning and the security of large language models, with particular attention to jailbreak robustness, mechanistic interpretability, and practical attack and defense evaluation. I am interested in understanding how and why models fail, and in designing methods that improve their reliability in real-world settings.

News

May 2026

Best Poster Award at ELSA

Our paper “SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models” received the Best Poster Award at the Final ELSA General Assembly 2026.

February 2026

ELLIS Institute Tübingen

Starting next month, I will join the ELLIS Institute Tübingen to work on AI Safety and Alignment under the supervision of Maksym Andriushchenko.

February 2026

Slides available: “From Evasion to Jailbreak”

The slides are available for the tutorial “From Evasion to Jailbreak: Adversarial Machine Learning in the age of LLMs”, held with Fabio Brau at TAIC - ITASEC2026. View the talk page.

January 2026

AAAI 2026 in Singapore 🇸🇬

Together with Giorgio Piras, I am presenting “SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models”.

January 22, 12 pm, Hall 2, Poster #62. If you are around, come by for a chat about refusal in LLMs.

December 2025

NeurIPS 2025 in San Diego 🇺🇸

Together with Fabio Brau, I am presenting “TransferBench: Benchmarking Ensemble-based Black-box Transfer Attacks” at NeurIPS 2025 and EurIPS 2025.

December 4, 2025, 11:00 AM-2:00 PM PST, Poster Stand #3914.

Raffaele Mura