arxiv:2605.31170
Federico Torrielli
EvilScript
AI & ML interests
AI Safety & Mechanistic interpretability
Recent Activity
upvoted a paper about 6 hours ago
PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models
upvoted a paper about 6 hours ago
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling