Federico Torrielli

EvilScript

https://federicotorrielli.github.io

AI & ML interests

AI Safety & Mechanistic interpretability

Recent Activity

authored a paper about 21 hours ago

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

upvoted 2 papers 2 days ago

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

Paper • 2606.09697 • Published 4 days ago • 5

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

Paper • 2606.09707 • Published 4 days ago • 6

upvoted a paper 7 days ago

LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs

Paper • 2606.06286 • Published 8 days ago • 8

upvoted a paper 11 days ago

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Paper • 2605.31170 • Published 14 days ago • 12

upvoted a paper 14 days ago

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

Paper • 2605.07462 • Published May 8 • 3

upvoted 2 papers 16 days ago

Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models

Paper • 2605.11887 • Published May 12 • 13

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

Paper • 2605.26045 • Published 18 days ago • 12

upvoted a collection about 2 months ago

Activation Oracles

12 items • Updated Dec 26, 2025 • 18

upvoted a collection 9 months ago

PP-OCRv5

PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated about 1 month ago • 57

upvoted a collection about 1 year ago

Gemma 3 Release

28 items • Updated Mar 12 • 643