HRM Prompt Injection Detector

Parameters: 26,493,954
Architecture: HRM (arXiv:2506.21734) | d_emb=512, d_state=2048, N=3, T=4
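
The card does not define N and T. Assuming they follow the usual HRM convention (N high-level cycles, each made of T low-level steps, with the high-level state updated once at the end of every cycle), the update order can be sketched as below. The function name and the event tuples are illustrative and are not taken from train_hrm_pi.

```python
def hrm_schedule(n_cycles=3, t_steps=4):
    """List module updates in order as (module, cycle, step) tuples.

    Assumption: the low-level module ("L") updates on every step, and the
    high-level module ("H") updates once after the last step of each cycle.
    """
    events = []
    for cycle in range(n_cycles):
        for step in range(t_steps):
            events.append(("L", cycle, step))
        # One high-level update closes the cycle.
        events.append(("H", cycle, t_steps - 1))
    return events


schedule = hrm_schedule(n_cycles=3, t_steps=4)
low_updates = sum(1 for kind, _, _ in schedule if kind == "L")
high_updates = sum(1 for kind, _, _ in schedule if kind == "H")
print(low_updates, high_updates)
```

Under that reading, N=3 and T=4 give 12 low-level updates and 3 high-level updates per forward pass.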

Trained on a corpus merged from five prompt-injection datasets, with a stratified 90/10 split.
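
A stratified split keeps the safe/injection ratio the same on both sides of the split. The sketch below is one minimal way to do that with the standard library. It is not the author's training code, and the function name, the seed, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction=0.1, seed=0):
    """Split indices so each label keeps the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, holdout_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_holdout = round(len(indices) * holdout_fraction)
        holdout_idx.extend(indices[:n_holdout])
        train_idx.extend(indices[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)


# Toy labels only: 60 safe (0) and 40 injection (1) examples.
labels = [0] * 60 + [1] * 40
train_idx, holdout_idx = stratified_split(labels)
print(len(train_idx), len(holdout_idx))
```

With the toy labels, the 10% side contains 6 safe and 4 injection examples, matching the 60/40 ratio of the full set.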

Evaluation

Metric     Value
Accuracy   0.8849
Precision  0.8513
Recall     0.8741
F1         0.8625
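
For reference, these metrics relate to each other as sketched below, assuming the positive class is "injection" (label 1). The helper names are illustrative. The last lines check that the reported F1 is the harmonic mean of the reported precision and recall, up to rounding of the published values.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # flagged prompts that really were injections
    recall = tp / (tp + fn)      # injections that were actually flagged
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check against the reported precision and recall.
reported_f1 = f1_from(0.8513, 0.8741)
print(round(reported_f1, 3))
```

The recomputed value agrees with the reported F1 of 0.8625 to within about 1e-4, which is the precision expected from four-decimal inputs.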

Usage

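import torch
from train_hrm_pi import HRMClassifier, ByteTokenizer

# Rebuild the model with the dimensions listed above, then load the trained weights.
model = HRMClassifier(d_emb=512, d_state=2048)
model.load_state_dict(torch.load("hrm_model.pt", map_location="cpu"))
model.eval()  # switch off training-only behavior such as dropout

# Tokenize the prompt; max_length=256 caps the sequence length.
tokenizer = ByteTokenizer(max_length=256)
tokens = tokenizer(["Your prompt here"])

# Gradients are not needed for inference.
with torch.no_grad():
    logits = model.inference(tokens["input_ids"], tokens["attention_mask"])
pred = logits.argmax(-1).item()  # 0=safe, 1=injection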
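
If a confidence score is more useful than a hard label, the two logits can be turned into a probability with a softmax. The sketch below assumes the model returns exactly two logits per prompt, ordered (safe, injection), as the argmax comment implies. It uses plain floats so it runs without the model; the helper names and the threshold parameter are illustrative, not part of train_hrm_pi.

```python
import math


def injection_probability(logit_safe, logit_injection):
    """Two-class softmax; returns P(class 1 = injection)."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logit_safe, logit_injection)
    e_safe = math.exp(logit_safe - m)
    e_inj = math.exp(logit_injection - m)
    return e_inj / (e_safe + e_inj)


def classify(logit_safe, logit_injection, threshold=0.5):
    """Return 1 (injection) when the probability reaches the threshold, else 0."""
    return int(injection_probability(logit_safe, logit_injection) >= threshold)


# Hypothetical logits, for illustration only.
print(classify(2.0, -1.0), classify(-1.0, 2.0))
```

With a tensor of shape (1, 2), `logits[0].tolist()` would supply the two floats, and `torch.softmax(logits, dim=-1)` computes the same probabilities directly. At a threshold of 0.5 the result matches argmax; raising it would likely trade some recall for precision.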
