arxiv:2605.12357

δ-mem: Efficient Online Memory for Large Language Models

Published on May 12 · Submitted by taesiri on May 13

Authors: Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Xiang Liu, Baian Chen

Abstract

A lightweight memory mechanism called δ-mem enhances large language models by augmenting a frozen attention backbone with a compact associative memory state that provides low-rank corrections to attention computations.

AI-generated summary

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an 8×8 online memory state, δ-mem improves the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching 1.31× on MemoryAgentBench and 1.20× on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
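To make the "fixed-size state matrix updated by delta-rule learning" concrete, here is a minimal sketch of the generic delta-rule associative memory, S ← S + β (v − S k) kᵀ, with readout S q. It is not taken from the paper or its repository: the function names, the unit-norm keys, the write strength `beta`, and the example vectors are all assumptions for illustration, and the step that turns a readout into a low-rank attention correction is left out because the abstract does not specify it.

```python
import math

D = 8  # the abstract quotes an 8x8 state; every other choice here is an assumption


def zeros(d=D):
    """Empty d x d memory state."""
    return [[0.0] * d for _ in range(d)]


def matvec(S, x):
    """Matrix-vector product S x."""
    return [sum(s * xj for s, xj in zip(row, x)) for row in S]


def normalize(x):
    """Scale x to unit L2 norm (assumed here so that beta=1 overwrites exactly)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def delta_write(S, k, v, beta=1.0):
    """Generic delta-rule update: S + beta * (v - S k) k^T. Returns a new state."""
    pred = matvec(S, k)                           # what the memory currently recalls for k
    err = [vi - pi for vi, pi in zip(v, pred)]    # prediction error
    return [[S[i][j] + beta * err[i] * k[j] for j in range(len(k))]
            for i in range(len(v))]


def read(S, q):
    """Readout: the value associated with query q."""
    return matvec(S, q)


# Hypothetical key/value pairs, chosen only to exercise the update.
k1 = normalize([1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
v1 = [float(i) for i in range(D)]
v2 = [float(D - i) for i in range(D)]

S = delta_write(zeros(), k1, v1)
first_recall = read(S, k1)    # approximately v1

S = delta_write(S, k1, v2)    # same key, new value: the old association is replaced
second_recall = read(S, k1)   # approximately v2
```

With unit-norm keys and `beta=1.0`, a write replaces whatever the state previously returned for that key instead of adding to it, which is the property that distinguishes the delta rule from a plain additive outer-product update. The state stays 8×8 regardless of how many writes are made.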


Community

Dominic789654

Paper author 9 days ago

https://github.com/declare-lab/delta-Mem

urroxyz

8 days ago · edited 8 days ago

This is really, really cool.

I think it's important we stop treating weights as the ultimate beholder, and start adding ornaments with their own purposes. This paper introduces one of the most lightweight and successful implementations of tack-on memory that I've seen yet.

Keep it up!

huaXiaKyrie

Paper author 8 days ago

I really appreciate that you like it.

avahal

about 17 hours ago

the delta-rule online memory idea, packing an 8x8 state into the attention loop, is the neat trick here. my one question: how does delta-mem cope when the conversation topic shifts abruptly or the history becomes highly non-stationary? the arxivlens breakdown helped me parse the method details, especially the readout path and the memory write updates, which makes the mechanism feel a lot more interpretable; you can check their walkthrough here: https://arxivlens.com/PaperView/Details/d-mem-efficient-online-memory-for-large-language-models-512-a95169dd
overall i appreciate that this stays frozen-backbone friendly while still delivering meaningful gains on memory-heavy benchmarks.