Value-Aware Stochastic KV Cache Eviction for Reasoning Models Paper • 2606.03928 • Published 15 days ago • 8
deqing/convergent-llama-300M-muon-6digit-addition_6digit_llama Text Generation • 0.3B • Updated 14 days ago • 618 • 1
deqing/convergent-llama-300M-muon-6digit-addition_6digit_custom3 Text Generation • 0.2B • Updated 16 days ago • 1.6k • 1
deqing/convergent-llama-300M-muon-base15-addition_base15 Text Generation • 0.2B • Updated 18 days ago • 878
deqing/convergent-llama-300M-muon-base12-addition_base12 Text Generation • 0.2B • Updated 18 days ago • 947
deqing/convergent-llama-300M-muon-4digit-addition_4digit_custom3 0.2B • Updated 19 days ago • 860 • 1
deqing/convergent-llama-300M-muon-4digit-addition_4digit_custom3_right2left 0.2B • Updated 19 days ago • 124