If you like it, give the demo a little star and send a shoutout to @MaxLSB, @jddqd, and @GAD-cell for absolutely obliterating the Pareto frontier of French language understanding.
Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥
Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.
Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed: lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance, and even beat vanilla AdamW.
Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.
Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.
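For anyone wanting to reproduce the hybrid split, here's a minimal sketch of the parameter routing in PyTorch. It assumes you pass in whichever Muon implementation you use (its constructor arguments vary between implementations), and the learning rates and name filters are illustrative placeholders, not the exact values from this run.

```python
from torch.optim import AdamW


def build_optimizers(model, muon_cls, muon_lr=2e-2, adamw_lr=2e-5):
    """Hybrid setup: Muon for 2D weight matrices, AdamW for everything else.

    `muon_cls` is whatever Muon implementation you use (assumed to accept a
    parameter list and an `lr` kwarg); embeddings, the LM head, biases, and
    norm gains are routed to AdamW.
    """
    muon_params, adamw_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # 2D tensors (hidden weight matrices) go to Muon; 1D tensors and
        # embedding/output layers go to AdamW. The name check is a
        # simplification and may need adjusting for your model.
        if p.ndim >= 2 and "embed" not in name and "lm_head" not in name:
            muon_params.append(p)
        else:
            adamw_params.append(p)
    return [
        muon_cls(muon_params, lr=muon_lr),
        AdamW(adamw_params, lr=adamw_lr, weight_decay=0.01),
    ]
```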
Excited to share that I've joined the Hugging Face Fellows program! 🤗
Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey! 🚀
I am now being charged for paused and unstarted spaces out of the blue. UPDATE: The problem seems to be resolved, but I won't be able to make any new models or datasets, or test any training scripts for the foreseeable future.
The charge for unstarted spaces I can get behind; I would've appreciated a warning email first, but whatever. However, every time I restart, the active usage goes up, despite all of my spaces having been moved to CPU (free) and paused.