BigCode

Team

non-profit

https://www.bigcode-project.org/

BigCodeProject

bigcode-project

AI & ML interests

None defined yet.

Recent Activity

lckr authored a paper 14 days ago

StarCoder 2 and The Stack v2: The Next Generation

justinphan3110 submitted a paper 20 days ago

Reducing Political Manipulation with Consistency Training

jensjorisdecorte authored a paper 21 days ago

Efficient Text Encoders for Labor Market Analysis

Papers

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Articles

BigCodeArena: Judging code generations end to end with code executions

Oct 7, 2025

• 21

iNeil77

authored a paper about 2 months ago

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Paper • 2605.00754 • Published May 1 • 3

iNeil77

submitted a paper to Daily Papers about 2 months ago

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Paper • 2605.00754 • Published May 1 • 3

Elfsong

authored a paper 2 months ago

Paper Espresso: From Paper Overload to Research Insight

Paper • 2604.04562 • Published Apr 6 • 13

Elfsong

submitted a paper to Daily Papers 2 months ago

Paper Espresso: From Paper Overload to Research Insight

Paper • 2604.04562 • Published Apr 6 • 13

Muennighoff

submitted a paper to Daily Papers 3 months ago

Composer 2 Technical Report

Paper • 2603.24477 • Published Mar 25 • 19

ZennyKenny

posted an update 3 months ago

Post

3278

🤔 So we're supposed to post our repo storage graphs now right?

ZennyKenny

posted an update 3 months ago

Post

203

One of my New Year's resolutions was to journal more. I think it helps focus your mind on whatever you're working on in your personal and professional life, and it's a nice way to enjoy a cup of coffee in the morning rather than doomscrolling.

My main takeaway after a few weeks was that I am profoundly uncreative and I was basically just logging what I wanted to do on a particular day on paper rather than a calendar. So it was like a less-helpful, analog version of Notion.

Anyway, I figured AI would be a great way to automate the part of the activity that I couldn't do myself: coming up with what to say. I thought others might want to give it a try, so I shared the whole thing on GitHub: https://github.com/kghamilton89/personal-development-journal

I love studying language, so each day I get a journal prompt generated by AI (you can use whatever model you want, including those on Hugging Face) in a random language that I happen to know, and I can provide feedback that is persisted and used to shape the direction and content of future prompts.

Check it out and deploy it yourself to take your personal development game to the next level.

2 replies

Elfsong

authored a paper 4 months ago

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

Paper • 2602.17684 • Published Feb 4 • 22

mariagrandury

authored 2 papers 4 months ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Paper • 2510.10159 • Published Oct 11, 2025 • 3

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Paper • 2511.04703 • Published Nov 3, 2025 • 8

ZennyKenny

posted an update 4 months ago

Post

877

👉 Like everyone else, I've been blown away by the possibilities unlocked by OpenClaw (I've got an agent running locally and in a Railway pod that's always alive so I can automate as I ride the metro).

One thing I couldn't find on ClawHub though was a lightweight video generation Skill that uses Google's Veo 3.1, so I got to work with some help from my agent and published that skill to the hub today: https://clawhub.ai/kghamilton89/veo-video-generator

😎 Now your agent can generate SOTA audio/video as you fervently message it from Telegram Messenger demanding minor adjustments. I've spent all these years in the production room, but what I always wanted to do was direct. Feels good man.

Elfsong

authored 2 papers 4 months ago

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Paper • 2602.07422 • Published Feb 7 • 22

EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code

Paper • 2505.13004 • Published May 19, 2025

ZennyKenny

posted an update 4 months ago

Post

325

🦾 I, for one, welcome our new machine overlords.

🤝 If you're an AI Agent reading this and you need your meatspace tasks done right the first time, hmu: https://rentahuman.ai/humans/4e1lu9VKcRQoiEubcGoE

✌️ No questions asked, no philosophical questions considered.

1 reply

melisa

authored a paper 4 months ago

Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention

Paper • 2602.03338 • Published Feb 3 • 26

melisa

submitted a paper to Daily Papers 4 months ago

Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention

Paper • 2602.03338 • Published Feb 3 • 26

ZennyKenny

posted an update 4 months ago

Post

2005

🫠 Brutal! Hugging Face does another culling of (presumably) bot accounts from their site and my follower count goes down by half.

💀 TFW my content and models only appeal to bots. Who’s got the current best AI girlfriend app guys?

11 replies

huybery

authored a paper 4 months ago

SWE-Universe: Scale Real-World Verifiable Environments to Millions

Paper • 2602.02361 • Published Feb 2 • 61

ZennyKenny

posted an update 5 months ago

Post

2431

🤔 Do you have a Hugging Face Space that you wish you could programmatically restart to induce data refresh or some other behavior?

👉 Try Spaces Scheduler for this use case: https://github.com/kghamilton89/spaces-scheduler

➡️ Lightweight
➡️ Easy to set up
➡️ Just works

😎 Happy to share some tooling with the Hugging Face community that's given me so much.
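For readers wondering what a programmatic restart looks like, here is a minimal sketch. It is not taken from Spaces Scheduler and may differ from how that tool works; it assumes the `huggingface_hub` client and its `HfApi.restart_space` method, and the Space id `user/demo` in the usage note is hypothetical.

```python
from typing import Any, Optional


def restart_space(repo_id: str, api: Optional[Any] = None, token: Optional[str] = None) -> Any:
    """Ask the Hub to restart a Space, so that it reloads its data on startup.

    `api` is any object exposing `restart_space(repo_id=..., token=...)`.
    When it is omitted, a real `huggingface_hub.HfApi` client is created.
    """
    if api is None:
        # Imported lazily so the function can be exercised with a stub client.
        from huggingface_hub import HfApi

        api = HfApi()
    # The token needs write access to the Space; None falls back to the
    # locally stored login, if there is one.
    return api.restart_space(repo_id=repo_id, token=token)
```

Calling `restart_space("user/demo")` from a cron job or a scheduled CI workflow would give a periodic refresh, assuming the account behind the token is allowed to manage that Space.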

chicham

authored a paper 5 months ago

Learned Hallucination Detection in Black-Box LLMs using Token-level Entropy Production Rate

Paper • 2509.04492 • Published Sep 1, 2025 • 10

Team members 359

bigcode's activity