Building on HF

Nathan Habib PRO

SaylorTwift

huggingface

AI & ML interests

Evals

Recent Activity

new activity 2 days ago

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4:Add evaluation results (GPQA, MMLU-Pro, SWE-bench Verified, HLE)

new activity 2 days ago

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16:Add evaluation results (GPQA, MMLU-Pro, SWE-bench Verified, HLE)

liked a model 2 days ago

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4

Organizations

Buckets 2

SaylorTwift/deep-swe

SaylorTwift/reposcan

Posts 1

Post

2426

How do I test an LLM for my unique needs?
If you work in finance, law, or medicine, generic benchmarks are not enough.
This blog post uses Argilla, Distilabel, and 🌤️Lighteval to generate an evaluation dataset and evaluate models.

https://github.com/argilla-io/argilla-cookbook/blob/main/domain-eval/README.md

Articles 12

Article

8

Stop benchmarking inference providers

Collections 8

Papers 1

arxiv:2310.16944

Spaces 24

Reposcan

Search and explore repo issues, PRs, and code

Qwen3 8B

Inspect and browse server log files

Qwen2.5 0.5B Instruct Evals

Inspect and view log files in a web interface

Meta Llama 3.1 8b Cb

Inspect and explore log files in a web view

Transformers CB

View and explore server logs in a web interface

Leaderboard Dashboard

Models 4

SaylorTwift/SmolLM3-3B

Text Generation • 3B • Updated Jan 6 • 7

SaylorTwift/test

Updated Dec 16, 2025

SaylorTwift/gpt2_test

Text Generation • 0.1B • Updated Sep 23, 2024 • 856

SaylorTwift/xlm-roberta-base-finetuned-panx-fr

Updated Mar 13, 2023

Datasets 56

SaylorTwift/gemma4-blog-images

Viewer • Updated Apr 2 • 1 • 7

SaylorTwift/mteb-bitext-mining-aggregated

Viewer • Updated Apr 2 • 588k • 6.77k

SaylorTwift/gsm8k-cb-llama31-8b-results

Viewer • Updated Mar 30 • 100 • 18

SaylorTwift/gsm8k-cb-results

Viewer • Updated Mar 30 • 100 • 18

SaylorTwift/aime-2026-qwen35-results

Viewer • Updated Mar 6 • 1 • 10

SaylorTwift/aime-2026-qwen25-72b-results

Viewer • Updated Mar 6 • 1 • 16

SaylorTwift/aime-2026-vllm-results

Viewer • Updated Mar 6 • 1 • 14

SaylorTwift/claude-sonnet-4-0

Updated Dec 15, 2025 • 54

SaylorTwift/aime25

Viewer • Updated Nov 24, 2025 • 30 • 28

SaylorTwift/lighteval-tasks-database

Viewer • Updated Sep 25, 2025 • 1.36k • 52
