All HF Hub posts

projectlosangeles
posted an update 1 day ago
🔥 Check out the first-of-its-kind SOTA Orpheus Morpheus preview! 🔥

projectlosangeles/Orpheus-Morpheus

Easily generate variations or similar compositions from any MIDI!

Please ❤️ if you enjoyed Orpheus Morpheus!

Sincerely,

Alex

qgallouedec
posted an update 2 days ago

TRL v1.3 ships day-one training support for Qwen 3.6 🚀

The new Qwen 3.6 family (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: a new training template with {% generation %} markers, tool-call response schema routing, and tiny test models for the VLM matrix.

SFT with assistant-only loss works out of the box:

from trl import SFTConfig, SFTTrainer

# train_dataset is assumed to be a conversational dataset with role-tagged messages
trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),
    train_dataset=dataset,
)
trainer.train()


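Conceptually, assistant-only loss masks every non-assistant token out of the cross-entropy by setting its label to -100 (the ignore index); the {% generation %} markers in the chat template tell the collator which spans belong to the assistant. A minimal sketch of the masking step (the per-token role mask is precomputed here; names are illustrative, not TRL internals):

```python
IGNORE_INDEX = -100  # label value ignored by cross-entropy loss

def assistant_only_labels(token_ids, is_assistant):
    """Keep labels only for tokens inside assistant spans; mask the rest.
    `is_assistant` is a per-token boolean mask, e.g. derived from the
    {% generation %} markers in the chat template."""
    return [tid if keep else IGNORE_INDEX
            for tid, keep in zip(token_ids, is_assistant)]

# prompt tokens are masked; assistant reply tokens keep their ids
labels = assistant_only_labels([11, 12, 13, 14], [False, False, True, True])
# -> [-100, -100, 13, 14]
```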
So does GRPO tool-calling: just hand tools=[...] to GRPOTrainer.

v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in trl vllm-serve (Qwen3 MTP / Eagle3 drafts), 12 more KTO ↔ DPO alignment PRs (KTO promotion to stable is now in reach), three more {% generation %} chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
Enderchef
posted an update 1 day ago
Hi, everyone!
Please follow, like, and support the work of CompactAI-O!
Spread the word!
SeaWolf-AI
posted an update 4 days ago
🧬 Introducing Darwin-9B-NEG: the first model with Native Entropy Gating (NEG)

🔗 Try it now: FINAL-Bench/Darwin-9B-NEG
🔗 4-bit (Q4): FINAL-Bench/Darwin-9B-MFP4

We're thrilled to release Darwin-9B-NEG, a 9B-parameter reasoning model
that embeds an architecturally-internalised sense of self-confidence directly
into the transformer โ€” our proprietary Native Entropy Gating (NEG) technology.

📊 GPQA Diamond (198 PhD-level questions):

▸ Baseline Darwin-9B (no NEG) → 51.01%
▸ Pure NEG (greedy, 1× cost) → 63.64% 🔥 (+12.63 pp)
▸ + Permutation (4× cost) → 76.26%
▸ + Ensemble Refinement (~20×) → 84.34% 🏆

With only 9 billion parameters and 1× inference cost, Pure NEG jumps
+12.63 pp over the same model without NEG. Going all-in with ensemble
refinement pushes it to 84.34%, surpassing the published Qwen3.5-9B
leaderboard score (81.7%) by +2.64 pp.

🔬 What makes NEG different from Multi-Turn Iteration (MTI)?

Classical MTI needs 3-8× extra inference passes. NEG instead lives
INSIDE the single decoding loop. Two tiny modules ride with the
transformer: NEG-Head predicts per-token entropy from the last hidden
state, and NEG-Gate conditionally restricts the top-k choice when
confidence is low. The gate activates on only 4.36% of tokens,
essentially free at inference time.
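The gating idea as described can be sketched in a few lines: compute the entropy of the next-token distribution and shrink the top-k candidate set when entropy is high (low confidence). Everything below (the threshold and the k values) is an illustrative assumption, not the released NEG-Head/NEG-Gate behaviour:

```python
import math

def entropy(probs):
    """Shannon entropy of a next-token distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def gated_top_k(probs, entropy_threshold=1.0, k_confident=50, k_restricted=2):
    """Sketch of NEG-style gating: when entropy is high (model unsure),
    restrict sampling to a much smaller top-k candidate set."""
    k = k_restricted if entropy(probs) > entropy_threshold else k_confident
    # rank token indices by descending probability and keep the top k
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    return ranked[:k]

# peaked distribution -> confident -> full candidate set
confident = gated_top_k([0.97, 0.01, 0.01, 0.01])
# flat distribution -> unsure -> gated down to 2 candidates
gated = gated_top_k([0.25, 0.25, 0.25, 0.25])
```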

✨ Key differentiators
• Architecturally internalised: the model file *is* the feature
• 1× inference cost (vs. 3-8× for MTI)
• Drop-in with vLLM / SGLang / TGI / transformers, no extra engine
• +12.63 pp reasoning gain at zero latency overhead
• Single-file deployment, Apache 2.0 licensed

🧬 Lineage
Qwen/Qwen3.5-9B → Darwin-9B-Opus (V7 evolutionary merge) → Darwin-9B-NEG (V8 + NEG training)

#Darwin #NEG #NativeEntropyGating #GPQA #Reasoning #LLM #OpenSource #Apache2
mlabonne
posted an update about 23 hours ago
Big update to llm-datasets, my curated list of datasets and tools for post-training LLMs.

> Added many new datasets
> New "thinking" column
> Refreshed recommended tools

Thanks to everyone who told me they used it for their research at ICLR, you motivated this update!
kanaria007
posted an update 1 day ago
✅ Article highlight: *Continuous Audit Pipeline: Making Evidence Bundles Routine* (art-60-107, v0.1)

TL;DR:
This article argues that evidence bundles should not be an incident-only ritual.

If reconstructability matters only after something goes wrong, it is already too late. SI turns audit into a *continuous pipeline*: routine sealed bundles, immediate verification, retention-safe omissions, and automatic escalation when governance SLOs are breached.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• makes "courtroom-grade reconstructability" a routine byproduct of normal ops
• turns governance SLO breaches into explicit state transitions, not dashboard trivia
• separates the stable audit spine from the payload store, so erasure removes access without destroying proof
• prevents incident-time improvisation from breaking determinism, chain-of-custody, or export integrity

What's inside:
• the operating model: *Audit Spine vs Payload Store*
• three routine bundle tiers: daily governance bundles, weekly compliance bundles, and triggered incident-ready bundles
• trigger rules where CAS / ACR / RBL / EOH breaches automatically emit bundles and degrade governance state
• an end-to-end pipeline: collect → shape/omit → canonicalize → digest → resolve refs → seal → sign → verify → retain
• a governed run record for continuous audit itself, including policy, trust, canonicalization, reason-code-set, and registry snapshot bindings
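The core of that pipeline (shape/omit → canonicalize → digest) is straightforward to sketch; signing, ref resolution, and retention are elided here, and the field names are illustrative rather than the article's actual schema:

```python
import hashlib
import json

def seal_bundle(records, omit_fields=()):
    """Sketch of shape/omit -> canonicalize -> digest for an evidence bundle.
    Omission happens before canonicalization, so the digest covers exactly
    what is retained; sorted keys and fixed separators keep the digest stable
    across runs."""
    shaped = [{k: v for k, v in r.items() if k not in omit_fields}
              for r in records]
    canonical = json.dumps(shaped, sort_keys=True, separators=(",", ":"))
    return {"canonical": canonical,
            "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest()}

# retention-safe omission: the secret never enters the sealed digest
bundle = seal_bundle([{"event": "deploy", "secret": "x"}],
                     omit_fields=("secret",))
```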

Key idea:
Do not wait until an incident to "prepare evidence."

Make evidence production continuous, sealed, and self-verifying, so when something breaks, you select the window instead of inventing the proof.

*Continuous audit is not paperwork. It is a control loop on admissibility and autonomy.*
branikita
posted an update 1 day ago
akhiilll
posted an update 2 days ago
Just shipped ClaimSense Adjudication Gym at OpenEnv Hackathon 2026 (Scaler India).

An OpenEnv RL environment for enterprise insurance claims adjudication, the monthly "tool-heavy" workflow real adjusters do: pull policy + claim history, run fraud checks, verify purchases/transactions, then approve / deny / escalate under partial observability with long-horizon credit assignment.

Trained Qwen/Qwen2.5-1.5B-Instruct with:

Rollout evaluation on HF Jobs (A10G) and a random baseline for comparison
Real GRPO weight updates (TRL GRPOTrainer) with LoRA adapters and two independent reward functions (format + env replay)
Headline training evidence:

GRPO run: 80 steps, 640 rollouts, KL rises ~0 → ~0.06 (real weight updates), completion length shrinks (~25 → ~10).
Plots + logs are committed in the Space under runs/.
Live demo + repo + writeup linked below.

🔗 Env (Space URL): akhiilll/claims-env
🧪 Notebook: akhiilll/claims-env
📝 Blog: docs/HF_MINI_BLOG.md in the Space
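A format reward like the one mentioned above is typically a small function over the completion text. This sketch assumes a decision-tag convention; the env's actual output schema is not shown in the post:

```python
import re

# hypothetical decision-tag convention for illustration; the real
# ClaimSense env may use a different output format
DECISION_RE = re.compile(r"<decision>(approve|deny|escalate)</decision>")

def format_reward(completion: str) -> float:
    """Return 1.0 when the completion contains a well-formed decision tag,
    0.0 otherwise. GRPO can combine this with an env-replay reward as a
    second, independent signal."""
    return 1.0 if DECISION_RE.search(completion) else 0.0
```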
AkimfromParis
posted an update 5 days ago
🌸 Open Japanese LLM Leaderboard V2 on Hugging Face 🇯🇵

I am thrilled to announce the launch of version 2 of the Open Japanese LLM Leaderboard. This initiative is driven by the "Fine-tuning and Evaluation" team, led by Professor Miyao at the University of Tokyo, under the Research and Development Center for Large Language Models (LLMC) at Japan's National Institute of Informatics (NII).

Strategic and technical upgrades:
- Our new backend features eight A100 GPUs, enabling the evaluation of open-source models of more than 100B parameters.
- Submissions now require a Hugging Face Hub login to ensure accountability.
- We have added metrics for evaluation time and CO₂ emissions (thanks to Code Carbon 🌱), alongside reasoning capabilities.

Datasets and evaluation standards:
- New datasets cover reasoning, mathematics, exams, and instruction following.
- Math evaluations now span from grade-school levels to expert-tier challenges (GSM8K, PolyMath, AIME).
- While integrating English-heavy and multilingual benchmarks (including Humanity's Last Exam, GPQA, and BBH in both English and Japanese), we continue to prioritize unique Japanese cultural datasets.

llm-jp/open-japanese-llm-leaderboard-v2

ใฉใ†ใžใŠ้ก˜ใ„่‡ดใ—ใพใ™๏ผ๐Ÿ˜Š
yuriyvnv
posted an update about 17 hours ago
🔊 Four Qwen3-ASR (0.6B and 1.7B) Fine-Tunes for Portuguese and Dutch

Both the 1.7B and 0.6B variants of Alibaba's Qwen3-ASR, fine-tuned for European Portuguese and Dutch and bundled in a single collection.

๐Ÿ”— Collection: https://huggingface.co/collections/yuriyvnv/qwen-asr-for-portuguese-and-dutch-17b-and-06b

Headline numbers on the Common Voice 22 test set (zero-shot baseline → fine-tuned WER):
🇵🇹 Qwen3-ASR-1.7B-PT: 12.91% → 8.50% WER (-34%)
🇵🇹 Qwen3-ASR-0.6B-PT: 18.26% → 11.85% WER (-35%)
🇳🇱 Qwen3-ASR-1.7B-NL: 6.68% → 5.28% WER (-21%)
🇳🇱 Qwen3-ASR-0.6B-NL: 12.46% → 8.31% WER (-33%)
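The relative reductions quoted above follow directly from the WER pairs; a quick check:

```python
def rel_wer_reduction(baseline: float, fine_tuned: float) -> int:
    """Relative WER reduction in percent, rounded to the nearest integer."""
    return round(100 * (baseline - fine_tuned) / baseline)

# the four model/language pairs from the table above
reductions = [rel_wer_reduction(b, f)
              for b, f in [(12.91, 8.50), (18.26, 11.85),
                           (6.68, 5.28), (12.46, 8.31)]]
# -> [34, 35, 21, 33], matching the quoted -34% / -35% / -21% / -33%
```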

The 0.6B variants are the more interesting half of the release. They give up only a few WER points compared to the 1.7B at a third of the parameters, which matters for edge hardware, CPU inference, or anywhere inference cost needs to stay low. The Dutch 0.6B in particular lands at 8.3% WER on CV22, competitive with much larger systems.

The Dutch 1.7B started from a strong 6.7% zero-shot baseline, so the absolute gain is smaller: Qwen already handles Dutch well, and the fine-tune mostly sharpens it on Common Voice's casing and punctuation conventions.

Training stuck close to Qwen's official SFT recipe (lr 2e-5, linear schedule, 2% warmup, bf16, gradient checkpointing on a single H100). The data is the differentiator: Common Voice 22 train + validation, augmented with synthetic OpenAI-TTS speech and filtered by the WAVe multimodal embedding model, which scores clips at the word level and drops the ones that don't align well with their transcripts.
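The word-level filtering step can be sketched as below; the aggregation (minimum over per-word scores) and the 0.8 threshold are assumptions for illustration, and the scores stand in for WAVe's alignment output:

```python
def filter_clips(clips, min_word_score=0.8):
    """Drop synthetic clips whose worst-aligned word falls below a threshold.
    Each clip carries per-word alignment scores (stand-ins for WAVe output);
    using min() means a single badly-aligned word rejects the whole clip."""
    return [c for c in clips if min(c["word_scores"]) >= min_word_score]

clips = [
    {"path": "a.wav", "word_scores": [0.95, 0.91, 0.88]},  # kept
    {"path": "b.wav", "word_scores": [0.93, 0.42, 0.90]},  # dropped: one bad word
]
kept = filter_clips(clips)
```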

📦 Full pipeline (synthetic data generation, WAVe filtering, training scripts, evaluation protocol) is open-source:
github.com/yuriyvnv/TTS-Augmented-ASR
@hf-audio
#asr #speech #parakeet #nvidia #nemo #multilingual #fine-tuning #commonvoice