Nanbeige/Nanbeige4.1-3B
Text Generation • 4B • Updated
• 202k • • 762
datatrove for all things web-scale data preparation: https://github.com/huggingface/datatrovenanotron for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotronlighteval for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval