HuggingFaceFW/fineweb-edu
• Updated • 3.5B • 520k
• 1.12k
mlfoundations/dclm-baseline-1.0
• Updated • 624k
• 281
• Updated • 4.48B • 56.9k
• 812
Note: multimodal data only =(
• Updated • 48.3M • 37.6k
• 364
• Updated • 5.45B • 17.7k
• 571
Note: does not include the text directly =(
HuggingFaceTB/issues-kaggle-notebooks
• Updated • 16.1M • 2.15k
• 15
Note: only 500k rows
• Updated • 7.89M • 9.03k
• 185
Note: 1.6M rows with web-0.5-to-1.0
Locutusque/UltraTextbooks
• Updated • 5.52M • 1.73k
• 199
tokyotech-llm/swallow-math-v2
• Updated • 17.4M • 19.9k
• 31
tokyotech-llm/swallow-code-v2
• Updated • 147M • 72.4k
• 38
HuggingFaceFW/finepdfs-edu
• Updated • 49.5M • 10.8k
• 89
HuggingFaceTB/smollm-corpus
• Updated • 237M • 58.4k
• 459