appvoid's Collections
cool datasets
updated 10 days ago
some interesting datasets to use for language modeling
appvoid/raw-corpus • Viewer • Updated Feb 23, 2025 • 1.6M • 8
pszemraj/simple_wikipedia • Viewer • Updated Dec 29, 2025 • 238k • 248 • 8
common-pile/youtube • Viewer • Updated Jun 6, 2025 • 1.13M • 755 • 10
srinivasbilla/self-instruct-base • Viewer • Updated Jan 24, 2023 • 82.6k • 83 • 5
agentlans/high-quality-english-sentences • Viewer • Updated Oct 1, 2024 • 1.71M • 980 • 31
agentlans/note-taking-v2 • Viewer • Updated Sep 22, 2025 • 17.6k • 114
PleIAs/SYNTH • Viewer • Updated Nov 11, 2025 • 68M • 89.7k • 252