MixMinMatch
Collection of datasets from the MixMinMatch work.

Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets
Paper • 2512.18834 • Published Dec 21, 2025 • 3

AdaMLLab/AraMix • Updated 3 days ago • 394M • 2.17k • 5
AdaMLLab/TurMix • Updated 3 days ago • 681M • 639 • 1
AdaMLLab/HinMix • Updated 3 days ago • 179M • 284 • 1