Why was no Slavic language included in the training dataset?

#56

by brabecjan91 - opened Jul 19, 2022

Jul 19, 2022

It would seem natural to me to include at least a few languages from the Slavic family, given that there was apparently a deliberate effort to make the languages in the training dataset diverse.

brabecjan91 changed discussion title from Why was no Slavic language included in training dataset? to Why was no Slavic language included in the training dataset? Jul 19, 2022

SaulLu

BigScience Workshop org Jul 19, 2022

A good pointer that could answer your question may be @yjernite's tweet answering "why wasn't language X included in the @BigScienceLLM training data" 🤗

TimeRobber

BigScience Workshop org Nov 15, 2022

Closing as this seems to have been resolved. Thank you!

TimeRobber changed discussion status to closed Nov 15, 2022
