Spaces: SupraLabs/Blog
LH-Tech-AI committed 3 days ago

Commit e00a304 (verified) · 1 parent: 96ddf6c

Update research.html

Files changed (1)

research.html +2 -2

@@ -253,7 +253,7 @@
                 <ul>
                     <li><strong>The Setup:</strong> We are training an ultra-lean <strong>5M parameter Llama model</strong> using Hugging Face Transformers.</li>
-                    <li><strong>The Data:</strong> Exactly <strong>1 Billion tokens</strong> total per run, testing four configurations:
+                    <li><strong>The Data:</strong> Exactly <strong>500 Million tokens</strong> total per run, testing four configurations:
                         <br>1. 100% <code>FineWeb-Edu</code>
                         <br>2. 100% <code>DCLM-Edu</code>
                         <br>3. 100% <code>Cosmopedia-v2</code>
@@ -284,7 +284,7 @@
                 <p>The standard convention for LLMs is "one epoch and move on" to avoid overfitting, popularized by several landmark papers. But small models training on high-quality educational data might be a completely different beast. Can they chew on the same high-signal data multiple times?</p>
                 <ul>
-                    <li><strong>The Setup:</strong> A <strong>10M parameter Llama model</strong> trained on exactly <strong>1 Billion tokens</strong> of <code>FineWeb-Edu</code>.</li>
+                    <li><strong>The Setup:</strong> A <strong>10M parameter Llama model</strong> trained on exactly <strong>500 Million tokens</strong> of <code>FineWeb-Edu</code>.</li>
                     <li><strong>The Epoch Matrix:</strong> We are running 5 identical setups, changing only the epoch count: <strong>1 Epoch vs. 2, 3, 4, and 5 Epochs</strong>.</li>
                 </ul>
                 <p><strong>The Goal:</strong> Pinpoint exactly where overfitting begins for an SLM. If performance on <code>lm-eval</code> keeps scaling up past epoch 2 or 3 without destroying perplexity, it could mean data-scarcity solutions for edge AI are much easier than we think.</p>
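
Reviewer note: the change from 1 Billion to 500 Million tokens also changes the token budget of every run in the epoch matrix. The sketch below works through that arithmetic for the 10M parameter model. It assumes that the 500 Million tokens are a fixed set of unique tokens and that each epoch is one full pass over them; the page does not say whether the figure is instead a total budget split across epochs. The names `UNIQUE_TOKENS`, `MODEL_PARAMS`, and `epoch_budget` are hypothetical and used only for illustration.

```python
# Illustrative arithmetic only; not taken from the Space's training code.
UNIQUE_TOKENS = 500_000_000   # from the updated text: 500 Million tokens of FineWeb-Edu
MODEL_PARAMS = 10_000_000     # from the text: 10M parameter Llama model
EPOCH_COUNTS = [1, 2, 3, 4, 5]  # from the text: 1 epoch vs. 2, 3, 4, and 5 epochs


def epoch_budget(epochs, unique_tokens=UNIQUE_TOKENS, params=MODEL_PARAMS):
    """Return the token budget for one run.

    Assumption: every epoch is a full pass over the same unique tokens,
    so the tokens seen during training grow linearly with the epoch count.
    """
    tokens_seen = epochs * unique_tokens
    return {
        "epochs": epochs,
        "tokens_seen": tokens_seen,
        "tokens_per_param": tokens_seen / params,
    }


budgets = [epoch_budget(e) for e in EPOCH_COUNTS]

for b in budgets:
    print(
        f"{b['epochs']} epoch(s): {b['tokens_seen']:,} tokens seen, "
        f"{b['tokens_per_param']:.0f} tokens per parameter"
    )
```

Under this assumption the 1-epoch run sees 50 tokens per parameter and the 5-epoch run sees 250, while the amount of unique data stays constant across all five runs.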