LH-Tech-AI committed on
Commit e00a304 · verified · 1 Parent(s): 96ddf6c

Update research.html

Files changed (1)
  1. research.html +2 -2
research.html CHANGED
@@ -253,7 +253,7 @@
 
  <ul>
  <li><strong>The Setup:</strong> We are training an ultra-lean <strong>5M parameter Llama model</strong> using Hugging Face Transformers.</li>
- <li><strong>The Data:</strong> Exactly <strong>1 Billion tokens</strong> total per run, testing four configurations:
+ <li><strong>The Data:</strong> Exactly <strong>500 Million tokens</strong> total per run, testing four configurations:
  <br>1. 100% <code>FineWeb-Edu</code>
  <br>2. 100% <code>DCLM-Edu</code>
  <br>3. 100% <code>Cosmopedia-v2</code>
@@ -284,7 +284,7 @@
  <p>The standard convention for LLMs is "one epoch and move on" to avoid overfitting, popularized by several landmark papers. But small models training on high-quality educational data might be a completely different beast. Can they chew on the same high-signal data multiple times?</p>
 
  <ul>
- <li><strong>The Setup:</strong> A <strong>10M parameter Llama model</strong> trained on exactly <strong>1 Billion tokens</strong> of <code>FineWeb-Edu</code>.</li>
+ <li><strong>The Setup:</strong> A <strong>10M parameter Llama model</strong> trained on exactly <strong>500 Million tokens</strong> of <code>FineWeb-Edu</code>.</li>
  <li><strong>The Epoch Matrix:</strong> We are running 5 identical setups, changing only the epoch count: <strong>1 Epoch vs. 2, 3, 4, and 5 Epochs</strong>.</li>
  </ul>
  <p><strong>The Goal:</strong> Pinpoint exactly where overfitting begins for an SLM. If performance on <code>lm-eval</code> keeps scaling up past epoch 2 or 3 without destroying perplexity, it could mean data-scarcity solutions for edge AI are much easier than we think.</p>
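
To make the effect of this commit concrete, here is a rough Python sketch of the token-budget arithmetic behind the two changed lines. It is not part of research.html or the commit. The helper names are hypothetical. The epoch calculation assumes one epoch is one full pass over the same 500 Million tokens; the page does not say whether the budget is per epoch or per run in the epoch experiment, so treat that part as a guess.

```python
"""Token-budget arithmetic for the two experiments touched by this commit.

Illustrative only: helper names are hypothetical, and the per-epoch
interpretation of the budget is an assumption, not stated in the source.
"""

MILLION = 1_000_000


def tokens_per_parameter(tokens: int, parameters: int) -> float:
    """Ratio of training tokens to model parameters."""
    return tokens / parameters


def tokens_seen_per_run(unique_tokens: int, epoch_counts: list[int]) -> dict[int, int]:
    """Total tokens processed for each epoch count.

    Assumption: one epoch is one full pass over the same `unique_tokens`.
    """
    return {epochs: unique_tokens * epochs for epochs in epoch_counts}


OLD_BUDGET = 1_000 * MILLION  # "1 Billion tokens" (removed lines)
NEW_BUDGET = 500 * MILLION    # "500 Million tokens" (added lines)

if __name__ == "__main__":
    # Data-mix experiment: 5M parameter model.
    print("5M model, old budget:", tokens_per_parameter(OLD_BUDGET, 5 * MILLION))
    print("5M model, new budget:", tokens_per_parameter(NEW_BUDGET, 5 * MILLION))

    # Epoch experiment: 10M parameter model, 1 to 5 epochs.
    print("10M model, new budget:", tokens_per_parameter(NEW_BUDGET, 10 * MILLION))
    for epochs, seen in tokens_seen_per_run(NEW_BUDGET, [1, 2, 3, 4, 5]).items():
        print(f"{epochs} epoch(s): {seen:,} tokens seen")
```

Under these assumptions, halving the budget halves the tokens-per-parameter ratio in both experiments, and the 5-epoch run processes five times as many tokens as the 1-epoch run while seeing no additional unique data.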