SimpleLM

Custom decoder-only Transformer language model (pretraining checkpoint). Architecture is defined in modeling_simple_lm.py (bundled in this repo) and loaded via trust_remote_code=True.

Source checkpoint: checkpoints/lm_checkpoint_008_shutdown.pt

This is a pretrained-only (base) LLM, trained from scratch on a very small dataset: conversations found on Kaggle mixed with OpenAssistant/oasst2, plus a subset of FineWeb-Edu data. This particular save is the checkpoint after 1 full epoch, about 410M tokens altogether (1B+ tokens would have been better for a model of this size).

Architecture

| field | value |
|---|---|
| vocab_size | 32000 |
| context_length | 512 |
| d_model | 768 |
| n_layers | 12 |
| n_heads | 8 |
| d_ff | 2048 |
| activation | gelu |
| bias | True |
| tie_word_embeddings | True |
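
As a sanity check on the values above, here is a rough parameter count. It is only a sketch: the layout it assumes (learned absolute position embeddings, four d_model x d_model attention projections, a two-layer MLP, two LayerNorms per block plus a final one, and an output head that reuses the token embedding) is a guess at a conventional GPT-style block, not something read from modeling_simple_lm.py.

```python
# Rough parameter count from the architecture table.
# ASSUMPTIONS (not confirmed by the model code): learned absolute position
# embeddings, 4 attention projections per layer, a 2-layer MLP, 2 LayerNorms
# per block plus a final LayerNorm, and a tied output head (no extra weights).
vocab_size, context_length = 32000, 512
d_model, n_layers, n_heads, d_ff = 768, 12, 8, 2048
use_bias = True


def linear(n_in, n_out):
    """Parameters in one dense layer, with an optional bias vector."""
    return n_in * n_out + (n_out if use_bias else 0)


layer_norm = 2 * d_model                      # scale + shift
attention = 4 * linear(d_model, d_model)      # q, k, v, and output projections
mlp = linear(d_model, d_ff) + linear(d_ff, d_model)
per_layer = attention + mlp + 2 * layer_norm

embeddings = vocab_size * d_model + context_length * d_model
total = embeddings + n_layers * per_layer + layer_norm  # + final LayerNorm

head_dim = d_model // n_heads
print(f"head_dim={head_dim}, per_layer={per_layer:,}, total={total:,}")
```

Under these assumptions the total comes to about 91.1M parameters, with roughly 24.6M of them in the shared token embedding.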

Tokenizer source: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "etanlightstone/simple-lm-v2"
tok = AutoTokenizer.from_pretrained(repo)
# trust_remote_code=True is needed because the architecture lives in modeling_simple_lm.py
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

prompt = "Once upon a time"
ids = tok(prompt, return_tensors="pt").input_ids
# Sample up to 80 new tokens; prompt plus new tokens should fit the 512-token context
out = model.generate(ids, max_new_tokens=80, do_sample=True, top_k=50, temperature=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
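
Because context_length is 512, a long prompt plus max_new_tokens can run past what the model was trained on. How this model's generate behaves in that case is not documented here, so the sketch below simply trims the prompt beforehand. `fit_prompt` is a hypothetical helper, not part of this repo, and it uses plain Python lists of integers in place of real token ids.

```python
CONTEXT_LENGTH = 512  # context_length from the architecture table


def fit_prompt(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep the most recent prompt tokens so prompt + new tokens fit the context.

    Hypothetical helper for illustration; works on a flat list of token ids.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_length")
    return token_ids[-budget:]


# Example with plain integers standing in for token ids.
long_prompt = list(range(600))
trimmed = fit_prompt(long_prompt, max_new_tokens=80)
print(len(trimmed), trimmed[0])
```

With a batched tensor such as `ids` from the usage snippet, the equivalent slice would be `ids[:, -budget:]`.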

Training settings

{
  "batch_size": 10,
  "batch_size_note": "per GPU when using torchrun",
  "world_size": 1,
  "learning_rate": 0.0003,
  "weight_decay": 0.01,
  "num_epochs": 3,
  "max_steps": null,
  "grad_clip": 1.0,
  "seed": 42,
  "docs_dir": "/home/etan/simple_llm/docs",
  "block_size": 512,
  "stride": 448,
  "stride_overlap_tokens": 64
}
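
The block_size, stride, and stride_overlap_tokens settings suggest that documents are cut into 512-token windows that advance 448 tokens at a time, so neighbouring windows share 64 tokens. The sketch below illustrates that reading. The function names are hypothetical, and dropping a trailing partial window is an assumption; the actual data loader may pad or handle the tail differently.

```python
def window_starts(n_tokens, block_size=512, stride=448):
    """Start offsets of full-length windows over a token stream.

    ASSUMPTION: a trailing partial window is dropped rather than padded.
    """
    if n_tokens < block_size:
        return []
    return list(range(0, n_tokens - block_size + 1, stride))


def make_windows(tokens, block_size=512, stride=448):
    """Slice a token list into overlapping fixed-size training windows."""
    starts = window_starts(len(tokens), block_size, stride)
    return [tokens[s:s + block_size] for s in starts]


tokens = list(range(2000))  # stand-in for a tokenized document
windows = make_windows(tokens)
overlap = len(set(windows[0]) & set(windows[1]))

# Tokens fed per optimizer step: batch_size * world_size * block_size
tokens_per_step = 10 * 1 * 512
print(len(windows), overlap, tokens_per_step)
```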

Model size

91.1M params, stored as Safetensors with tensor type F32.