This is a 125M-parameter language model designed to be trained and run on consumer hardware with limited VRAM (4 GB or more). It follows a GPT-style architecture, tuned for efficiency and low memory usage.
from transformers import AutoTokenizer
from model import SmallLanguageModel, ModelConfig
# Initialize model
config = ModelConfig(
    vocab_size=50257,
    block_size=512,
    n_layer=12,
    n_head=12,
    n_embd=768,
    dropout=0.1,
    bias=True,
)
model = SmallLanguageModel(config)
# Generate text (load trained weights first; a freshly initialized model emits random tokens)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_length=100)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
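As a rough sanity check on the "125M" figure and the 4 GB VRAM target, the configuration values above can be turned into a parameter count and a memory estimate. This is only a sketch under stated assumptions: it assumes a standard GPT-2-style layout (fused QKV projection, 4x MLP expansion, two LayerNorms per block plus a final one, learned position embeddings, and an output head tied to the token embedding). The actual `SmallLanguageModel` may differ, and `SketchConfig`, `estimate_params`, and `estimate_training_bytes` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in mirroring the ModelConfig fields used above."""
    vocab_size: int = 50257
    block_size: int = 512
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    bias: bool = True


def estimate_params(cfg: SketchConfig) -> int:
    """Count parameters for an assumed GPT-2-style layout with a tied output head."""
    d = cfg.n_embd
    b = 1 if cfg.bias else 0
    # Token embedding (shared with the output head) plus learned position embedding.
    embeddings = cfg.vocab_size * d + cfg.block_size * d
    # Attention: fused QKV projection, then the output projection.
    attn = (d * 3 * d + b * 3 * d) + (d * d + b * d)
    # MLP: expand to 4*d, then project back to d.
    mlp = (d * 4 * d + b * 4 * d) + (4 * d * d + b * d)
    # Two LayerNorms per block: a weight vector each, plus a bias vector if enabled.
    norms = 2 * (d + b * d)
    final_norm = d + b * d
    return embeddings + cfg.n_layer * (attn + mlp + norms) + final_norm


def estimate_training_bytes(n_params: int) -> int:
    """fp32 weights + gradients + two Adam moment buffers; activations excluded."""
    return n_params * 4 * 4


n_params = estimate_params(SketchConfig())
print(f"parameters: {n_params / 1e6:.1f}M")
print(f"fp32 weights: {n_params * 4 / 2**30:.2f} GiB")
print(f"Adam training state: {estimate_training_bytes(n_params) / 2**30:.2f} GiB")
```

Under these assumptions the count comes out at about 124M, in line with the stated size. The training-state estimate leaves activations out, and those grow with batch size and `block_size`, so the remaining VRAM headroom depends on those settings.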
This model is released under the MIT License.