---
license: mit
datasets:
- shivendrra/consolidated-datasets
language:
- en
metrics:
- perplexity
tags:
- Basemodel
- text-generation
- nlp
- custom_code
- casual-llm
library_name: transformers
---

# TinyWay-1.2.0

**TinyWay-1.2.0** is a lightweight GPT-style causal language model (~110M parameters) trained from scratch on a mixed streaming corpus (web text, stories, and code).
The model is designed for research, experimentation, and educational purposes, with an emphasis on transparent architecture and reproducible training.

> ⚡ Trained end-to-end using a custom PyTorch pipeline with mixed precision, gradient accumulation, and streaming datasets.

---

## Model Overview

| Property          | Value                                |
| ----------------- | ------------------------------------ |
| Model type        | Decoder-only Transformer (GPT-style) |
| Parameters        | **~109.6M**                          |
| Layers            | 10                                   |
| Hidden size       | 768                                  |
| Attention heads   | 12                                   |
| Context length    | 256 tokens                           |
| Activation        | GELU                                 |
| Dropout           | 0.1                                  |
| Precision         | fp16 / bf16                          |
| Weight tying      | Token embedding tied with LM head    |
| Position encoding | Learned absolute embeddings          |
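
The parameter count can be roughly cross-checked from the other rows of the table. The sketch below assumes a standard GPT-2-style block (fused QKV projection, 4x MLP expansion, biases on every linear layer, two LayerNorms per block, one final LayerNorm). The card does not confirm these details, so the function name and layout are illustrative assumptions rather than the actual `modeling_tinyway.py` structure.

```python
def gpt_param_count(vocab=50257, d_model=768, n_layers=10, context=256, mlp_ratio=4):
    """Estimate parameters for a GPT-2-style decoder with a tied LM head.

    Assumed layout (not confirmed by the model card): biases on every linear
    layer, two LayerNorms per block, one final LayerNorm.
    """
    d_ff = mlp_ratio * d_model
    token_emb = vocab * d_model   # shared with the LM head (weight tying)
    pos_emb = context * d_model   # learned absolute position embeddings
    # Attention: fused QKV projection plus output projection, each with bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to d_ff, then project back, each with bias.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, each with weight and bias
    block = attn + mlp + layer_norms
    final_ln = 2 * d_model
    return token_emb + pos_emb + n_layers * block + final_ln


total = gpt_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate lands within about 0.1M of the stated ~109.6M; small differences (for example, linear layers without biases) would account for the gap.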

---

## Training Details

### Dataset

The model was trained on **streaming data** covering:

* 🌐 Web text
* 📚 Stories
* 💻 Code

The data comes from the HuggingFace dataset:

```
shivendrra/consolidated-datasets
```

Streaming was used to avoid large local storage and to allow continuous sampling directly from HuggingFace.
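
As a rough illustration of what streaming can look like in practice, the sketch below opens the dataset lazily with `datasets.load_dataset(..., streaming=True)` and packs a token stream into fixed 256-token sequences. The split name `"train"`, the helper names, and the packing strategy are assumptions for illustration; the card does not describe how the author's pipeline batches the stream.

```python
from itertools import islice


def open_stream(name="shivendrra/consolidated-datasets", split="train"):
    """Open the dataset lazily; nothing is downloaded up front.

    The split name is an assumption; check the dataset card for the real one.
    """
    from datasets import load_dataset  # imported lazily so the helpers below run without it
    return load_dataset(name, split=split, streaming=True)


def pack_tokens(token_ids, seq_len=256):
    """Group a (possibly endless) stream of token ids into fixed-length sequences."""
    token_ids = iter(token_ids)
    while True:
        chunk = list(islice(token_ids, seq_len))
        if len(chunk) < seq_len:
            return  # drop the incomplete tail
        yield chunk


# Demo on synthetic ids so the sketch runs offline.
sequences = list(pack_tokens(range(600), seq_len=256))
print(len(sequences), len(sequences[0]))
```

Packing keeps every training sequence at the full context length, at the cost of discarding a short tail whenever the stream ends.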

---

### Tokenization

* Tokenizer: **GPT2TokenizerFast**
* Vocabulary size: **50,257**
* Special tokens:
  * `bos_token_id = eos_token_id = pad_token_id = 50256`
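
Because padding shares its id with BOS/EOS, the token ids alone cannot tell padding apart from a genuine end-of-text token; an attention mask has to carry that information. A minimal sketch of right-padding with a mask (the `pad_batch` helper is hypothetical, not part of this repository):

```python
PAD_ID = 50256  # also the BOS/EOS id, per the list above


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token-id lists to a common length and build attention masks."""
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = width - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks real tokens (including a real EOS), 0 marks padding.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


ids, mask = pad_batch([[10, 11, 50256], [20]])
print(ids)
print(mask)
```

In the first sequence the trailing `50256` is a real EOS and stays visible in the mask, while the same id in the second sequence is padding and is masked out.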

---

### Training Configuration

| Setting               | Value                        |
| --------------------- | ---------------------------- |
| Sequence length       | 256                          |
| Effective batch size  | 64 sequences                 |
| Optimizer             | AdamW                        |
| Learning rate         | 3e-4 (cosine decay + warmup) |
| Betas                 | (0.9, 0.95)                  |
| Weight decay          | 0.1                          |
| Gradient clipping     | 1.0                          |
| Mixed precision       | AMP (fp16 / bf16)            |
| Gradient accumulation | Yes                          |
| Training steps        | ~60k                         |
| Total tokens          | ~1B (approx)                 |
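
The learning-rate row describes a cosine schedule with warmup. One common way to write such a schedule is sketched below; the warmup length (`warmup_steps=2_000`), the linear warmup shape, and the floor (`min_lr=0.0`) are hypothetical values, since the card only gives the peak rate and the approximate step count.

```python
import math


def lr_at(step, max_lr=3e-4, total_steps=60_000, warmup_steps=2_000, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay to min_lr.

    warmup_steps and min_lr are hypothetical; the card does not state them.
    """
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 1_999, 31_000, 60_000):
    print(step, f"{lr_at(step):.2e}")
```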

Final training loss ≈ **3.0**

Final perplexity ≈ **20**
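
These figures are mutually consistent, which a few lines of arithmetic can confirm. The check assumes the loss is the mean cross-entropy in nats (the PyTorch default), so that perplexity is its exponential.

```python
import math

seq_len = 256      # tokens per sequence
batch_size = 64    # sequences per optimizer step (effective)
steps = 60_000     # approximate, per the table

tokens_per_step = seq_len * batch_size
total_tokens = tokens_per_step * steps
perplexity = math.exp(3.0)  # perplexity = exp(mean cross-entropy in nats)

print(f"tokens/step: {tokens_per_step:,}")
print(f"total tokens: {total_tokens:,}")
print(f"perplexity at loss 3.0: {perplexity:.1f}")
```

About 0.98B tokens matches the stated ~1B, and a loss of 3.0 corresponds to a perplexity of roughly 20.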

---

## Usage

### Load with Transformers (Custom Code Required)

This repository uses a custom model definition (`modeling_tinyway.py`).
Pass `trust_remote_code=True` so that Transformers can load it from the repository, or make sure it is available in your environment before loading.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets Transformers load the custom modeling_tinyway.py
model = AutoModelForCausalLM.from_pretrained("NNEngine/TinyWay-1.2.0", trust_remote_code=True)
# The model uses the standard GPT-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
```

---

### Text Generation Example

```python
import torch

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 200 new tokens; prompt plus output must fit the 256-token context
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # pad and EOS share id 50256
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
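
To make the sampling parameters above more concrete, here is a simplified pure-Python sketch of how `temperature`, `top_k`, and `top_p` reshape a next-token distribution. It illustrates the general idea only and is not the Transformers implementation; the function name `filter_distribution` is hypothetical.

```python
import math


def filter_distribution(logits, temperature=0.8, top_k=50, top_p=0.95):
    """Return {token_id: probability} after temperature, top-k, then top-p filtering."""
    # Temperature: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring token ids.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (shifted by the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    mass = sum(p for _, p in kept)
    return {token_id: p / mass for token_id, p in kept}


dist = filter_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=2, top_p=1.0)
print(dist)
```

Lower `top_k` or `top_p` values shrink the candidate set and make output more conservative; higher values allow more variety at the cost of more topic drift.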

---

## Example Generations

The model demonstrates:

* ✅ Coherent sentence structure
* ✅ Narrative flow in stories
* ✅ Reasonable grammar and punctuation
* ⚠️ Occasional repetition and topic drift (expected for this scale)

This is a research-grade small LLM, not instruction-aligned by default.

---

## Limitations

* ❌ Not instruction-tuned
* ❌ Limited reasoning depth compared to large LLMs
* ❌ Context length limited to 256 tokens
* ⚠️ May hallucinate or generate inconsistent facts
* ⚠️ Training data may contain noise from web sources

Use responsibly.
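
The 256-token limit applies to the prompt and the generated tokens combined. One simple way to stay inside it is to trim the prompt before generation, as sketched below. The helper `fit_to_context` is hypothetical, and keeping the most recent tokens is a design choice rather than anything this repository prescribes.

```python
CONTEXT_LENGTH = 256  # from the Model Overview table


def fit_to_context(prompt_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Trim the oldest prompt tokens so prompt + generation fits the window."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens
    return list(prompt_ids)[-budget:]


trimmed = fit_to_context(list(range(300)), max_new_tokens=200)
print(len(trimmed), trimmed[0])
```

With `max_new_tokens=200`, only the last 56 prompt tokens are kept; shorter prompts pass through unchanged.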

---

## Intended Use

* Research experiments
* Educational purposes
* Model scaling studies
* Training pipeline benchmarking
* Custom fine-tuning experiments

Not recommended for production or safety-critical applications.

---

## Reproducibility

The model was trained using:

* Custom PyTorch training loop
* Streaming datasets via HuggingFace
* Mixed precision training
* Gradient accumulation
* Periodic checkpointing
* Full monitoring (loss, perplexity, gradient norm, attention entropy)
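
Two of the monitored quantities are easy to define precisely. The sketch below shows one plausible definition of attention entropy and global gradient norm in plain Python, together with the scaling factor implied by the clipping threshold of 1.0 from the training configuration. The helper names are hypothetical, and the actual training loop may compute these differently (for example, averaged over heads and layers).

```python
import math


def attention_entropy(weights):
    """Shannon entropy (nats) of one attention row; weights should sum to 1."""
    return sum(-w * math.log(w) for w in weights if w > 0.0)


def global_grad_norm(grads):
    """L2 norm over all gradient values, given one list of floats per parameter."""
    return math.sqrt(sum(g * g for param in grads for g in param))


def clip_scale(norm, max_norm=1.0):
    """Factor to multiply gradients by so their global norm is at most max_norm."""
    return min(1.0, max_norm / norm) if norm > 0.0 else 1.0


print(attention_entropy([0.25] * 4))        # uniform attention over 4 tokens
print(attention_entropy([1.0, 0.0, 0.0]))   # fully peaked attention
norm = global_grad_norm([[3.0], [4.0]])
print(norm, clip_scale(norm))
```

Entropy near zero means a head attends to a single token, while entropy near the log of the sequence length means attention is spread almost uniformly.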

If you’d like the full training code or configs, feel free to reach out.

---

## License

This model is released under the MIT license, as declared in the model card metadata.
The underlying datasets and the GPT-2 tokenizer may carry their own terms; please ensure compliance with them before commercial usage.

---

## Acknowledgements

* HuggingFace 🤗
* PyTorch
* GPT-2 tokenizer
* Open research community