Mixture of Experts (MoE)
Sometimes I fine-tune models specifically to take on expert roles in an MoE configuration; other times I find interesting models that others have fine-tuned.
A Mixture of Experts model built on Llama 3.2 3B, combining four domain-specialized fine-tunes with a general-purpose base model.
| Expert | Specialization |
|---|---|
| LLM-Data-Science-Llama3.2-3B | Machine learning, neural networks, fine-tuning, pre-training |
| CreativeWriter-Llama3.2-3B | Fiction writing, story structure, scene development, plot analysis |
| Llama-3.2-3B-VanRossum | Python programming, debugging, algorithm implementation |
| CogBeTh-Llama3.2-3B | Mental health support, anxiety, stress management, self-care |
The model uses a hidden-state gating mechanism to route each input to the most relevant expert(s): the prompt's hidden representations are scored against per-expert gate vectors, and the highest-scoring expert(s) handle the token. Each expert was fine-tuned for its domain before being merged into this MoE architecture using mergekit.
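The gating step can be sketched as a top-k softmax router. This is a minimal illustration of the idea, not this model's actual implementation; the function name, shapes, and `top_k=2` choice are assumptions for the example:

```python
import numpy as np

def route(hidden_state, gate_weights, top_k=2):
    """Pick the top_k experts for one token and weight them by softmax.

    hidden_state: (hidden_dim,) vector for the current token.
    gate_weights: (num_experts, hidden_dim), one gate vector per expert.
    Returns a list of (expert_index, weight) pairs, weights summing to 1.
    """
    logits = gate_weights @ hidden_state            # one score per expert
    top = np.argsort(logits)[-top_k:][::-1]         # indices of the top_k experts
    shifted = np.exp(logits[top] - logits[top].max())
    probs = shifted / shifted.sum()                 # softmax over selected experts only
    return list(zip(top.tolist(), probs.tolist()))

# Toy example: the gate strongly matches expert 0 for this hidden state.
selected = route(np.array([2.0, 0.0, 0.0, 0.0]), np.eye(4), top_k=2)
```

At inference time the outputs of the selected experts are combined using these weights, so only a subset of experts runs per token.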
Compatible with any Llama 3.2 inference setup. No special configuration is required; routing happens automatically at inference time.