MIMI Pro

MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution, designed to run entirely on-device, in the browser, with zero cloud dependencies.

Part of the MIMI Model Family by Mimi Tech AI.

🔬 V1: Experimental Release. This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base Qwen3-4B may perform equally well or better with native <tool_call> prompting. V2, with official BFCL scores and Qwen3-native format support, is in development.

Performance

BFCL V4 Benchmark (Partial: single-turn categories only; test counts shown per cell)

| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | 80.0% (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | 60.0% (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | 75.0% (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | 75.0% (20 tests) | Fine-tune degraded |
| Irrelevance | 90.0% (20 tests) | 100.0% (20 tests) | Both strong |
| Live Simple | n/a | 90.0% (20 tests) | Base only |

โš ๏ธ Important Context: The previously reported "97.7% accuracy" was a training validation metric (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.

Training Metrics (Internal)

| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 billion |
| Quantized Size | 2.3 GB (Q4_K_M) |

Architecture

MIMI Pro is built on Qwen3-4B, fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using Unsloth on NVIDIA DGX Spark.

Key Design Decisions:

  • Custom tool-calling format optimized for the MIMI Agent browser environment
  • 19 tool types covering web search, code execution, file operations, browser automation
  • Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)

Known Limitations of V1:

  • Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
  • The custom {"tool": ..., "parameters": ...} format diverges from Qwen3's native <tool_call> format
  • V2 will address these issues with conservative fine-tuning and Qwen3-native format support

Supported Tools

| Category | Tools |
|---|---|
| 🌐 Web | web_search, browse_url, browser_action |
| 💻 Code | execute_python, create_file, edit_file |
| 🔬 Research | deep_research, generate_document |
| 📁 System | read_file, list_directory, run_terminal |
| 🧠 Reasoning | Multi-step orchestration |
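On the host side, tool names like these are typically wired to handlers through a dispatch table keyed by the `"tool"` field of each call. A minimal sketch, assuming hypothetical stand-in handlers (the real MIMI Agent runtime is not shown in this card):

```python
# Minimal tool dispatcher for MIMI-style tool calls.
# The handlers below are illustrative stand-ins, not the MIMI Agent's
# actual implementations.

def web_search(query, limit=5):
    # Placeholder: a real handler would query a search backend.
    return {"query": query, "results": []}

def read_file(path):
    # Placeholder: a real handler would read from the sandboxed filesystem.
    return {"path": path, "content": ""}

TOOL_REGISTRY = {
    "web_search": web_search,
    "read_file": read_file,
    # ...register the remaining tools the same way
}

def dispatch(call):
    """Execute one {"tool": ..., "parameters": ...} call via the registry."""
    fn = TOOL_REGISTRY.get(call["tool"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['tool']}")
    return fn(**call["parameters"])
```

Unknown tool names raise rather than silently no-op, which makes format drift from the model easy to detect.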

Quick Start

Browser (wllama/WebAssembly)

import { Wllama } from '@wllama/wllama';

const wllama = new Wllama(wasmPaths);
await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);

llama.cpp

./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6

Python

from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"},
])
print(output["choices"][0]["message"]["content"])  # model reply (may be a tool call)
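When driving a raw completion endpoint instead of the chat API, the ChatML-style prompt from the llama.cpp example can be assembled programmatically. A minimal sketch; the helper name is ours, not part of any MIMI API:

```python
def build_chatml_prompt(messages):
    """Build a Qwen/ChatML prompt string from role/content message dicts."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"},
])
```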

Output Format

MIMI Pro V1 uses a custom format (V2 will support Qwen3-native <tool_call> format):

{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}

The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| MIMI Pro | 4.02B | 2.3 GB | Desktop & laptop | ✅ Available |
| MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |

All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.

Training Details

method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
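The config above roughly corresponds to the following Unsloth PEFT setup. This is a sketch reconstructed from the listed hyperparameters; dataset loading and trainer wiring are omitted, and exact argument values are an approximation of the real run:

```python
# Approximate fine-tuning setup implied by the config above (Unsloth PEFT).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B",
    max_seq_length=2048,
    dtype=None,          # auto-selects bf16 on supported hardware
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)
```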

Why MIMI?

  • 🔒 Privacy First: your data never leaves your device. Period.
  • 💰 Zero Cost: no API keys, no subscriptions, no per-token billing.
  • ⚡ Fast: runs at native speed via WebAssembly, no server round-trips.
  • 🌍 Works Offline: once downloaded, no internet required.
  • 🔧 Tool Native: purpose-built for autonomous tool calling.

Limitations

  • V1 uses a custom tool-calling format (not Qwen3-native <tool_call>)
  • Parallel tool calling (multiple simultaneous calls) is degraded vs. base model
  • Context window: 4,096 tokens (training config). Base architecture supports 32K.
  • Requires ~3 GB RAM for inference in browser.
  • Q4_K_M quantization trades minimal quality for 3.5x size reduction.

Roadmap

  • V1: custom format, 19 tools, browser-optimized (current release)
  • V2: Qwen3-native <tool_call> format, official BFCL V4 scores, conservative fine-tuning
  • Model Family: Nano (0.6B), Small (1.7B), Max (8B) releases
  • Multi-Turn: agentic conversation chains with tool result feedback

About Mimi Tech AI

Mimi Tech AI builds on-device AI: no cloud, no data leaks, full user control.

License

Apache 2.0: free for commercial and personal use.

Citation

@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}