| ## MLX deployment guide |
|
|
| Run, serve, and fine-tune [**MiniMax-M2.1**](https://huggingface.co/MiniMaxAI/MiniMax-M2.1) locally on your Mac using the **MLX** framework. This guide gets you up and running quickly. |
|
|
| > **Requirements** |
| > - Apple Silicon Mac (M3 Ultra or later) |
| > - **At least 256GB of unified memory (RAM)** |
|
|
|
|
| **Installation** |
|
|
| Install the `mlx-lm` package via pip: |
|
|
| ```bash |
| pip install -U mlx-lm |
| ``` |
|
|
| **CLI** |
|
|
| Generate text directly from the terminal: |
|
|
```bash
mlx_lm.generate \
  --model mlx-community/MiniMax-M2.1-4bit \
  --prompt "How tall is Mount Everest?"
```
|
|
> Add `--max-tokens 256` to control response length, or `--temp 0.7` to raise the sampling temperature for more varied output.
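

**Serving**


The intro mentions serving: `mlx-lm` also ships `mlx_lm.server`, a lightweight HTTP server that exposes an OpenAI-compatible chat completions endpoint. A minimal sketch for serving the same model (the port is an arbitrary choice):


```bash
mlx_lm.server \
  --model mlx-community/MiniMax-M2.1-4bit \
  --port 8080
```


Then query it from another terminal:


```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "How tall is Mount Everest?"}], "max_tokens": 256}'
```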
|
|
| **Python Script Example** |
|
|
| Use `mlx-lm` in your own Python scripts: |
|
|
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the quantized model
model, tokenizer = load("mlx-community/MiniMax-M2.1-4bit")

prompt = "Hello, how are you?"

# Apply the chat template if available (recommended for chat models)
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

# Generate the response; recent mlx-lm releases take sampling options
# such as temperature via a sampler rather than a `temp=` keyword
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=256,
    sampler=make_sampler(temp=0.7),
    verbose=True,
)

print(response)
```
|
|
| **Tips** |
| - **Model variants**: Check this [MLX community collection on Hugging Face](https://huggingface.co/collections/mlx-community/minimax-m2.1) for `MiniMax-M2.1-4bit`, `6bit`, `8bit`, or `bfloat16` versions. |
- **Fine-tuning**: Use `mlx_lm.lora` for parameter-efficient fine-tuning with LoRA adapters; see the sketch after this list.
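

A minimal LoRA training sketch (the `./data` path is a placeholder: `mlx_lm.lora` expects `train.jsonl` and `valid.jsonl` files in that directory, and the iteration count is illustrative):


```bash
mlx_lm.lora \
  --model mlx-community/MiniMax-M2.1-4bit \
  --train \
  --data ./data \
  --iters 600
```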
|
|
| **Resources** |
| - GitHub: [https://github.com/ml-explore/mlx-lm](https://github.com/ml-explore/mlx-lm) |
| - Models: [https://huggingface.co/mlx-community](https://huggingface.co/mlx-community) |
|
|