Instructions for using microsoft/phi-1_5 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use microsoft/phi-1_5 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/phi-1_5")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5")
```

- Inference
- Notebooks
- Google Colab
- Kaggle
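The Transformers snippets above stop once the model is loaded. Below is a minimal sketch of a complete generation call that could be run locally or in a Colab or Kaggle notebook. The helper names `MODEL_ID`, `generation_kwargs`, and `generate` are hypothetical. Mapping the OpenAI-style `max_tokens`/`temperature` settings from the server examples onto `model.generate` is this sketch's own assumption and is not specified by the model card. In particular, it treats a temperature of 0 as greedy decoding.

```python
"""Minimal text generation with microsoft/phi-1_5 via Transformers (a sketch)."""

MODEL_ID = "microsoft/phi-1_5"


def generation_kwargs(max_new_tokens: int = 512, temperature: float = 0.5) -> dict:
    """Translate OpenAI-style sampling settings into `model.generate` keyword arguments.

    Assumption (hypothetical helper): temperature <= 0 means greedy decoding,
    mirroring how OpenAI-compatible servers treat temperature 0.
    """
    if temperature <= 0:
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    return {"max_new_tokens": max_new_tokens, "do_sample": True, "temperature": temperature}


def generate(prompt: str, max_new_tokens: int = 512, temperature: float = 0.5) -> str:
    """Load the model, generate a continuation of `prompt`, and return the decoded text."""
    # Imported inside the function so this file can be imported without transformers/torch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    # Tokenize to PyTorch tensors (input_ids and attention_mask).
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_kwargs(max_new_tokens, temperature))
    # For a decoder-only model the output sequence starts with the prompt tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate("Once upon a time,", max_new_tokens=64)` returns the prompt followed by up to 64 new tokens. The first call downloads the weights from the Hub.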
- Local Apps
- vLLM
How to use microsoft/phi-1_5 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "microsoft/phi-1_5"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "microsoft/phi-1_5",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker

```shell
docker model run hf.co/microsoft/phi-1_5
```
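The curl request above can also be sent from Python using only the standard library. This is a minimal sketch that assumes a server is already running as shown. vLLM listens on `http://localhost:8000`, and the SGLang command below uses port 30000, so only `base_url` needs to change. The function names are hypothetical. The response parsing assumes the OpenAI completions shape, where the generated text is in `choices[0]["text"]`.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = "microsoft/phi-1_5",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the same POST /v1/completions request as the curl example."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (requires a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # OpenAI-style completions responses carry the text in choices[0]["text"].
    return body["choices"][0]["text"]
```

Against vLLM, call `complete("Once upon a time,")`. Against the SGLang server, call `complete("Once upon a time,", base_url="http://localhost:30000")`.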
- SGLang
How to use microsoft/phi-1_5 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "microsoft/phi-1_5" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "microsoft/phi-1_5",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "microsoft/phi-1_5" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "microsoft/phi-1_5",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

- Docker Model Runner
How to use microsoft/phi-1_5 with Docker Model Runner:
```shell
docker model run hf.co/microsoft/phi-1_5
```
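After `vllm serve` or `sglang.launch_server` starts, loading the weights takes some time, and the curl calls above fail until it finishes. Below is a minimal standard-library sketch that polls the server until it responds. It assumes the server exposes the OpenAI-compatible `GET /v1/models` listing, which vLLM and SGLang both provide. The Docker Model Runner endpoint is not covered here. `wait_until_ready` and its `opener` parameter are hypothetical, and `opener` exists only so the loop can be tested without a live server.

```python
import json
import time
import urllib.error
import urllib.request


def wait_until_ready(
    base_url: str = "http://localhost:8000",
    timeout: float = 300.0,
    interval: float = 2.0,
    opener=urllib.request.urlopen,
) -> list:
    """Poll GET {base_url}/v1/models until it answers; return the served model ids.

    Raises TimeoutError if the server is still unreachable after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(f"{base_url}/v1/models", timeout=10) as response:
                body = json.load(response)
            # OpenAI-style model listings put entries under "data", each with an "id".
            return [entry["id"] for entry in body.get("data", [])]
        except (urllib.error.URLError, ConnectionError):
            # Connection refused or an HTTP error (HTTPError subclasses URLError): retry.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{base_url} not ready after {timeout} s")
            time.sleep(interval)
```

A typical sequence starts the server in one terminal. Then it calls `wait_until_ready()` (or `wait_until_ready("http://localhost:30000")` for SGLang) before sending completion requests.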
Commit History
Update README.md 467adac verified
Update README.md 349cf8b verified
Update README.md 83b9c52 verified
Update README.md 59e722d verified
Upload 5 files 341a17a
Update README.md 24f9ea1
Upload 4 files d262514
Update README.md f27cd93
Update README.md 80c0ba9
Disables inference API to prevent mismatch with HF implementation. a286f5c
Adds support for MQA/GQA and attention mask during training. de35f90
Upload README.md bc09a08
Support for `attention_mask` in forward pass. 3128bb6
Update README.md 7d482dd
Update README.md c8f6ad8
Link paper to arXiv (#5) 762a311
Update README.md ea95720
Update README.md 4bba51c
Update README.md 52e294a
Update README.md 07a048e
Update README.md b630515
Update README.md 40b496f
Update README.md d9c7521
Update README.md 6ddac37
Update README.md cd4510c
Update README.md 34046b0
Update README.md 24ad69c
Update README.md b3d67f3
initial commit 98416e6
Gunasekar committed on