Instructions for using TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge", trust_remote_code=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```shell
docker model run hf.co/TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge
```
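The curl request shown for the vLLM server can also be composed from Python; a minimal sketch of the JSON body only (the endpoint and field names follow the OpenAI-compatible completions API in the example above; the helper name is illustrative):

```python
import json

# Build the JSON body for POST /v1/completions (OpenAI-compatible API).
# Field names match the curl example above; values are illustrative.
def completion_payload(model, prompt, max_tokens=512, temperature=0.5):
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    })

payload = completion_payload(
    "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge", "Once upon a time,"
)
```

Send `payload` with any HTTP client to `http://localhost:8000/v1/completions` for vLLM, or port 30000 for the SGLang server below.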
- SGLang
How to use TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge with Docker Model Runner:
```shell
docker model run hf.co/TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge
```
MPT-7b-Chat-Instruct-LongCTX-Merge
A merge between the long context Storywriting and the short context Chat oriented MPT-7b models.
Model description
This is a merged model, using a weighted parameter blend strategy at a (20:20:60) ratio between the models:
- [60%] - 2048 CTX MPT-7b Chat: https://huggingface.co/TehVenom/MPT-7b-chat-V
- [20%] - 2048 CTX MPT-7b Instruct: https://huggingface.co/TehVenom/MPT-7b-instruct-V
- [20%] - 65k CTX MPT-7b Storywriter: https://huggingface.co/TehVenom/MPT-7b-storywriter-Apache-2.0/
For a final model composed of:
(MPT-7b Storywriter [20%] + MPT-7b Instruct [20%]) + MPT-7b Chat [60%]
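The merge script itself is not included in this card, but a weighted parameter blend at this ratio amounts to a per-parameter weighted average across the three checkpoints. A minimal sketch, using plain floats in place of tensors (the function name and dict layout are illustrative, not the actual tooling used):

```python
# Weighted parameter blend at a 20:20:60 ratio.
# Real merges operate on model state dicts (tensors); plain floats keep
# the sketch self-contained.
def blend_params(storywriter, instruct, chat, weights=(0.2, 0.2, 0.6)):
    w_story, w_instr, w_chat = weights
    return {
        name: w_story * storywriter[name]
              + w_instr * instruct[name]
              + w_chat * chat[name]
        for name in chat
    }

# Toy example: one shared parameter per "model"
merged = blend_params({"w": 1.0}, {"w": 2.0}, {"w": 3.0})
```

With real checkpoints the same loop would run over the `state_dict()` entries of the three models, which must share identical parameter names and shapes.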
This was done to test how 'long context' tunes affect attention when merged with a model that was trained for a different purpose on a shorter context span. Unlike the first merge (which used a 50/50 ratio), this one is weighted towards the Chat base model, both to provide another comparison point for the effects of merging across context spans and to produce a model primarily focused on chatting.
This merge has two objectives: first, to see how much of the 65k Storywriter model is necessary to raise the ceiling of the final model's context size; second, to make the base Chat model less dry and slightly more fun, verbose, and intelligent by blending in the literature- and Instruct-based models.
Due to the influence of MPT-7b Storywriter, and the wide array of books it was trained on, this model may generate content that is considered NSFW.
The specific prompting format is unknown, so try approaching it as a chat bot scenario / prompt.
Try starting with a two-line prompt such as:
Bot: "Hello, how are you?"
You: "I am doing just fine, thank you."
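A two-turn opener like the one above can be assembled from a list of (speaker, text) pairs; a small helper along these lines (the name and structure are hypothetical, not part of the model's tooling):

```python
# Format chat turns in the Bot:/You: style shown above.
def chat_prompt(turns):
    return "\n".join(f'{speaker}: "{text}"' for speaker, text in turns)

prompt = chat_prompt([
    ("Bot", "Hello, how are you?"),
    ("You", "I am doing just fine, thank you."),
])
```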
Or try giving it instructions such as:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Explain the concept of artificial intelligence in simple terms.
### Response:
Artificial Intelligence (AI) is the ability of machines and computers to make decisions and complete tasks similar to what humans can do.
AI can learn from data, recognize patterns, and make predictions.
AI technology can automate tedious and complex processes, helping to reduce human effort and errors, and enable more accurate and efficient decisions.
AI can be used in a wide range of applications, from robotics and autonomous vehicles to healthcare and finance. It is increasingly becoming an integral part of everyday life.
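The instruction example above follows the Alpaca template (preamble, `### Instruction:`, `### Response:`). A small helper to build it programmatically (the function name is hypothetical):

```python
# Build an Alpaca-style instruction prompt, matching the template above.
def build_instruct_prompt(instruction):
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
    )

prompt = build_instruct_prompt(
    "Explain the concept of artificial intelligence in simple terms."
)
```

The model's generation is then appended after the trailing `### Response:` line.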
Check out the datasets involved in the Chat-7b model to get a better grasp on how to prompt it well:
- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)
- [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)
Read the original model cards to understand how to run inference on this model.