
Libraries

How to use rhaymison/cuscuz-com-gemma-2b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="rhaymison/cuscuz-com-gemma-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rhaymison/cuscuz-com-gemma-2b")
model = AutoModelForCausalLM.from_pretrained("rhaymison/cuscuz-com-gemma-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use rhaymison/cuscuz-com-gemma-2b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rhaymison/cuscuz-com-gemma-2b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rhaymison/cuscuz-com-gemma-2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
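
The same request can be prepared from Python. The block below is a minimal sketch using only the standard library; the helper name `build_chat_request` and its `base_url` parameter are illustrative assumptions, not part of vLLM. It builds the request without sending it, so it can be inspected before a server is running.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="rhaymison/cuscuz-com-gemma-2b"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    # Same JSON body as the curl call: a model name and one user message.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("What is the capital of France?")
# With the server running, send it with:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.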

Use Docker

docker model run hf.co/rhaymison/cuscuz-com-gemma-2b

SGLang

How to use rhaymison/cuscuz-com-gemma-2b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rhaymison/cuscuz-com-gemma-2b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rhaymison/cuscuz-com-gemma-2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rhaymison/cuscuz-com-gemma-2b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rhaymison/cuscuz-com-gemma-2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
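
The curl calls above return a JSON document. Assuming the server follows the OpenAI chat-completions response shape (the reply text under `choices[0].message.content`), a small helper can pull out the answer. The function name `extract_reply` and the sample body are hypothetical and only illustrate the parsing step.

```python
import json


def extract_reply(response_body):
    """Return the assistant text from a chat-completions JSON response body."""
    data = json.loads(response_body)
    # Assumption: OpenAI-style layout, with the first choice holding the reply.
    return data["choices"][0]["message"]["content"]


# Hypothetical response body, trimmed to the fields used here.
sample_body = json.dumps({
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Paris."}}
    ]
})
print(extract_reply(sample_body))
```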

Docker Model Runner
How to use rhaymison/cuscuz-com-gemma-2b with Docker Model Runner:
```
docker model run hf.co/rhaymison/cuscuz-com-gemma-2b
```

updated: 14-03-2024

Model description

Cuscuz-com-gemma 2b is a fine-tune of google/gemma-2b-it, specialized in the Northeast region of Brazil. It was trained on a dataset covering the region's history, geography, economy, culture, and cuisine. For best results, use the model without quantization. It is a smaller counterpart of Cuscuz-7b, built on a different architecture.

How to Use

import torch
from transformers import pipeline

model_id = "rhaymison/cuscuz-com-gemma-2b"

# Name the instance `pipe` so it does not shadow the imported `pipeline` function.
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "user", "content": "Me conte sobre o estado de Sergipe."},
]
# Render the chat messages into a single prompt string.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
# generated_text begins with the prompt; slice it off to keep only the answer.
print(outputs[0]["generated_text"][len(prompt):].strip())

# Example output (sampling is enabled, so results vary):
# Sergipe é o menor estado do Nordeste brasileiro em extensão territorial.
# O estado de Sergipe é conhecido por suas praias, sua culinária à base de frutos do mar e sua importância histórica na produção de açúcar.
# Sergipe teve papel fundamental na produção de açúcar no Brasil colonial, sendo uma das regiões onde se concentraram os engenhos de açúcar.

from transformers import AutoTokenizer, AutoModelForCausalLM

# Full Hub repository id, matching the pipeline example.
model_id = "rhaymison/cuscuz-com-gemma-2b"

tokenizer2 = AutoTokenizer.from_pretrained(model_id)
# device_map={"": 0} places the whole model on GPU 0.
model2 = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0})
tokenizer2.pad_token = tokenizer2.eos_token
tokenizer2.padding_side = "right"

# Hand-written prompt: instructions first, then the question turn.
text = """
Você é um assistente especialista em história do Nordeste Brasileiro.
Você sempre responde de forma clara e educada e sempre com informações
verdadeiras. Responda com detalhes e riquesas de informação
<start_of_turn>Me conte sobre o Folclore Nordestino?<end_of_turn>
<start_of_turn>model"""

device = "cuda:0"

inputs = tokenizer2(text, return_tensors="pt").to(device)

outputs = model2.generate(**inputs, max_new_tokens=100, do_sample=False)

# decode() has no skip_prompt option, so drop the prompt tokens by slicing.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
output = tokenizer2.decode(new_tokens, skip_special_tokens=True)
print(output.strip())

#O Folclore Nordestino é uma parte importante da nossa cultura, com manifestações como o bumba meu boi, o reisado, o maracatu e o repente.
#Essa história é rica em lendas, contarorias e tradições que são passadas de geração em geração.
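
To reuse the hand-written prompt layout from the example above with other questions, the string can be assembled by a small helper. This is a sketch under assumptions: `build_prompt` is a hypothetical name, and the layout simply mirrors the example (instructions, then a `<start_of_turn>` question turn, then an open `model` turn). The stock Gemma chat template applied by `apply_chat_template` also labels the question turn with a `user` role, so which layout suits this fine-tune better is not confirmed here.

```python
def build_prompt(system_text, question):
    """Assemble a prompt in the hand-written layout used in the example above."""
    return (
        f"\n{system_text.strip()}\n"
        f"<start_of_turn>{question.strip()}<end_of_turn>\n"
        "<start_of_turn>model"
    )


prompt_text = build_prompt(
    "Você é um assistente especialista em história do Nordeste Brasileiro.",
    "Me conte sobre o Folclore Nordestino?",
)
print(prompt_text)
```

The resulting string can be passed to the tokenizer in place of `text`.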