eryx-swahili-tts-v1
Model Description
eryx-swahili-tts-v1 is a Swahili text-to-speech model developed by Eryx Labs.
It provides speaker embeddings extracted from native Swahili speech data. XTTS-v2 is conditioned on these embeddings to synthesize natural-sounding Swahili speech.
Model Details
| Attribute | Value |
|---|---|
| Base Model | coqui/XTTS-v2 |
| Method | Speaker Conditioning |
| Language | Swahili (Kiswahili) |
| Training Data | OpenSLR SLR25 (11.49 hours) |
| Reference Samples | 100 audio clips |
| Developed by | Eryx Labs |
Usage
import torch
import torchaudio
from huggingface_hub import hf_hub_download
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Download speaker embeddings
embedding_path = hf_hub_download(
    repo_id="EryxLabs/eryx-swahili-tts-v1",
    filename="swahili_speaker.pt",
)

# Load XTTS-v2 model
model_path = "path/to/xtts_v2"  # or download from coqui
config = XttsConfig()
config.load_json(f"{model_path}/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=model_path, eval=True)

# Load Swahili speaker embeddings (map to CPU so this also works without a GPU)
embeddings = torch.load(embedding_path, map_location="cpu")
gpt_cond_latent = embeddings["gpt_cond_latent"]
speaker_embedding = embeddings["speaker_embedding"]

# Synthesize Swahili text
# Note: Use 'en' for language since XTTS-v2 doesn't support 'sw' directly
out = model.inference(
    text="Habari yako, mimi ni msaidizi wa Kiswahili.",
    language="en",  # Swahili uses Latin script, works with English tokenizer
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
)

# Save audio as a mono file at 24 kHz
torchaudio.save("output.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)
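The usage example saves the waveform with torchaudio at 24000 Hz. Where torchaudio is not available, a rough alternative is to write the samples with Python's standard `wave` module. The sketch below assumes the synthesized waveform is a flat sequence of floats in the range [-1.0, 1.0]; the helper name `write_wav_pcm16` is hypothetical and not part of this repository or of XTTS-v2.

```python
import struct
import wave

SAMPLE_RATE = 24000  # same rate the usage example passes to torchaudio.save


def write_wav_pcm16(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        # Clip out-of-range values so they do not overflow the 16-bit range
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
```

With the usage example's output, this could be called as `write_wav_pcm16("output.wav", out["wav"])`, assuming `out["wav"]` is a one-dimensional array of floats.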
Files
- swahili_speaker.pt - Speaker embeddings (gpt_cond_latent + speaker_embedding)
- samples/ - Example synthesized audio files
Training Data
The model was trained on:
- OpenSLR SLR25 - Swahili Broadcast News Speech Corpus
- 11,919 audio samples
- 11.49 hours total duration
- Native Swahili speakers
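As a quick sanity check on the figures above, the average clip length follows directly from the total duration and the sample count. This is plain arithmetic on the two numbers stated in this card, not a statistic taken from the corpus itself.

```python
TOTAL_HOURS = 11.49   # total duration stated above
NUM_CLIPS = 11_919    # number of audio samples stated above

# Convert hours to seconds, then divide by the number of clips
avg_seconds = TOTAL_HOURS * 3600 / NUM_CLIPS
print(f"Average clip length: {avg_seconds:.2f} s")  # about 3.47 s
```

Clips of a few seconds each are typical of sentence-level recordings, which suits extracting short reference samples.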
Limitations
- XTTS-v2 has no native Swahili language code (sw), so synthesis runs through the English (en) tokenizer
- Results are best with clear, well-punctuated text
- Some pronunciations may differ from native speech patterns
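Since results depend on clear, well-punctuated input, it may help to tidy text before synthesis. The sketch below is one possible approach, not part of this model or of XTTS-v2: the hypothetical helper `prepare_text` collapses whitespace, splits on sentence-ending punctuation, and adds a final period where one is missing. It does not handle abbreviations that end in a period.

```python
import re


def prepare_text(text):
    """Collapse whitespace and return sentences that end in punctuation."""
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    # Split after '.', '!' or '?' when followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = []
    for part in parts:
        # Add a period to any sentence missing terminal punctuation
        if part[-1] not in ".!?":
            part += "."
        sentences.append(part)
    return sentences


print(prepare_text("Habari  yako?  Mimi ni msaidizi wa Kiswahili"))
# ['Habari yako?', 'Mimi ni msaidizi wa Kiswahili.']
```

Each returned sentence can then be passed as the `text` argument of `model.inference` in turn.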
About Eryx Labs
Eryx Labs builds AI solutions for African languages and communities.
License
Apache 2.0
Citation
@misc{eryx-swahili-tts,
  author = {Eryx Labs},
  title = {eryx-swahili-tts-v1: Swahili Text-to-Speech Speaker Embeddings},
  year = {2025},
  publisher = {Eryx Labs},
  url = {https://eryxlabs.co.ke}
}