eryx-swahili-tts-v1

Model Description

eryx-swahili-tts-v1 is a Swahili text-to-speech model developed by Eryx Labs.

This model provides speaker embeddings extracted from native Swahili speech data, enabling natural-sounding Swahili voice synthesis using XTTS-v2.

Model Details

Attribute	Value
Base Model	coqui/XTTS-v2
Method	Speaker Conditioning
Language	Swahili (Kiswahili)
Training Data	OpenSLR SLR25 (11.49 hours)
Reference Samples	100 audio clips
Developed by	Eryx Labs
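
The "Speaker Conditioning" method listed above can be illustrated with XTTS-v2's `get_conditioning_latents` method, which computes both conditioning tensors from reference audio. This is a minimal sketch and not necessarily how the released `swahili_speaker.pt` was produced: the helper name `extract_speaker_embeddings` and the clip paths are hypothetical, and `model` is assumed to be an already loaded `Xtts` instance.

```python
from typing import Any, Dict, Sequence


def extract_speaker_embeddings(model: Any, clip_paths: Sequence[str]) -> Dict[str, Any]:
    """Compute XTTS-v2 conditioning tensors from reference clips.

    `model` is assumed to be a loaded Xtts instance; this helper is
    hypothetical and not part of the released files.
    """
    if not clip_paths:
        raise ValueError("at least one reference clip is required")
    # Xtts.get_conditioning_latents accepts a list of audio file paths and
    # returns the pair (gpt_cond_latent, speaker_embedding).
    gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
        audio_path=list(clip_paths)
    )
    # Use the same key names as the dictionary stored in swahili_speaker.pt.
    return {
        "gpt_cond_latent": gpt_cond_latent,
        "speaker_embedding": speaker_embedding,
    }
```

The returned dictionary could then be written to disk with `torch.save` and reused across sessions without recomputing the latents.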

Usage

import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
from huggingface_hub import hf_hub_download

# Download speaker embeddings
embedding_path = hf_hub_download(
    repo_id="Engeryx/eryx-swahili-tts-v1",  # repository ID under which this model is hosted
    filename="swahili_speaker.pt"
)

# Load XTTS-v2 model
model_path = "path/to/xtts_v2"  # local directory holding the coqui/XTTS-v2 config and checkpoint
config = XttsConfig()
config.load_json(f"{model_path}/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=model_path, eval=True)

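# Load Swahili speaker embeddings
# map_location="cpu" keeps the tensors on the CPU, matching the model loaded above
embeddings = torch.load(embedding_path, map_location="cpu")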
gpt_cond_latent = embeddings['gpt_cond_latent']
speaker_embedding = embeddings['speaker_embedding']

# Synthesize Swahili text
# Note: Use 'en' for language since XTTS-v2 doesn't support 'sw' directly
out = model.inference(
    text="Habari yako, mimi ni msaidizi wa Kiswahili.",
    language="en",  # Swahili uses Latin script, works with English tokenizer
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
)

# Save audio
import torchaudio
torchaudio.save("output.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)

Files

swahili_speaker.pt - Speaker embeddings (gpt_cond_latent + speaker_embedding)
samples/ - Example synthesized audio files
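
Before passing the downloaded file to XTTS-v2, it can help to confirm that it holds both tensors. The following is a minimal sketch, assuming the file loads to a dictionary with the two keys named above; the helper `check_embedding_keys` is hypothetical.

```python
from typing import Any, Mapping

# Entries expected in swahili_speaker.pt, per the file description.
REQUIRED_KEYS = ("gpt_cond_latent", "speaker_embedding")


def check_embedding_keys(embeddings: Mapping[str, Any]) -> None:
    """Raise KeyError if an expected entry is missing from the loaded file."""
    missing = [key for key in REQUIRED_KEYS if key not in embeddings]
    if missing:
        raise KeyError(f"missing entries: {', '.join(missing)}")
```

Calling it on the object returned by `torch.load` gives an early, readable error instead of a failure inside inference.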

Training Data

The speaker embeddings were derived from:

OpenSLR SLR25 - Swahili Broadcast News Speech Corpus
- 11,919 audio samples
- 11.49 hours total duration
- Native Swahili speakers
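
Taken together, the sample count and total duration imply short clips. A quick check of the average, assuming the 11.49 hours cover exactly the 11,919 samples:

```python
total_hours = 11.49
num_samples = 11_919

# Convert hours to seconds, then divide by the number of clips.
avg_seconds = total_hours * 3600 / num_samples
print(f"average clip length: {avg_seconds:.2f} s")  # about 3.47 s
```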

Limitations

XTTS-v2 has no native Swahili language code, so synthesis runs through the English tokenizer (language="en")
Results are best with clear, well-punctuated text
Some pronunciations may differ from native Swahili speech
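
Because results are best with clear, well-punctuated text, one option is to split long passages at sentence boundaries and synthesize each piece separately. The following is a minimal sketch using only the standard library; the helper `split_sentences` and its 250-character default are assumptions for illustration, not part of this model.

```python
import re
from typing import List


def split_sentences(text: str, max_chars: int = 250) -> List[str]:
    """Split text after '.', '!' or '?' and pack sentences into chunks of at
    most max_chars characters (a single longer sentence is kept whole)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as `text` to `model.inference`, and the resulting waveforms concatenated.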

About Eryx Labs

Eryx Labs builds AI solutions for African languages and communities.

License

Apache 2.0

Citation

@misc{eryx-swahili-tts,
  author = {Eryx Labs},
  title = {eryx-swahili-tts-v1: Swahili Text-to-Speech Speaker Embeddings},
  year = {2025},
  publisher = {Eryx Labs},
  url = {https://eryxlabs.co.ke}
}
