eryx-swahili-tts-v1

Model Description

eryx-swahili-tts-v1 is a Swahili text-to-speech model developed by Eryx Labs.

This model provides speaker embeddings extracted from native Swahili speech data, enabling natural-sounding Swahili voice synthesis using XTTS-v2.

Model Details

Attribute          Value
Base Model         coqui/XTTS-v2
Method             Speaker Conditioning
Language           Swahili (Kiswahili)
Training Data      OpenSLR SLR25 (11.49 hours)
Reference Samples  100 audio clips
Developed by       Eryx Labs

Usage

import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
from huggingface_hub import hf_hub_download

# Download speaker embeddings
embedding_path = hf_hub_download(
    repo_id="Engeryx/eryx-swahili-tts-v1",  # repo path under which this model card is hosted
    filename="swahili_speaker.pt"
)

# Load XTTS-v2 model
model_path = "path/to/xtts_v2"  # or download from coqui
config = XttsConfig()
config.load_json(f"{model_path}/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=model_path, eval=True)

# Load Swahili speaker embeddings
embeddings = torch.load(embedding_path, map_location="cpu")  # load on CPU; move to the model's device if needed
gpt_cond_latent = embeddings['gpt_cond_latent']
speaker_embedding = embeddings['speaker_embedding']

# Synthesize Swahili text
# Note: Use 'en' for language since XTTS-v2 doesn't support 'sw' directly
out = model.inference(
    text="Habari yako, mimi ni msaidizi wa Kiswahili.",
    language="en",  # Swahili uses Latin script, works with English tokenizer
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
)

# Save audio
import torchaudio
torchaudio.save("output.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)
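
If torchaudio is not available, the waveform can also be written with Python's standard library. The following is a minimal sketch, not part of the original pipeline: the helper name save_wav_pcm16 is hypothetical, and it assumes the samples are mono floats in the range [-1.0, 1.0] at the 24000 Hz rate used above.

```python
import struct
import wave


def save_wav_pcm16(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range before scaling to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
```

With the synthesis result from above, this would be called as save_wav_pcm16("output.wav", out["wav"]).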

Files

  • swahili_speaker.pt - Speaker embeddings (gpt_cond_latent + speaker_embedding)
  • samples/ - Example synthesized audio files
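
A quick sanity check on the loaded embedding file can catch a wrong or truncated download before inference. This sketch assumes swahili_speaker.pt loads to a dict-like object with the two keys listed above, as the usage code suggests; the helper name missing_embedding_keys is hypothetical.

```python
REQUIRED_KEYS = ("gpt_cond_latent", "speaker_embedding")


def missing_embedding_keys(embeddings):
    """Return the required keys that are absent from the loaded mapping."""
    return [key for key in REQUIRED_KEYS if key not in embeddings]
```

An empty list means both entries are present; anything else names the missing key(s).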

Training Data

The speaker embeddings were extracted from:

  • OpenSLR SLR25 - Swahili Broadcast News Speech Corpus
    • 11,919 audio samples
    • 11.49 hours total duration
    • Native Swahili speakers
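
For a rough sense of clip length, the two corpus figures above can be combined. This is simple arithmetic on the stated totals, not a statistic reported by the corpus itself:

```python
total_hours = 11.49   # total duration stated above
num_clips = 11_919    # number of audio samples stated above

# Convert hours to seconds, then average over all clips.
mean_clip_seconds = total_hours * 3600 / num_clips
print(f"Mean clip length: {mean_clip_seconds:.2f} s")  # about 3.47 s
```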

Limitations

  • XTTS-v2 does not natively support the Swahili language code ('sw'), so synthesis runs through the English tokenizer (language="en")
  • Results are best with clear, well-punctuated text
  • Some pronunciations may differ from native speech patterns
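
Since results depend on clear, well-punctuated input, a small pre-processing step may help. The sketch below is one possible approach under stated assumptions, not something this model requires: prepare_text and split_sentences are hypothetical helpers that collapse whitespace, add a final full stop if none is present, and split on sentence-ending punctuation so each sentence can be synthesized separately.

```python
import re


def prepare_text(text):
    """Collapse whitespace and make sure the text ends with sentence punctuation."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    if cleaned and cleaned[-1] not in ".!?":
        cleaned += "."
    return cleaned


def split_sentences(text):
    """Split prepared text into sentences after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", prepare_text(text)) if s]
```

Each returned sentence can then be passed as the text argument to model.inference. Whether sentence-level synthesis improves quality for a given input is something to verify by listening.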

About Eryx Labs

Eryx Labs builds AI solutions for African languages and communities.

License

Apache 2.0

Citation

@misc{eryx-swahili-tts,
  author = {Eryx Labs},
  title = {eryx-swahili-tts-v1: Swahili Text-to-Speech Speaker Embeddings},
  year = {2025},
  publisher = {Eryx Labs},
  url = {https://eryxlabs.co.ke}
}