datasets:
  - k2-fsa/TTS_eval_datasets
language:
  - en
  - zh
license: apache-2.0
pipeline_tag: text-to-speech
library_name: transformers

TTS Evaluation Models

This repository contains models for the objective evaluation of text-to-speech (TTS) systems, as presented in the papers ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching, ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching, and OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models.

Evaluation Metrics

This repository supports the objective evaluation metrics used in the papers above.

For more details, please refer to the ZipVoice and OmniVoice repositories.
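
To run the evaluation locally, the model files first need to be on disk. The following is a minimal sketch of one way to fetch them with `snapshot_download` from the `huggingface_hub` package. The repository id `k2-fsa/TTS_eval_models` is an assumption inferred from the dataset entry in the metadata, and the helper name `download_eval_models` and the default target directory are illustrative only. The evaluation scripts themselves live in the ZipVoice and OmniVoice repositories and may expect a different layout.

```python
from pathlib import Path

# Assumed repository id: the "k2-fsa" namespace is inferred from the dataset
# entry in the metadata and is not stated for the model repository itself.
ASSUMED_REPO_ID = "k2-fsa/TTS_eval_models"


def download_eval_models(repo_id: str = ASSUMED_REPO_ID,
                         local_dir: str = "TTS_eval_models") -> Path:
    """Download all files of the model repository into local_dir.

    Hypothetical helper for illustration; requires `pip install huggingface_hub`.
    """
    # Imported lazily so that defining the helper does not need the dependency.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the folder holding the files.
    path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(path)
```

Calling `download_eval_models()` downloads the repository and returns the local directory, which can then be passed to whatever evaluation script is being used.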

Citation

@article{zhu2025zipvoice,
      title={ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching},
      author={Zhu, Han and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Li, Zhaoqing and Zhuang, Weiji and Lin, Long and Povey, Daniel},
      journal={arXiv preprint arXiv:2506.13053},
      year={2025}
}

@article{zhu2025zipvoicedialog,
      title={ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching},
      author={Zhu, Han and Kang, Wei and Guo, Liyong and Yao, Zengwei and Kuang, Fangjun and Zhuang, Weiji and Li, Zhaoqing and Han, Zhifeng and Zhang, Dong and Zhang, Xin and Song, Xingchen and Lin, Long and Povey, Daniel},
      journal={arXiv preprint arXiv:2507.09318},
      year={2025}
}

@article{zhu2026omnivoice,
      title={OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models},
      author={Zhu, Han and Ye, Lingxuan and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Han, Zhifeng and Zhuang, Weiji and Lin, Long and Povey, Daniel},
      journal={arXiv preprint arXiv:2604.00688},
      year={2026}
}