---
language:
- en
- zh
library_name: huggingface_hub
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- text-to-audio
- music
- singing-voice-synthesis
- svs
- zero-shot
---

## ComfyUI Custom Node

A custom node for ComfyUI integration is available:

🔗 **[ComfyUI-SoulX-Singer](https://github.com/Saganaki22/ComfyUI-SoulX-Singer)**

Use this custom node to run SoulX-Singer singing voice synthesis inside your ComfyUI workflows.

# SoulX-Singer: .pt Model Converted to .safetensors

**Precision:** bf16 + fp32

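bf16 keeps fp32's sign bit and 8-bit exponent but only 7 of its 23 mantissa bits. A bf16 tensor therefore needs 2 bytes per value instead of 4 while covering roughly the same numeric range, at the cost of precision. The sketch below shows the common round-to-nearest-even fp32-to-bf16 conversion using only the standard library. It illustrates the number format only and is an assumption, not a record of how these particular files were converted. The helper names (`fp32_to_bf16_bits`, `bf16_bits_to_fp32`, `to_bf16`) are hypothetical.

```python
import math
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Round an fp32 value to bf16 (round-to-nearest-even); return the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: keep the sign and exponent, force a quiet-NaN mantissa bit
        return (bits >> 16) | 0x0040
    lsb = (bits >> 16) & 1            # lowest bit that survives truncation
    rounding_bias = 0x7FFF + lsb      # exact ties round toward an even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen a bf16 bit pattern to fp32 by zero-filling the low 16 mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def to_bf16(x: float) -> float:
    """Value that x becomes after an fp32 -> bf16 -> fp32 round trip."""
    return bf16_bits_to_fp32(fp32_to_bf16_bits(x))


print(to_bf16(1.0))      # 1.0 (exactly representable)
print(to_bf16(math.pi))  # 3.140625 (fp32 holds 3.1415927...)
```

Plain truncation of the low 16 bits would be simpler, but it biases every value toward zero. Rounding to nearest even avoids that bias.
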
## Audio Samples

### Original Audio
<audio controls>
<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/song.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

### SpongeBob Voice
<audio controls>
<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-1.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

### Male Voice
<audio controls>
<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-2.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>


---

<div align="center">
<b><em>Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
<p>
<img src="assets/soulx-logo.png" alt="SoulX-Singer_Logo" style="height: 80px;">
</p>
<p>
<a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="version"></a>
<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
<a href="https://arxiv.org/abs/2602.07803"><img src="https://img.shields.io/badge/arXiv-2602.07803-b31b1b" alt="arXiv"></a>
<a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
</p>
</div>

---

## Overview

**SoulX-Singer** is a high-fidelity, zero-shot singing voice synthesis model that generates realistic singing voices for singers not seen during training. For precise control of pitch, rhythm, and expression, it supports two conditioning modes: melody-conditioned (an F0 contour) and score-conditioned (MIDI notes).

For more details, please refer to the paper: [SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis](https://arxiv.org/abs/2602.07803).

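The two control modes differ in the kind of pitch information the model receives. Melody conditioning uses a frame-level F0 contour in Hz, while score conditioning uses discrete MIDI notes. The sketch below shows the standard equal-temperament mapping between the two (A4 = MIDI note 69 = 440 Hz), and how a note sequence could be expanded into a stepwise F0 curve. The helper names, the 10 ms frame hop, and the 0.0-for-rests convention are illustrative assumptions, not SoulX-Singer's actual preprocessing.

```python
import math
from typing import List, Optional, Tuple


def midi_to_hz(note: int) -> float:
    """Equal-temperament pitch of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def hz_to_midi(f0: float) -> float:
    """Fractional MIDI pitch of a frequency in Hz (inverse of midi_to_hz)."""
    return 69.0 + 12.0 * math.log2(f0 / 440.0)


def score_to_f0(notes: List[Tuple[Optional[int], float]], hop_s: float = 0.01) -> List[float]:
    """Expand (midi_note, duration_s) pairs into a frame-level F0 contour.

    A note of None is a rest and is emitted as 0.0 (unvoiced).
    hop_s is an assumed frame hop, not a documented SoulX-Singer setting.
    """
    contour: List[float] = []
    for note, dur in notes:
        n_frames = round(dur / hop_s)  # round() avoids float truncation errors
        value = 0.0 if note is None else midi_to_hz(note)
        contour.extend([value] * n_frames)
    return contour


print(round(midi_to_hz(60), 2))  # 261.63 (middle C)
f0 = score_to_f0([(60, 0.5), (None, 0.25), (69, 0.25)])
print(len(f0), f0[60], f0[-1])  # 100 0.0 440.0
```

A score-derived contour is piecewise constant. A contour extracted from real singing would also carry vibrato and pitch glides, which is the extra expressive detail melody conditioning can pass to the model.
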

---

## Features

- **Zero-shot synthesis**: Generate singing voices for unseen singers without fine-tuning
- **Melody-conditioned control**: Use an F0 contour for pitch guidance
- **Score-conditioned control**: Use MIDI notes for precise, note-level musical control
- **High-fidelity output**: Realistic vocal synthesis with natural expression
- **Safetensors format**: Weights converted from `.pt` to `.safetensors`, in bf16 and fp32 precision

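Because the weights ship as `.safetensors`, their dtypes and sizes can be checked without loading PyTorch. A safetensors file begins with an 8-byte little-endian header length. This is followed by a JSON header that maps each tensor name to its `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry. The sketch below is a minimal standard-library version that assumes this published layout. The helper names are hypothetical, and the demo uses a tiny in-memory file because the repository's actual file names are not listed here.

```python
import json
import struct
from collections import Counter
from typing import Dict, Tuple


def parse_header(raw: bytes) -> Dict[str, dict]:
    """Parse the header from the leading bytes of a .safetensors file."""
    (header_len,) = struct.unpack("<Q", raw[:8])  # u64 little-endian header size
    header = json.loads(raw[8 : 8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional str->str metadata, not a tensor
    return header


def read_header(path: str) -> Dict[str, dict]:
    """Read only the header of a .safetensors file; tensor data is never loaded."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_header(prefix + f.read(header_len))


def dtype_summary(header: Dict[str, dict]) -> Tuple[Dict[str, int], int]:
    """Count tensors per dtype and sum the bytes of tensor data they occupy."""
    counts = Counter(info["dtype"] for info in header.values())
    total_bytes = sum(info["data_offsets"][1] - info["data_offsets"][0] for info in header.values())
    return dict(counts), total_bytes


# Demo on a tiny in-memory file: one BF16 tensor of 2 values, one F32 tensor of 1 value.
demo = {"w": {"dtype": "BF16", "shape": [2], "data_offsets": [0, 4]},
        "b": {"dtype": "F32", "shape": [1], "data_offsets": [4, 8]},
        "__metadata__": {"format": "pt"}}
blob = json.dumps(demo).encode("utf-8")
print(dtype_summary(parse_header(struct.pack("<Q", len(blob)) + blob + bytes(8))))
# ({'BF16': 1, 'F32': 1}, 8)
```

On a downloaded checkpoint, `dtype_summary(read_header(path))` shows whether the file holds BF16 or F32 tensors and how many bytes of weights it contains. To load the tensors themselves, the `safetensors` package (`safetensors.torch.load_file`, or `safe_open` for lazy access) is the usual route.
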

---

## Citation

If you use SoulX-Singer in your research, please cite:

```bibtex
@article{soulxsinger2025,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}
```


---

## License

This project is licensed under the Apache License 2.0.