
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
The official implementation of EmoSphere-TTS (INTERSPEECH 2024)

Demo page

Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee

Abstract

Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech. Without any human annotation, we use the arousal, valence, and dominance pseudo-labels to model the complex nature of emotion via a Cartesian-spherical transformation. Furthermore, we propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics. The experimental results demonstrate the model’s ability to control emotional style and intensity with high-quality expressive speech.

Model overview (figure: 240312_model_overview_1)
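At the core of the model, arousal–valence–dominance (AVD) pseudo-labels are mapped from Cartesian to spherical coordinates, so that the radius acts as emotion intensity and the angles as emotional style. A minimal sketch of such a transformation (the neutral-center offset and axis conventions here are illustrative assumptions, not the exact convention used in this repo):

```python
import numpy as np

def cartesian_to_spherical(avd, neutral_center=(0.0, 0.0, 0.0)):
    """Map an (arousal, valence, dominance) point to spherical coordinates.

    Returns (r, theta, phi): r acts as an emotion-intensity radius,
    while (theta, phi) encode the emotional style direction.
    NOTE: centering on a neutral point and this axis ordering are
    illustrative assumptions, not the paper's exact convention.
    """
    v = np.asarray(avd, dtype=float) - np.asarray(neutral_center, dtype=float)
    r = float(np.linalg.norm(v))
    if r == 0.0:
        return 0.0, 0.0, 0.0  # degenerate case: exactly the neutral point
    theta = float(np.arccos(np.clip(v[2] / r, -1.0, 1.0)))  # polar angle
    phi = float(np.arctan2(v[1], v[0]))                     # azimuthal angle
    return r, theta, phi
```

In this parameterization, scaling r modulates emotion intensity while keeping (theta, phi) fixed preserves the emotional style direction.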


Training Procedure

Environments

  • For binary dataset creation, we follow the pipeline from NATSpeech.

pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install forced alignment tools

1. Preprocess data

sh preprocessing.sh

2. Train the TTS module and run inference

sh train_run.sh

3. Pretrained checkpoints

Citation

@inproceedings{cho24_interspeech,
  title     = {EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech},
  author    = {Deok-Hyeon Cho and Hyung-Seok Oh and Seung-Bin Kim and Sang-Hoon Lee and Seong-Whan Lee},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {1810--1814},
  doi       = {10.21437/Interspeech.2024-398},
  issn      = {2958-1796},
}

Acknowledgements

Our code is based on the following repositories:

