Kavli Affiliate: Xiang Zhang | First 5 Authors: Min Liu | Summary: Existing text-to-speech systems predominantly focus on single-sentence synthesis and lack both adequate contextual modeling and fine-grained performance control for generating coherent multicast audiobooks. To address these limitations, we propose a context-aware and emotion-controllable speech synthesis framework […]
Audiobook-CC: Controllable Long-context Speech Generation for Multicast Audiobook