Kavli Affiliate: Dan Luo | First 5 Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu | Summary: Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker’s voice without adaptation parameters. By quantizing the speech waveform into discrete acoustic tokens and modeling these tokens with a language model, recent language model-based TTS models […]
Title: Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts