Kavli Affiliate: Zeeshan Ahmed | First 5 Authors: Niko Moritz, Ruiming Xie, Yashesh Gaur, Ke Li, Simone Merello | Summary: We propose the joint speech translation and recognition (JSTAR) model that leverages the fast-slow cascaded encoder architecture for simultaneous end-to-end automatic speech recognition (ASR) and speech translation (ST). The model is transducer-based and uses a […]
Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition
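To make the "fast-slow cascaded encoder" idea in the summary concrete, here is a toy sketch of the data flow only, not the authors' model. The summary is truncated, so several details are assumptions made for illustration: that the fast branch uses little or no look-ahead, that the slow branch is stacked on the fast branch's output and sees more future context, and that the two outputs would feed separate ASR and ST decoders. The function names (`context_average`, `fast_slow_cascade`) and the context sizes are hypothetical, and a simple windowed average stands in for real neural encoder layers.

```python
from typing import List, Tuple


def context_average(frames: List[float], left: int, right: int) -> List[float]:
    """Toy stand-in for an encoder layer: average each frame with its context.

    `left` and `right` are the number of past and future frames visible at
    each step. A larger `right` means more look-ahead, hence more latency.
    """
    out = []
    for t in range(len(frames)):
        lo = max(0, t - left)
        hi = min(len(frames), t + right + 1)
        window = frames[lo:hi]
        out.append(sum(window) / len(window))
    return out


def fast_slow_cascade(
    frames: List[float], fast_right: int = 0, slow_right: int = 2
) -> Tuple[List[float], List[float]]:
    """Run a cascaded pair of toy encoders (assumed layout, see lead-in).

    The fast encoder reads the input frames directly with little look-ahead.
    The slow encoder is cascaded: it reads the fast encoder's output, not the
    raw frames, and is allowed more future context.
    """
    fast_out = context_average(frames, left=1, right=fast_right)
    slow_out = context_average(fast_out, left=1, right=slow_right)
    return fast_out, slow_out


if __name__ == "__main__":
    fast, slow = fast_slow_cascade([0.0, 2.0, 4.0, 6.0])
    print("fast branch:", fast)
    print("slow branch:", slow)
```

The point of the sketch is the cascade itself: the slow branch reuses the fast branch's computation and adds look-ahead on top, so a low-latency output and a higher-context output come from one shared pass over the audio.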