Kavli Affiliate: Yi Zhou | First 5 Authors: Liwen Tan, Yin Cao, Yi Zhou | Summary: Modality discrepancies have perpetually posed significant challenges within the realm of Automated Audio Captioning (AAC) and across all multi-modal domains. Facilitating models in comprehending text information plays a pivotal role in establishing a seamless connection between the two […]
EDTC: enhance depth of text comprehension in automated audio captioning