Kavli Affiliate: Ke Wang | First 5 Authors: Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang | Summary: Recently, various multimodal networks for Visually-Rich Document Understanding (VRDU) have been proposed, showing that transformers benefit from integrating visual and layout information with the text embeddings. However, most existing approaches use position embeddings to […]
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding