Kavli Affiliate: Feng Wang | First 5 Authors: Yiding Sun, Feng Wang, Yutao Zhu, Wayne Xin Zhao, Jiaxin Mao | Summary: The ability of foundation models relies heavily on large-scale, diverse, and high-quality pretraining data. To improve data quality, researchers and practitioners often have to manually curate datasets from different sources and […]
Continue: An Integrated Data Processing Framework for Pretraining Foundation Models