Kavli Affiliate: Feng Wang | First 5 Authors: Feng Wang, Zesheng Shi, Bo Wang, Nan Wang, Han Xiao | Summary: We present ReaderLM-v2, a compact 1.5 billion parameter language model designed for efficient web content extraction. Our model processes documents up to 512K tokens, transforming messy HTML into clean Markdown or JSON formats with high […]
ReaderLM-v2: Small Language Model for HTML to Markdown and JSON