Kavli Affiliate: Dan Luo | First 5 Authors: Chaoran Zhang, Lixin Zou, Dan Luo, Min Tang, Xiangyang Luo | Summary: In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide array of text-centric tasks. However, their 'large' scale introduces significant computational and storage challenges, particularly in managing the key-value states of […]
Continue reading: Efficient Sparse Attention needs Adaptive Token Release
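For readers unfamiliar with the key-value (KV) cache bottleneck the summary alludes to, below is a minimal, self-contained sketch of the general idea behind budgeted KV caching with attention-based token release. It is an illustrative toy (the class name `KVCacheBudget`, the fixed `budget`, and the accumulated-score eviction rule are all assumptions), not the adaptive mechanism proposed in the paper.

```python
# Illustrative sketch only: a toy KV cache that keeps a fixed budget of tokens
# and releases the token with the lowest accumulated attention weight. This is
# a generic top-K retention policy, NOT the method from the paper; all names
# and the eviction heuristic here are hypothetical.
import numpy as np


class KVCacheBudget:
    def __init__(self, budget: int, d_model: int):
        self.budget = budget                      # maximum number of cached tokens
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))
        self.scores = np.empty(0)                 # accumulated attention mass per cached token

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        """Add one token's key/value; release the least-attended token if over budget."""
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])
        self.scores = np.append(self.scores, 0.0)
        if self.keys.shape[0] > self.budget:
            drop = int(np.argmin(self.scores))    # token that has received the least attention
            keep = np.arange(self.keys.shape[0]) != drop
            self.keys = self.keys[keep]
            self.values = self.values[keep]
            self.scores = self.scores[keep]

    def attend(self, q: np.ndarray) -> np.ndarray:
        """Attention over the (bounded) cache for a single query; updates per-token scores."""
        logits = self.keys @ q / np.sqrt(q.shape[0])
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        self.scores += weights                    # track how much attention each cached token receives
        return weights @ self.values
```

In this toy setup, memory for the KV states stays constant at `budget` tokens regardless of sequence length, which is the storage concern the summary raises; how tokens are scored and released adaptively is exactly what the linked paper addresses.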