Kavli Affiliate: Wei Gao | First 5 Authors: Xuan Zhang, Cunxiao Du, Chao Du, Tianyu Pang, Wei Gao | Summary: Recent advancements in large language models (LLMs) have extended their capabilities to handle long contexts. However, increasing the number of model layers and the length of input sequences significantly escalates the memory required to store […]
Continue reading: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction
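The summary notes that memory for storing the cache grows with both the number of layers and the input sequence length. As a rough, back-of-the-envelope illustration (not taken from the paper), the sketch below estimates KV cache size for a decoder-only transformer; the function name, parameters, and example configuration are assumptions for illustration only.

```python
# Illustrative sketch (not SimLayerKV code): why KV cache memory grows
# linearly with layer count and sequence length. Configuration values
# below are hypothetical, chosen only to make the arithmetic concrete.

def kv_cache_bytes(num_layers: int,
                   seq_len: int,
                   num_kv_heads: int,
                   head_dim: int,
                   batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Total bytes to store keys and values for every layer and token.

    The factor of 2 accounts for storing both K and V tensors;
    bytes_per_value=2 assumes fp16/bf16 storage.
    """
    per_token_per_layer = 2 * num_kv_heads * head_dim * bytes_per_value
    return batch_size * num_layers * seq_len * per_token_per_layer


if __name__ == "__main__":
    # Hypothetical 32-layer model, 8 KV heads of dim 128, 32k-token context.
    total = kv_cache_bytes(num_layers=32, seq_len=32_768,
                           num_kv_heads=8, head_dim=128)
    print(f"KV cache: {total / 2**30:.1f} GiB")  # ~4.0 GiB, scaling with layers x tokens
```

Under these assumed settings the cache alone occupies about 4 GiB per sequence, which is why layer-level reduction schemes such as the one the paper proposes target this term directly.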