Kavli Affiliate: Wei Gao | First 5 Authors: Xuan Zhang, Fengzhuo Zhang, Cunxiao Du, Chao Du, Tianyu Pang | Summary: Scaling language models to handle longer contexts introduces substantial memory challenges due to the growing cost of key-value (KV) caches. Motivated by the efficiency gains of hybrid models and the broad availability of pretrained large […]
Title: LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation