Kavli Affiliate: Wei Gao | First 5 Authors: Wei Gao, Xinyu Zhou, Peng Sun, Tianwei Zhang, Yonggang Wen | Summary: Key-Value cache (KV cache) compression has emerged as a promising technique to optimize Large Language Model (LLM) serving. It primarily decreases the memory consumption of the KV cache to reduce the computation cost. Despite the development […]
Continue reading: Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving
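Since the abstract only gestures at what KV cache compression does, a minimal sketch may help. The snippet below illustrates one common class of compression the literature covers, token eviction based on accumulated attention scores (a heavy-hitter style heuristic); it is not the method proposed in this paper, and the function name, shapes, and scoring inputs are illustrative assumptions.

```python
import numpy as np

def compress_kv_cache(keys, values, attn_scores, budget):
    """Toy KV cache compression by eviction (hypothetical helper, not
    the paper's method): keep the `budget` cached tokens with the
    highest cumulative attention scores and drop the rest, shrinking
    the per-head cache memory footprint.

    keys, values: (seq_len, head_dim) arrays for one attention head.
    attn_scores:  (seq_len,) cumulative attention each cached token
                  has received from later queries.
    """
    seq_len = keys.shape[0]
    if seq_len <= budget:
        return keys, values, np.arange(seq_len)
    # Indices of the `budget` highest-scoring tokens, restored to
    # original sequence order so positional structure is preserved.
    keep = np.sort(np.argsort(attn_scores)[-budget:])
    return keys[keep], values[keep], keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, head_dim, budget = 16, 8, 4
    k = rng.standard_normal((seq_len, head_dim))
    v = rng.standard_normal((seq_len, head_dim))
    scores = rng.random(seq_len)  # stand-in for accumulated attention
    k_c, v_c, kept = compress_kv_cache(k, v, scores, budget)
    print(f"cache shrunk from {seq_len} to {k_c.shape[0]} tokens; kept {kept}")
```

In a real serving stack the same idea is applied per layer and per head during decoding, trading a small accuracy risk for a lower memory footprint and larger feasible batch sizes.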