Kavli Affiliate: James J. Bock | First 5 Authors: Guangxuan Xiao | Summary: Mixture of Block Attention (MoBA) (Lu et al., 2025) is a promising building block for efficiently processing long contexts in LLMs by enabling queries to sparsely attend to a small subset of key-value blocks, drastically reducing computational cost. […]
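To make the mechanism described in the summary concrete, here is a minimal single-head sketch of MoBA-style block-sparse attention in PyTorch. The function name `moba_attention`, the mean-pooled block gating, and all shapes are illustrative assumptions rather than the authors' implementation, and the causal masking and guaranteed current-block attention of the full method are omitted for brevity.

```python
# A minimal sketch of MoBA-style block-sparse attention.
# Hedged: names, shapes, and this exact gating are illustrative
# assumptions, not the paper's reference implementation.
import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block_size=4, top_k=2):
    """Each query attends only to its top-k key/value blocks.

    q, k, v: (seq_len, d) single-head tensors; seq_len must be
    divisible by block_size.
    """
    seq_len, d = q.shape
    n_blocks = seq_len // block_size

    # Block representatives: mean-pool the keys within each block.
    k_blocks = k.view(n_blocks, block_size, d).mean(dim=1)   # (n_blocks, d)

    # Gate: score each query against every block representative,
    # then keep only the top-k highest-scoring blocks per query.
    gate_scores = q @ k_blocks.T                             # (seq_len, n_blocks)
    top_blocks = gate_scores.topk(top_k, dim=-1).indices     # (seq_len, top_k)

    # Sparse mask: allowed[i, j] is True iff key j's block is among
    # query i's selected blocks.
    block_of_key = torch.arange(seq_len) // block_size       # (seq_len,)
    allowed = (block_of_key[None, :, None] == top_blocks[:, None, :]).any(-1)

    # Standard scaled dot-product attention, restricted to the mask.
    scores = (q @ k.T) / d**0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Tiny usage example on random data.
q, k, v = (torch.randn(16, 8) for _ in range(3))
out = moba_attention(q, k, v, block_size=4, top_k=2)
print(out.shape)  # torch.Size([16, 8])
```

The cost saving comes from the mask: each query scores against `n_blocks` pooled representatives instead of all `seq_len` keys, and full attention is computed only inside the `top_k * block_size` selected positions.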
Continue reading: Optimizing Mixture of Block Attention