Kavli Affiliate: Ke Wang | First 5 Authors: Hanshi Wang | Summary: The established redundancy in visual tokens within large vision-language models allows pruning to effectively reduce their substantial computational demands. Previous methods typically employ heuristic layer-specific pruning strategies where, although the number of tokens removed may differ across decoder layers, […]
AutoPrune: Each Complexity Deserves a Pruning Policy