Kavli Affiliate: Zhuo Li | First 5 Authors: Jiajun Cao, Jiajun Cao, , , | Summary: Vision-Language-Action (VLA) models have demonstrated significant potential in complex scene understanding and action reasoning, leading to their increasing adoption in end-to-end autonomous driving systems. However, the long visual tokens of VLA models greatly increase computational costs. Current visual token […]
Continue.. FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning