OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes

Kavli Affiliate: Ke Wang

| First 5 Authors: Tao Xie, Kun Dai, Siyi Lu, Ke Wang, Zhiqiang Jiang

| Summary:

In this work, we seek to predict camera poses across scenes in a multi-task
learning manner, where we view the localization of each scene as a new task. We
propose OFVL-MS, a unified framework that dispenses with the traditional
practice of training a model for each individual scene and relieves gradient
conflict induced by optimizing multiple scenes collectively, enabling efficient
storage yet precise visual localization for all scenes. Technically, in the
forward pass of OFVL-MS, we design a layer-adaptive sharing policy with a
learnable score for each layer to automatically determine whether the layer is
shared or not. Such a sharing policy enables us to acquire task-shared
parameters that reduce storage cost and task-specific parameters for
learning scene-related features to alleviate gradient conflict. In the backward
pass of OFVL-MS, we introduce a gradient normalization algorithm that
homogenizes the gradient magnitude of the task-shared parameters so that all
tasks converge at the same pace. Furthermore, a sparse penalty loss is applied
on the learnable scores to facilitate parameter sharing for all tasks without
performance degradation. We conduct comprehensive experiments on multiple
benchmarks and our newly released indoor dataset LIVL, showing that the OFVL-MS
family significantly outperforms state-of-the-art methods with fewer parameters.
We also verify that OFVL-MS can generalize to a new scene with far fewer
parameters while achieving superior localization performance.
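
Since the summary only sketches the mechanism, the snippet below illustrates one plausible way to realize the layer-adaptive sharing policy and the sparse penalty on the learnable scores. It is a minimal sketch, not the authors' implementation: the class name AdaptiveSharedLinear, the sigmoid gating, and the penalty weight are assumptions introduced here for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSharedLinear(nn.Module):
    """One layer holding a task-shared weight, per-scene weights, and a
    learnable score per scene that softly decides whether the layer is shared."""
    def __init__(self, in_dim, out_dim, num_tasks):
        super().__init__()
        self.shared = nn.Linear(in_dim, out_dim)                    # task-shared parameters
        self.specific = nn.ModuleList(
            nn.Linear(in_dim, out_dim) for _ in range(num_tasks))   # task-specific parameters
        self.scores = nn.Parameter(torch.zeros(num_tasks))          # learnable sharing scores

    def forward(self, x, task_id):
        gate = torch.sigmoid(self.scores[task_id])                  # soft share / not-share decision
        return gate * self.shared(x) + (1.0 - gate) * self.specific[task_id](x)

    def sparse_penalty(self):
        # Penalty pushing scores toward sharing (gate -> 1) to save parameters.
        return (1.0 - torch.sigmoid(self.scores)).sum()

# Toy usage: regress a 7-D pose (translation + quaternion) for scene 1.
layer = AdaptiveSharedLinear(512, 7, num_tasks=3)
pose = layer(torch.randn(4, 512), task_id=1)
loss = F.mse_loss(pose, torch.randn(4, 7)) + 1e-3 * layer.sparse_penalty()
loss.backward()

A similar caveat applies to the backward pass: one common way to realize the described gradient normalization is to rescale each scene's gradient on the task-shared parameters to a common magnitude before combining them, as in the sketch below (again an assumption, not the paper's exact algorithm).

def normalized_shared_grads(task_losses, shared_params):
    """Rescale each task's gradient on the shared parameters to the mean norm
    so that no single scene dominates the shared update."""
    per_task = []
    for loss in task_losses:
        grads = torch.autograd.grad(loss, shared_params, retain_graph=True)
        per_task.append(torch.cat([g.reshape(-1) for g in grads]))
    norms = torch.stack([g.norm() for g in per_task])
    target = norms.mean()                                           # homogenized magnitude
    return sum(g * (target / (n + 1e-12)) for g, n in zip(per_task, norms))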

| Search Query: ArXiv Query: search_query=au:"Ke Wang"&id_list=&start=0&max_results=3