Localizing Task Information for Improved Model Merging and Compression

Kavli Affiliate: Ke Wang

| First 5 Authors: Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez, François Fleuret, Pascal Frossard

| Summary:

Model merging and task arithmetic have emerged as promising, scalable
approaches for merging multiple single-task checkpoints into a single
multi-task model, but their applicability is limited by significant
performance losses. Previous
works have linked these drops to interference in the weight space and erasure
of important task-specific features. Instead, in this work we show that the
information required to solve each task is still preserved after merging as
different tasks mostly use non-overlapping sets of weights. We propose
TALL-masks, a method that identifies these task supports given a collection of
task vectors, and show that one can retrieve >99% of the single-task accuracy
by applying our masks to the multi-task vector, effectively compressing the
individual checkpoints. We study the statistics of intersections among
constructed masks and reveal the existence of selfish and catastrophic weights,
i.e., parameters that are important exclusively to one task and parameters that
are irrelevant to all tasks yet detrimental to multi-task fusion, respectively.
To address this, we propose
Consensus Merging, an algorithm that eliminates such weights and improves the
overall performance of existing model merging approaches. Our experiments on
vision and NLP benchmarks with up to 20 tasks show that Consensus Merging
consistently improves existing approaches. Furthermore, our proposed
compression scheme reduces storage from 57 GB to 8.2 GB while retaining 99.7%
of the original performance.
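
To make the two ideas in the summary concrete, below is a minimal PyTorch sketch of per-task mask construction and consensus merging over flattened task vectors. The specific thresholding rule (comparing each task vector's magnitude against the combined magnitude of the other tasks, scaled by a per-task threshold) and all names here (`tall_masks`, `consensus_merge`, `lambdas`, `k`) are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch

def tall_masks(task_vectors, lambdas):
    """Sketch of TALL-mask construction: keep a parameter for task t when
    its task-vector magnitude dominates the combined contribution of the
    other tasks, up to an assumed per-task threshold lambda_t."""
    multi_task = torch.stack(task_vectors).sum(dim=0)  # tau_MT = sum_t tau_t
    masks = []
    for tau_t, lam in zip(task_vectors, lambdas):
        others = multi_task - tau_t  # combined contribution of other tasks
        masks.append((tau_t.abs() >= lam * others.abs()).float())
    return masks, multi_task

def consensus_merge(masks, multi_task, k=2):
    """Sketch of Consensus Merging: keep only weights that at least k task
    masks agree on, discarding 'selfish' (exactly one task) and
    'catastrophic' (no task) weights."""
    agreement = torch.stack(masks).sum(dim=0)  # tasks using each weight
    consensus = (agreement >= k).float()
    return consensus * multi_task

# Toy usage on flattened parameter deltas (theta_t - theta_0 per task).
task_vectors = [torch.randn(10) for _ in range(4)]
masks, tau_mt = tall_masks(task_vectors, lambdas=[0.5] * 4)
tau_consensus = consensus_merge(masks, tau_mt, k=2)

# Compression view: each task checkpoint is approximated from the shared
# multi-task vector plus that task's binary mask, theta_t ~ theta_0 + m_t * tau_MT.
theta_t_delta_approx = masks[0] * tau_mt
```

Under this reading, only the single multi-task vector and one binary mask per task need to be stored, which is what enables the reported compression while recovering per-task behavior.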

| Search Query: ArXiv Query: search_query=au:"Ke Wang"&id_list=&start=0&max_results=3
