Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis

Kavli Affiliate: Jia Liu

| First 5 Authors: Junzhuo Li, Bo Wang, Xiuze Zhou, Peijie Jiang, Jia Liu

| Summary:

The interpretability of Mixture-of-Experts (MoE) models, especially those
with heterogeneous designs, remains underexplored. Existing attribution methods
for dense models fail to capture dynamic routing-expert interactions in sparse
MoE architectures. To address this issue, we propose a cross-level attribution
algorithm to analyze sparse MoE architectures (Qwen 1.5-MoE, OLMoE,
Mixtral-8x7B) against dense models (Qwen 1.5-7B, Llama-7B, Mistral-7B). Results
show that MoE models achieve 37% higher per-layer efficiency than their dense counterparts via a "mid-activation,
late-amplification" pattern: early layers screen experts, while late layers
refine knowledge collaboratively. Ablation studies reveal a "basic-refinement"
framework: shared experts handle general tasks (entity recognition), while
routed experts specialize in domain-specific processing (geographic
attributes). Semantic-driven routing is evidenced by strong correlations
between attention heads and experts (r=0.68), enabling task-aware coordination.
Notably, architectural depth dictates robustness: the deeper Qwen 1.5-MoE mitigates
expert failures through shared-expert redundancy, limiting the damage to a 43% MRR
drop on geographic tasks when its top-10 experts are blocked, whereas the shallower
OLMoE suffers severe degradation (a 76% drop). Task sensitivity further guides design: core-sensitive
tasks (geography) require concentrated expertise, while distributed-tolerant
tasks (object attributes) leverage broader participation. These insights
advance MoE interpretability, offering principles to balance efficiency,
specialization, and robustness.
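
The expert-blocking ablation summarized above (rank experts by routing frequency, mask the top-10, and compare mean reciprocal rank before and after) can be sketched in a few lines. The toy router, expert heads, and fact-recall probe below are hypothetical stand-ins, not the paper's cross-level attribution code; they only illustrate the shape of such an ablation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: one MoE routing layer with E experts, top-k routing,
# and a fact-recall probe scored by mean reciprocal rank (MRR).
E, K, D, V = 16, 4, 64, 100              # experts, top-k, hidden dim, vocab size
W_router = rng.normal(size=(D, E))       # stand-in router weights
W_experts = rng.normal(size=(E, D, V))   # stand-in per-expert output heads

def moe_logits(h, blocked=frozenset()):
    """Route hidden state h through top-k experts, optionally masking some out."""
    scores = h @ W_router                  # (E,) router scores
    for e in blocked:                      # emulate expert blocking
        scores[e] = -np.inf
    topk = np.argsort(scores)[-K:]         # surviving top-k experts
    gates = np.exp(scores[topk] - scores[topk].max())
    gates /= gates.sum()
    # Gate-weighted sum of expert outputs -> vocabulary logits.
    return sum(g * (h @ W_experts[e]) for g, e in zip(gates, topk))

def mrr(hidden_states, targets, blocked=frozenset()):
    """Mean reciprocal rank of the target token under the (possibly ablated) layer."""
    recip_ranks = []
    for h, t in zip(hidden_states, targets):
        logits = moe_logits(h, blocked)
        rank = 1 + np.sum(logits > logits[t])   # 1-indexed rank of the target
        recip_ranks.append(1.0 / rank)
    return float(np.mean(recip_ranks))

# Toy probe set standing in for, e.g., geographic-attribute queries.
H = rng.normal(size=(200, D))
y = rng.integers(0, V, size=200)

# Rank experts by how often they appear in the top-k routing, then block the top-10.
counts = np.zeros(E)
for h in H:
    counts[np.argsort(h @ W_router)[-K:]] += 1
top10 = frozenset(np.argsort(counts)[-10:])

base, ablated = mrr(H, y), mrr(H, y, blocked=top10)
print(f"MRR drop when blocking top-10 experts: {100 * (base - ablated) / base:.1f}%")
```

The attention-head/expert correlation reported above (r=0.68) could be probed in the same spirit, for instance by computing a Pearson correlation between per-head activation magnitudes and expert gate values across tokens.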

| Search Query: ArXiv Query: search_query=au:"Jia Liu"&id_list=&start=0&max_results=3
