Kavli Affiliate: Long Zhang
| First 5 Authors: Yang Sun, , , ,
| Summary:
Retrieval-augmented generation (RAG) incorporates external knowledge into
large language models (LLMs), improving their adaptability to downstream tasks
and enabling information updates. Surprisingly, recent empirical evidence
demonstrates that injecting noise into retrieved relevant documents facilitates
the exploitation of external knowledge and improves generation quality.
Although counterintuitive and challenging to apply in practice, this phenomenon
enables fine-grained control and rigorous analysis of how LLMs integrate
external knowledge. Therefore, in this paper, we intervene on
noise injection and establish a layer-specific functional demarcation within
the LLM: shallow layers specialize in local context modeling, intermediate
layers focus on integrating long-range external factual knowledge, and deeper
layers primarily rely on parametric internal knowledge. Building on this
insight, we propose Layer Fused Decoding (LFD), a simple decoding strategy that
directly combines representations from an intermediate layer with final-layer
decoding outputs to fully exploit external factual knowledge. To identify
the optimal intermediate layer, we introduce an internal knowledge score (IKS)
criterion that selects the layer with the lowest IKS value in the latter half
of layers. Experimental results across multiple benchmarks demonstrate that LFD
helps RAG systems more effectively surface retrieved context knowledge with
minimal cost.
| Search Query: ArXiv Query: search_query=au:"Long Zhang"&id_list=&start=0&max_results=3
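
The following is a minimal, illustrative sketch of one layer-fused decoding step as described in the summary. The fusion rule (a simple average of intermediate-layer and final-layer logits), the IKS proxy (agreement of a layer's next-token distribution with the final layer's), and the use of a shared unembedding matrix are assumptions made for demonstration only; the paper's exact formulas are not given in this summary.

import torch
import torch.nn.functional as F


def internal_knowledge_score(layer_logits: torch.Tensor,
                             final_logits: torch.Tensor) -> float:
    """Illustrative IKS proxy (assumption, not the paper's definition):
    agreement with the final-layer distribution, measured as the negative
    KL divergence KL(final || layer). Layers whose predictions already
    match the final layer are treated as relying on internal knowledge."""
    log_p_layer = F.log_softmax(layer_logits, dim=-1)
    log_p_final = F.log_softmax(final_logits, dim=-1)
    kl = F.kl_div(log_p_layer, log_p_final, log_target=True,
                  reduction="batchmean")
    return -kl.item()  # closer to the final layer -> higher score


def layer_fused_decoding_step(hidden_states, unembed):
    """One greedy decoding step: pick the intermediate layer with the
    lowest IKS in the latter half of layers, then fuse its logits with
    the final-layer logits (simple averaging, an assumption)."""
    # Project the last-position hidden state of every layer through the
    # shared unembedding matrix to obtain per-layer next-token logits.
    logits = [h[:, -1, :] @ unembed.T for h in hidden_states]
    final_logits = logits[-1]

    # Restrict the search to the latter half of layers, excluding the
    # final layer itself, per the selection criterion in the summary.
    half = len(logits) // 2
    candidates = range(half, len(logits) - 1)
    best_layer = min(
        candidates,
        key=lambda i: internal_knowledge_score(logits[i], final_logits),
    )

    fused = 0.5 * logits[best_layer] + 0.5 * final_logits
    return fused.argmax(dim=-1), best_layer


if __name__ == "__main__":
    # Toy shapes: 24 layers, batch 1, sequence length 8, hidden 64, vocab 100.
    torch.manual_seed(0)
    hidden = [torch.randn(1, 8, 64) for _ in range(24)]
    unembed = torch.randn(100, 64)
    token, layer = layer_fused_decoding_step(hidden, unembed)
    print(f"fused layer {layer}, next token id {token.item()}")

In practice the per-layer hidden states would come from a forward pass with hidden-state outputs enabled, and the unembedding would be the model's own LM head; random tensors are used here only to keep the sketch self-contained.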