How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data?

Kavli Affiliate: Jing Wang

| First 5 Authors: Huazheng Wang, Daixuan Cheng, Haifeng Sun, Jingyu Wang, Qi Qi

| Summary:

Transformer-based pretrained language models (PLMs) have achieved great
success in modern NLP, and an important advantage of PLMs is their strong
out-of-distribution (OOD) robustness. Recently, much work has applied diffusion
models to PLMs, yet how diffusion influences PLMs on OOD data remains
under-explored. The core of a diffusion model is a forward diffusion process
that gradually adds Gaussian noise to the input and a reverse denoising process
that removes it, so reconstructing noised inputs is a fundamental ability of
diffusion models. We analyze OOD robustness directly by measuring the
reconstruction loss, testing both the ability to reconstruct OOD data and the
ability to detect OOD samples. Experiments on eight datasets examine different
training parameters and statistical features of the data. The results show that
finetuning PLMs with diffusion degrades their reconstruction ability on OOD
data. The comparison also shows that diffusion models can effectively detect
OOD samples, achieving state-of-the-art performance on most datasets with an
absolute accuracy improvement of up to 18%. These results indicate that
diffusion reduces the OOD robustness of PLMs.
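To make the mechanism in the abstract concrete, here is a minimal PyTorch sketch of the forward noising step and a reconstruction-loss OOD score. This is an illustrative assumption of the general setup, not the paper's implementation: the `denoiser` network, the variance schedule `alpha_bar`, and the evaluation timestep `t_eval` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def forward_diffuse(x0, t, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, 1, 1)            # broadcast over (batch, seq, dim)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * eps

def reconstruction_loss(denoiser, x0, t, alpha_bar):
    """Per-example MSE between clean embeddings and the denoiser's reconstruction."""
    x_t = forward_diffuse(x0, t, alpha_bar)
    x0_hat = denoiser(x_t, t)                  # hypothetical denoising network
    return F.mse_loss(x0_hat, x0, reduction="none").mean(dim=(1, 2))

@torch.no_grad()
def ood_scores(denoiser, embeddings, alpha_bar, t_eval=500):
    """OOD score = reconstruction loss at a fixed timestep; higher means more likely OOD."""
    t = torch.full((embeddings.size(0),), t_eval, dtype=torch.long)
    return reconstruction_loss(denoiser, embeddings, t, alpha_bar)
```

Under this sketch, detection would compare each sample's score against a threshold fit on in-distribution data, which matches the abstract's framing of reconstruction loss as the OOD signal.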

| Search Query: ArXiv Query: search_query=au:"Jing Wang"&id_list=&start=0&max_results=10