Kavli Affiliate: Zheng Zhu
| First 5 Authors: Kai Wang, Jianyang Gu, Daquan Zhou, Zheng Zhu, Wei Jiang
| Summary:
Dataset distillation reduces the network training cost by synthesizing small
and informative datasets from large-scale ones. Despite the success of the
recent dataset distillation algorithms, three drawbacks still limit their wider
application: i). the synthetic images perform poorly on large architectures;
ii). they need to be re-optimized when the distillation ratio changes; iii).
the limited diversity restricts the performance when the distillation ratio is
large. In this paper, we propose a novel distillation scheme to Distill
information of large training sets into generative Models, named DiM.
Specifically, DiM learns to use a generative model
to store the information of the target dataset. During the distillation phase,
we minimize the difference in logits predicted by a pool of models between real
and generated images. At the deployment stage, the generative model synthesizes
various training samples from random noise on the fly. Owing to these simple yet
effective designs, the trained DiM can be directly applied to different
distillation ratios and large architectures without extra cost. We validate the
proposed DiM across 4 datasets and achieve state-of-the-art results on all of
them. To the best of our knowledge, we are the first to achieve higher accuracy
on complex architectures than on simple ones, such as 75.1% with ResNet-18 versus
72.6% with ConvNet-3 with ten images per class on CIFAR-10. Moreover, DiM
outperforms previous methods by 10% to 22% when the number of images per class
is 1 or 10 on the SVHN dataset.
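A minimal sketch of the logit-matching objective described in the abstract, assuming a PyTorch-style conditional generator and a pool of classifiers; the function name `dim_distillation_step`, the generator signature, the noise dimension, and the use of an MSE loss are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of DiM-style logit matching (assumptions noted above).
import torch
import torch.nn.functional as F

def dim_distillation_step(generator, model_pool, real_images, real_labels,
                          noise_dim=128, device="cuda"):
    """One distillation step: synthesize images from random noise conditioned
    on the real labels, then match the logits of a model sampled from the pool
    between the real and the generated batch."""
    batch_size = real_images.size(0)
    noise = torch.randn(batch_size, noise_dim, device=device)
    fake_images = generator(noise, real_labels)  # assumed conditional generator

    # Sample one model from the pool for this step.
    model = model_pool[torch.randint(len(model_pool), (1,)).item()]
    model.eval()

    with torch.no_grad():
        real_logits = model(real_images)  # targets, kept fixed
    fake_logits = model(fake_images)

    # Minimize the discrepancy between logits on real vs. generated images;
    # this loss would be backpropagated into the generator only.
    loss = F.mse_loss(fake_logits, real_logits)
    return loss
```

At deployment, per the abstract, one would only sample fresh noise (and labels) and call the trained generator to produce as many training images as the target distillation ratio requires, with no re-optimization.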
| Search Query: ArXiv Query: search_query=au:"Zheng Zhu"&id_list=&start=0&max_results=3