Kavli Affiliate: Xiang Zhang
| First 5 Authors: Tianxiang Zhao, Wenchao Yu, Suhang Wang, Lu Wang, Xiang Zhang
| Summary:
Imitation learning has achieved great success in many sequential
decision-making tasks, in which a neural agent is trained by imitating
collected human demonstrations. However, existing algorithms typically require
a large number of high-quality demonstrations, which are difficult and
expensive to collect; in practice, a trade-off usually has to be made between
demonstration quality and quantity. To address this problem, in this work we
consider imitation from sub-optimal demonstrations, given both a small clean
demonstration set and a large noisy set. Some pioneering methods have been
proposed, but they suffer from notable limitations, e.g., assuming a
demonstration to have the same optimality across all time steps and failing to
provide any interpretation of the knowledge learned from the noisy set.
Addressing these problems, we
propose {method} by evaluating and imitating at the sub-demonstration level,
encoding action primitives of varying quality into different skills.
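To make sub-demonstration-level imitation concrete, the sketch below shows a per-timestep weighted behavioral-cloning loss; the `optimality` weights, the function name, and the discrete-action setting are our illustrative assumptions, since the abstract does not specify the estimator's actual form:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weighted_bc_loss(policy: nn.Module,
                     states: torch.Tensor,     # (T, state_dim)
                     actions: torch.Tensor,    # (T,) discrete action indices
                     optimality: torch.Tensor  # (T,) hypothetical per-step
                                               #     quality weights in [0, 1]
                     ) -> torch.Tensor:
    """Behavioral cloning where each time step is weighted by its estimated
    optimality, so low-quality segments of a demonstration contribute less."""
    logits = policy(states)                                   # (T, num_actions)
    nll = F.cross_entropy(logits, actions, reduction="none")  # (T,)
    return (optimality * nll).sum() / optimality.sum().clamp(min=1e-8)
```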
Concretely, {method} consists of a high-level controller that discovers skills
and a skill-conditioned module that captures action-taking policies, and it is
trained in a two-phase pipeline, first discovering skills with all
demonstrations and then adapting the controller using only the clean set. A
mutual-information-based regularization and a dynamic sub-demonstration
optimality estimator are designed to promote disentanglement in the skill
space. Extensive experiments on two Gym environments and a real-world
healthcare dataset demonstrate the superiority of {method} in learning from
sub-optimal demonstrations, as well as its improved interpretability, as
revealed by examining the learned skills.
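The abstract gives only the high-level design; a minimal sketch of such a controller-plus-skill-conditioned-policy architecture might look as follows. All class names, layer sizes, the discrete-action setting, and the soft-mixture action distribution are our assumptions, and the mutual-information regularizer and optimality estimator are omitted:

```python
import torch
import torch.nn as nn

class SkillController(nn.Module):
    """High-level controller: maps a state to a distribution over K skills."""
    def __init__(self, state_dim: int, num_skills: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_skills))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(state), dim=-1)         # (B, K)

class SkillConditionedPolicy(nn.Module):
    """Low-level module: maps (state, skill one-hot) to action logits."""
    def __init__(self, state_dim: int, num_skills: int, num_actions: int,
                 hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + num_skills, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, num_actions))

    def forward(self, state: torch.Tensor, skill: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, skill], dim=-1))

def marginal_action_logits(controller: SkillController,
                           policy: SkillConditionedPolicy,
                           state: torch.Tensor) -> torch.Tensor:
    """Mix the skill-conditioned policies by the controller's skill
    probabilities (a soft mixture over skills)."""
    skill_probs = controller(state)                           # (B, K)
    num_skills = skill_probs.shape[-1]
    eye = torch.eye(num_skills, device=state.device)
    per_skill = torch.stack(
        [policy(state, eye[k].expand(state.shape[0], num_skills))
         for k in range(num_skills)], dim=1)                  # (B, K, A)
    return (skill_probs.unsqueeze(-1) * per_skill).sum(dim=1)

# Two-phase pipeline per the abstract (training loops omitted; freezing the
# low-level policy in Phase 2 is our reading of "adapting the controller"):
#   Phase 1: optimize controller + policy on all (clean + noisy)
#            demonstrations to discover skills of varying quality.
#   Phase 2: freeze the skill-conditioned policy and adapt only the
#            controller on the clean set, so it selects high-quality skills.
```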