Dynamic DAG Discovery for Interpretable Imitation Learning – Kavli Institute Pre-Print Publications

Kavli Affiliate: Xiang Zhang

| First 5 Authors: Tianxiang Zhao, Wenchao Yu, Suhang Wang, Lu Wang, Xiang Zhang

| Summary:

Imitation learning, which learns an agent policy by mimicking expert
demonstrations, has shown promising results in many applications such as medical
treatment regimes and self-driving vehicles. However, interpreting the control
policies learned by the agent remains difficult. The difficulty comes mainly
from two aspects: 1) agents in imitation learning are usually implemented
as deep neural networks, which are black-box models and lack interpretability;
2) the latent causal mechanism behind agents’ decisions may vary along the
trajectory, rather than staying static throughout time steps. To increase
transparency and offer better interpretability of the neural agent, we propose
to expose its captured knowledge in the form of a directed acyclic causal
graph, with nodes being action and state variables and edges denoting the
causal relations behind predictions. Furthermore, we design this causal
discovery process to be state-dependent, enabling it to model the dynamics in
latent causal graphs. Concretely, we conduct causal discovery from the
perspective of Granger causality and propose a self-explainable imitation
learning framework, {method}. The proposed framework is composed of three
parts: a dynamic causal discovery module, a causality encoding module, and a
prediction module, and is trained in an end-to-end manner. After the model is
learned, we can obtain causal relations among states and action variables
behind its decisions, exposing policies learned by it. Experimental results on
both synthetic and real-world datasets demonstrate the effectiveness of the
proposed {method} in learning the dynamic causal graphs for understanding the
decision-making of imitation learning meanwhile maintaining high prediction
accuracy.
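Granger causality, which the framework builds on, says that a variable X Granger-causes Y if past values of X improve the prediction of Y beyond what Y's own past already provides. The sketch below illustrates only that criterion, using a static, lag-1 linear test on a synthetic trajectory with one state and one action variable. It is not the proposed framework: the abstract describes a state-dependent graph produced by neural modules trained end-to-end, whereas this sketch estimates a single fixed graph for the whole trajectory. The helper names (`granger_score`, `granger_graph`, `simulate`), the simulated dynamics, and the 0.05 threshold are hypothetical choices made for illustration.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return float(resid @ resid)


def lagged(data, col, lag):
    """Columns holding values of `col` at lags 1..lag, aligned with data[lag:]."""
    T = len(data)
    return np.column_stack([data[lag - k - 1:T - k - 1, col] for k in range(lag)])


def granger_score(data, cause, effect, lag=1):
    """Relative drop in RSS when the past of `cause` is added to an
    autoregressive model of `effect` (a linear Granger-causality criterion)."""
    y = data[lag:, effect]
    past_effect = lagged(data, effect, lag)
    restricted = rss(past_effect, y)
    full = rss(np.column_stack([past_effect, lagged(data, cause, lag)]), y)
    return (restricted - full) / restricted


def granger_graph(data, lag=1, threshold=0.05):
    """Binary adjacency matrix: A[i, j] = 1 if variable i Granger-causes j.
    Each edge runs from time t-1 to time t, so the time-unrolled graph is
    acyclic even if A itself contains a cycle."""
    d = data.shape[1]
    A = np.zeros((d, d), dtype=int)
    for i in range(d):
        for j in range(d):
            if i != j and granger_score(data, i, j, lag) > threshold:
                A[i, j] = 1
    return A


def simulate(T=2000, seed=0):
    """Hypothetical trajectory: column 0 is a state following an AR(1) process,
    column 1 is an action driven by the previous state."""
    rng = np.random.default_rng(seed)
    x = np.zeros((T, 2))
    for t in range(1, T):
        x[t, 0] = 0.5 * x[t - 1, 0] + rng.normal()
        x[t, 1] = 0.8 * x[t - 1, 0] + 0.1 * rng.normal()
    return x


A = granger_graph(simulate())
print(A)
```

Here `A[i, j] = 1` means variable i Granger-causes variable j; on the simulated data the test should find the state-to-action edge and no action-to-state edge. In the dynamic setting the abstract describes, such a graph would instead be inferred from the current state at each time step rather than fixed for the whole trajectory.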
