On the Search for Feedback in Reinforcement Learning – Kavli Institute Pre-Print Publications

Kavli Affiliate: Ran Wang

| First 5 Authors: Ran Wang, Karthikeya S. Parunandi, Aayushman Sharma, Raman Goyal, Suman Chakravorty

| Summary:

The problem of Reinforcement Learning (RL) in an unknown nonlinear dynamical
system is equivalent to the search for an optimal feedback law utilizing the
simulations/rollouts of the dynamical system. Most RL techniques search over a
complex global nonlinear feedback parametrization, which makes them suffer from
high training times as well as high variance. Instead, we advocate searching over a local
feedback representation consisting of an open-loop sequence and an associated
optimal linear feedback law completely determined by the open-loop sequence. We show
that this alternative approach results in highly efficient training, that the
answers obtained are repeatable and hence reliable, and that the resulting
closed-loop performance is superior to global state-of-the-art RL techniques.
Finally, replanning whenever required, which is feasible because the local
solution is fast and reliable, allows us to recover global optimality of the
resulting feedback law.
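
The local feedback representation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a known, discrete-time linear model (a double integrator) in place of the unknown nonlinear system, whereas in the RL setting the linearization along the nominal trajectory would have to be estimated from rollouts. All names (`tv_lqr_gains`, `local_feedback`, `rollout`) and the cost weights are hypothetical. The policy takes the form u_t = ū_t + K_t (x_t − x̄_t), where ū_t is the open-loop sequence, x̄_t is the nominal state it produces, and the gains K_t come from a backward Riccati recursion along that nominal trajectory.

```python
import numpy as np

def tv_lqr_gains(A, B, Q, R, Q_final):
    """Backward Riccati recursion; returns gains K_t for delta_u = K_t @ delta_x."""
    T = len(A)
    P = Q_final
    K = [None] * T
    for t in reversed(range(T)):
        S = R + B[t].T @ P @ B[t]
        K[t] = -np.linalg.solve(S, B[t].T @ P @ A[t])
        A_cl = A[t] + B[t] @ K[t]
        P = Q + K[t].T @ R @ K[t] + A_cl.T @ P @ A_cl
    return K

def local_feedback(t, x, x_bar, u_bar, K):
    """Open-loop control plus linear feedback on the deviation from nominal."""
    return u_bar[t] + K[t] @ (x - x_bar[t])

def rollout(x0, A, B, x_bar, u_bar, K=None):
    """Simulate the system; with K=None the open-loop sequence is applied as is."""
    x = x0.copy()
    for t in range(len(A)):
        u = u_bar[t] if K is None else local_feedback(t, x, x_bar, u_bar, K)
        x = A[t] @ x + B[t] @ u
    return x

# Assumed toy model: double integrator with time step dt (stand-in for the
# linearization of the unknown system along the nominal trajectory).
dt, T = 0.1, 50
A = [np.array([[1.0, dt], [0.0, 1.0]])] * T
B = [np.array([[0.0], [dt]])] * T

# Hypothetical open-loop sequence and the nominal trajectory it generates.
u_bar = [np.array([0.1])] * T
x_bar = [np.zeros(2)]
for t in range(T):
    x_bar.append(A[t] @ x_bar[t] + B[t] @ u_bar[t])

# Gains are fully determined by the nominal trajectory and the cost weights.
K = tv_lqr_gains(A, B, Q=np.eye(2), R=0.1 * np.eye(1), Q_final=np.eye(2))

# Start away from the nominal initial state and compare final deviations.
x0 = np.array([1.0, 0.5])
dev_open = np.linalg.norm(rollout(x0, A, B, x_bar, u_bar) - x_bar[-1])
dev_closed = np.linalg.norm(rollout(x0, A, B, x_bar, u_bar, K) - x_bar[-1])
print(f"final deviation, open loop:   {dev_open:.4f}")
print(f"final deviation, with K_t:    {dev_closed:.4f}")
```

In this sketch the only quantity being searched over would be the open-loop sequence `u_bar`; the gains follow from it, which is the sense in which the feedback law is determined by the open loop. Replanning would correspond to recomputing `u_bar`, `x_bar`, and `K` from the current state.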

| Search Query: ArXiv Query: search_query=au:"Ran Wang"&id_list=&start=0&max_results=10