Function Approximation for Model Based Reinforcement Learning

Matthew Robards (Australian National University)


DATE: 2011-04-20
TIME: 11:30:00 - 12:00:00
LOCATION: Ian Ross Seminar Room, R214 with Pizza

We introduce a novel online gradient-based reinforcement learning algorithm with function approximation, for which we give theoretical guarantees. Our algorithm is model-based in the sense that we learn one function that predicts features of future states and another that predicts future rewards. We choose a value function approximator based on these two functions by formulating a generalized Bellman error objective that admits general loss functions. Each of these functions is updated by stochastic gradient descent. We also give an empirical comparison against other gradient-based online methods, namely residual gradient methods and GTD for optimal control, and find that our algorithm compares favorably, particularly in noisy conditions. This performance is partly enabled by our ability to use non-square loss functions with our algorithm, a novel contribution in itself. The epsilon-insensitive loss performed particularly well.
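The flavor of the approach described above can be illustrated with a minimal sketch. This is not the speaker's algorithm: it assumes linear models throughout, uses a semi-gradient value update (treating the bootstrapped target as fixed) rather than the paper's generalized Bellman error objective, and the function and parameter names (`model_based_td_sketch`, `epsilon_insensitive_grad`, `eps`, `alpha`, `gamma`) are illustrative inventions.

```python
import numpy as np

def epsilon_insensitive_grad(err, eps=0.1):
    # Subgradient of the epsilon-insensitive loss max(0, |err| - eps):
    # errors smaller than eps incur no loss, hence zero gradient.
    if abs(err) <= eps:
        return 0.0
    return float(np.sign(err))

def model_based_td_sketch(transitions, d, gamma=0.95, alpha=0.01, eps=0.1):
    """Illustrative sketch only. Learns by stochastic gradient descent:
    F (predicts next-state features), r (predicts rewards), and
    w (value-function weights), from (phi, reward, phi_next) tuples."""
    F = np.zeros((d, d))   # next-feature model: phi' ~ F @ phi
    r = np.zeros(d)        # reward model:       rew ~ r @ phi
    w = np.zeros(d)        # value function:     V(s) ~ w @ phi
    for phi, rew, phi_next in transitions:
        # Model updates: squared loss on feature and reward prediction.
        F += alpha * np.outer(phi_next - F @ phi, phi)
        r += alpha * (rew - r @ phi) * phi
        # Value update: epsilon-insensitive loss on the model-based
        # Bellman error (semi-gradient simplification).
        delta = (r @ phi + gamma * w @ (F @ phi)) - w @ phi
        w += alpha * epsilon_insensitive_grad(delta, eps) * phi
    return F, r, w
```

Note how all three functions share the same stream of transitions and the same stochastic-gradient update pattern; the epsilon-insensitive subgradient simply zeroes out updates from Bellman errors smaller than eps, which is what gives the method its robustness to small amounts of noise.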

Updated: 20 April 2011