Function Approximation for Model-Based Reinforcement Learning
Matthew Robards (Australian National University), CS HDR Monitoring, AI Group
TIME: 11:30 - 12:00
LOCATION: Ian Ross Seminar Room, R214 (pizza provided)
We introduce a novel online gradient-based reinforcement learning algorithm with function approximation, for which we give theoretical guarantees. Our algorithm is model-based in the sense that we learn one function that predicts features of future states and another that predicts future rewards. We choose a value function approximator based on these two functions by formulating a generalized Bellman error objective that allows for general loss functions. Each of these functions is updated through stochastic gradient descent. We also give an empirical comparison to other gradient-based online methods, namely residual gradient methods and GTD for optimal control. We find that our algorithm compares favorably to these competitors, particularly in noisy conditions. This performance is partly enabled by our ability to use non-square loss functions with our algorithm, a novel contribution in itself. In our experiments, the epsilon-insensitive loss performed especially well.
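To illustrate the kind of update the abstract describes, here is a minimal, hypothetical sketch (not the speaker's actual algorithm) of a stochastic subgradient step on a Bellman-style residual under the epsilon-insensitive loss, assuming a linear value function and externally supplied model predictions of next-state features and reward:

```python
import numpy as np

def eps_insensitive_loss(residual, eps=0.1):
    """Epsilon-insensitive loss: zero inside the eps-tube, linear outside."""
    return max(0.0, abs(residual) - eps)

def sgd_step(w, phi_s, phi_next_pred, r_pred, gamma=0.9, eps=0.1, lr=0.01):
    """One subgradient step on an epsilon-insensitive Bellman residual.

    phi_next_pred and r_pred stand in for the outputs of separately
    learned feature- and reward-prediction models (placeholders here;
    the talk's actual model components may differ).
    """
    # Bellman residual for a linear value function V(s) = w . phi(s)
    residual = w @ phi_s - (r_pred + gamma * (w @ phi_next_pred))
    if abs(residual) <= eps:
        return w  # inside the tube: subgradient is zero, no update
    # Subgradient of |residual| - eps w.r.t. w, scaled by the learning rate
    grad = np.sign(residual) * (phi_s - gamma * phi_next_pred)
    return w - lr * grad
```

The eps-tube is what makes this loss robust to noise: residuals smaller than eps are treated as zero error, so noisy rewards do not force spurious updates, which is consistent with the favorable performance in noisy conditions mentioned above.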