Reinforcing Trial-based Heuristic Tree Search with Neural Networks



Research areas


Reasoning, planning and problem solving are all considered parts of what is generally understood as intelligence. In Artificial Intelligence, the corresponding established research field is Automated Planning, where an agent has to reach a given goal by applying a sequence of actions. If actions have uncertain outcomes, we speak of non-deterministic (or probabilistic) planning. The Trial-based Heuristic Tree Search (THTS) framework (Keller and Helmert 2013) makes it possible to model and express several state-of-the-art algorithms commonly applied to probabilistic planning problems. The framework lets several ingredients be mixed to form different types of algorithms, but it has not yet been considered for neural-network-based algorithms.
On the other hand, the AlphaGo Zero reinforcement learning approach (Silver et al. 2017, 2018) has had considerable success in a variety of games by combining deep neural networks with Monte Carlo tree search, which can be seen as an instantiation of trial-based heuristic tree search.
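To make the connection concrete, the following is a minimal C++ sketch of the PUCT action selection rule used in AlphaGo Zero (Silver et al. 2017), which picks the action maximising Q(s, a) + c * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)). The names ActionStats, selectPuct and cPuct are illustrative assumptions and not part of PROST or of any published code; the sketch covers only the choice among actions at a decision node and ignores how outcomes of probabilistic actions are sampled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Per-action statistics kept at a decision node (hypothetical names).
struct ActionStats {
    double prior;      // P(s, a): prior probability, e.g. from a policy network
    double totalValue; // sum of all values backed up through this action
    int visits;        // N(s, a): number of trials that chose this action
};

// Returns the index of the action with the highest PUCT score Q + U.
// cPuct is the exploration constant (an assumed tuning parameter).
std::size_t selectPuct(const std::vector<ActionStats>& actions, double cPuct) {
    assert(!actions.empty());

    // Total visit count over all actions of this node.
    int totalVisits = 0;
    for (const ActionStats& a : actions) {
        totalVisits += a.visits;
    }
    const double sqrtTotal = std::sqrt(static_cast<double>(totalVisits));

    std::size_t best = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < actions.size(); ++i) {
        const ActionStats& a = actions[i];
        // Q: mean backed-up value; unvisited actions default to 0 here.
        const double q = a.visits > 0 ? a.totalValue / a.visits : 0.0;
        // U: exploration bonus, large for high-prior, rarely visited actions.
        const double u = cPuct * a.prior * sqrtTotal / (1.0 + a.visits);
        if (q + u > bestScore) {
            bestScore = q + u;
            best = i;
        }
    }
    return best;
}
```

In THTS terms, a rule of this kind would take the place of the action selection ingredient, while the network's value estimate could serve as the heuristic used to initialise new nodes. A larger cPuct shifts the choice towards actions with a high prior and few visits.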


In this project we aim to bridge the gap between THTS and deep learning by designing and implementing algorithms that use neural networks in conjunction with trial-based heuristic tree search. For the implementation part we may extend the PROST planning system (Keller and Eyerich 2012) with an interface that makes it possible to apply deep learning techniques to trial-based heuristic tree search.
The complexity of the open problems in this project allows for both short-term and longer-term projects.
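As a rough illustration of what such an interface might look like, the C++ sketch below defines an abstract evaluator that maps a state encoding to action priors and a value estimate. Everything in it (StateFeatures, Evaluation, LearnedEvaluator, UniformEvaluator) is a hypothetical name chosen for this example; it does not reflect PROST's actual classes, and the placeholder implementation returns uniform priors instead of calling a trained network.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat numeric encoding of a planning state.
using StateFeatures = std::vector<double>;

// What a learned model is assumed to return for one state.
struct Evaluation {
    std::vector<double> actionPriors; // one probability per applicable action
    double value;                     // estimated expected reward-to-go
};

// Abstract interface the search could call when it initialises a new node.
class LearnedEvaluator {
public:
    virtual ~LearnedEvaluator() = default;
    virtual Evaluation evaluate(const StateFeatures& state,
                                std::size_t numActions) const = 0;
};

// Placeholder implementation: uniform priors and a value of zero.
// A real implementation would forward the state to a trained network.
class UniformEvaluator : public LearnedEvaluator {
public:
    Evaluation evaluate(const StateFeatures& /*state*/,
                        std::size_t numActions) const override {
        assert(numActions > 0);
        Evaluation result;
        result.actionPriors.assign(numActions,
                                   1.0 / static_cast<double>(numActions));
        result.value = 0.0;
        return result;
    }
};
```

Keeping the evaluator behind an abstract class would let the search code stay independent of any particular deep learning library, which is one possible design choice among several.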


Proficiency with C++, strong programming skills, passion for artificial intelligence. Background in ML is a plus.

Background Literature

- Thomas Keller and Malte Helmert. Trial-based Heuristic Tree Search for Finite Horizon MDPs. ICAPS, 2013.
- Thomas Keller and Patrick Eyerich. PROST: Probabilistic Planning Based on UCT. ICAPS, 2012.
- David Silver and Julian Schrittwieser and Karen Simonyan and Ioannis Antonoglou and Aja Huang and Arthur Guez and Thomas Hubert and Lucas Baker and Matthew Lai and Adrian Bolton and Yutian Chen and Timothy Lillicrap and Fan Hui and Laurent Sifre and George van den Driessche and Thore Graepel and Demis Hassabis. Mastering the Game of Go without Human Knowledge. Nature, 2017.
- David Silver and Thomas Hubert and Julian Schrittwieser and Ioannis Antonoglou and Matthew Lai and Arthur Guez and Marc Lanctot and Laurent Sifre and Dharshan Kumaran and Thore Graepel and Timothy Lillicrap and Karen Simonyan and Demis Hassabis. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.


Research experience, knowledge of advanced AI topics, and experience working on a state-of-the-art AI system.


artificial intelligence, stochastic planning, AlphaGo, deep learning

Updated: 10 February 2019 / Responsible Officer: Dean, CECS / Page Contact: CECS Marketing