The Australian National University




Contact: Michelle.Moravec@anu.edu.au

NICTA SEMINAR

A Stochastic Memoizer for Sequence Data

Dr Frank Wood (Gatsby Unit, University College London)


DATE: 2009-08-17
TIME: 11:00:00 - 12:00:00
LOCATION: NICTA - 7 London Circuit



ABSTRACT:
Note: Free Pizza Seminar. All welcome.

I will present a model for discrete sequence data called the sequence memoizer (SM). The sequence memoizer is a smoothing Markov model of unbounded order (think of an n-gram model as n -> infinity) that has been shown empirically to have the same computational complexity as a fifth-order smoothing Markov model. The SM can be estimated from a single training sequence, yet it shares statistical strength between the predictive distributions of subsequent symbols in such a way that predictive performance generalizes well. The model builds on a specific parameterization of an unbounded-depth hierarchical Pitman-Yor process. I will introduce analytic marginalization steps (using coagulation operators) that reduce this model to one that can be represented in time and space linear in the length of the training sequence. I will show how to perform inference in such a model without truncation approximation and will introduce the fragmentation operators needed for predictive inference. Finally, I will demonstrate the SM as a language model, where it achieves state-of-the-art results.



BIO:
Frank Wood has an MSc and a PhD (2007) in computer science from Brown University and a BS (1996), also in computer science, from Cornell University. Most recently, he was employed as a postdoctoral research fellow at the Gatsby Computational Neuroscience Unit at University College London and as a consultant for Stan James Plc, a Gibraltar-based sports bookmaker. His research focuses on the development of new statistical methods for machine learning, particularly so-called nonparametric Bayesian methods. Practical benefits of his research contributions include improved neural decoding technology for human neuroprosthetic devices and cutting-edge lossless compressors. Dr Wood is a former entrepreneur whose first company, ToFish! Inc, a content-based image retrieval software company, was acquired by America Online in 2000.