Daniel Alarcon, PhD student from the Research School of Computer Science has just been awarded the best paper at the recent Australasian Computer Science Conference (ACSC) for his research which uses logic techniques to clean real-world census databases using SMT.
Databases typically contain many cross-field constraints which must be validated for each entry. Census databases can contain tens of millions of such entries. Finding records which fail the constraints without suggesting a correction is set inclusion. Error-localisation coupled with imputation is the task of finding a minimal correction to a failing record so that it satisfies all constraints. The error-localisation problem alone is intractable since it is known to be NP-complete.
The research by Daniel Alarcon and Professor Rajeev Gore titled Efficient Error Localisation and Imputation for Real-World Census Databases Using SMT used 25 standard desktops over a 24 hour period to clean 23 million person records. Daniel and Rajeev describe how they tailered Microsoft Z3 to handle error localisation and imputation.
Their experiments show that the phototype could handle realistic senarios from the Australian Bureau of Statistics (ABS) national sensus data in less than a 24 hour period, thus efficient error localisation and im[utation for census data is now fesilbe using state-of-the-art SAT/SMT solvers.