Techniques for linking and integrating data from different sources are becoming increasingly important in many application areas, including health, census, taxation, immigration, social welfare, crime and fraud detection, the assembly of national security intelligence, business, bibliometrics, and the social sciences.
In today's Big Data era, record linkage (also known as entity resolution, duplicate detection, or data matching) faces not only computational challenges, due to the increasing size and complexity of data collections, but also operational challenges, as many applications move from static environments to real-time processing and analysis of potentially very large and dynamically changing data streams in which records must be linked in real time. Additionally, with growing public concern about the use of sensitive data, privacy and confidentiality often need to be considered when personal information is linked and shared between organisations.
In this talk I will present a short introduction to record linkage, highlight recent developments in advanced record linkage techniques and methods, and discuss future research challenges and directions.
Peter Christen is a Professor at the Australian National University (ANU) Research School of Computer Science. He graduated with a PhD in Computer Science from the University of Basel, Switzerland, in 1999 and has been at the ANU since 2000. He has led various research projects, including industry collaborations with NSW Health, Google, and Fujitsu Laboratories. He has published over 130 articles in the areas of record linkage and data mining, as well as the monograph "Data Matching", published by Springer in 2012. He is the principal developer of Febrl (Freely Extensible Biomedical Record Linkage), an open-source data cleaning, deduplication and record linkage system.