A Scalable Blocking Framework for Multidatabase Privacy-Preserving Record Linkage

Today, many application domains, such as national census, healthcare, business analytics, fraud detection, and national security, require data to be integrated from multiple databases. Often, organizations are not willing or authorized to share the sensitive personal information in their databases with any other organization due to privacy and confidentiality regulations. The linkage of records across multiple databases held by different organizations without revealing this sensitive information is an emerging research discipline known as privacy-preserving record linkage (PPRL). PPRL facilitates the linkage of databases while ensuring that the privacy of the individuals in these databases is protected.
 
In a multidatabase (MD) context, PPRL is significantly challenged by the exponential growth, with the number of databases, in the number of potential record comparisons to be conducted. Blocking is commonly used in PPRL to make the linkage of large databases more scalable. The aim of blocking is to remove those record pairs that correspond to non-matches (that is, pairs that refer to different entities). Various blocking techniques have been proposed to scale the PPRL process for two databases. However, many of these techniques are not suitable for blocking multiple databases. This limitation creates a need to develop blocking techniques for the MD linkage context, as real-world applications increasingly require records from more than two databases to be linked.
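As a rough illustration of why blocking matters when more than two databases are linked, the following Python sketch counts candidate record combinations with and without blocking. It is a toy example under stated assumptions, not the framework presented in the seminar: the three small name lists, the helper names `block_records` and `candidate_tuples`, and the first-letter blocking key are all hypothetical, the values are unencoded (so the sketch is not privacy-preserving), and it only considers combinations that span every database.

```python
from collections import defaultdict
from itertools import product


def block_records(records, key_fn):
    """Group the records of one database by their blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_fn(rec)].append(rec)
    return blocks


def candidate_tuples(databases, key_fn):
    """Yield one-record-per-database combinations that share a blocking key."""
    blocked = [block_records(db, key_fn) for db in databases]
    # Only keys present in every database can produce a full combination.
    shared_keys = set(blocked[0]).intersection(*blocked[1:])
    for key in sorted(shared_keys):
        yield from product(*(b[key] for b in blocked))


# Hypothetical toy data: surnames held by three different organizations.
db_a = ["smith", "smyth", "jones", "brown"]
db_b = ["smith", "johns", "taylor"]
db_c = ["smithe", "jones", "brown", "baker"]
databases = [db_a, db_b, db_c]


def first_letter(name):
    """Assumed blocking key: the first letter of the surname."""
    return name[0]


# Without blocking, every combination across the databases is a candidate.
all_tuples = len(db_a) * len(db_b) * len(db_c)
blocked_tuples = list(candidate_tuples(databases, first_letter))

print(f"without blocking: {all_tuples} combinations")
print(f"with blocking:    {len(blocked_tuples)} combinations")
```

Without blocking, the number of combinations is the product of the database sizes, so it multiplies with every database added. A practical MD-PPRL blocking scheme would additionally need to operate on encoded values and to handle matches that appear in only a subgroup of the databases, neither of which this sketch attempts.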
 
This research is the first to perform an extensive study on private blocking for multidatabase privacy-preserving record linkage (MD-PPRL). In this research I considered several challenges related to the blocking step of MD-PPRL: (1) how to perform scalable and efficient blocking as the number of databases and their sizes increase, (2) how to identify records that need to be compared for different subgroups across all the databases that need to be linked, (3) how to remove redundant record comparisons without sacrificing the effectiveness of the linkage, and (4) how to combine different techniques to create an efficient and effective blocking workflow for an MD linkage. In this seminar, I will describe the contributions that address each challenge above and will conclude with a perspective regarding the future of MD-PPRL.

Biography

Thilina Ranbaduge is a PhD student at the Australian National University (ANU) Research School of Computer Science. He is working on privacy-preserving record linkage (PPRL) techniques for multiple databases. His main focus is to develop scalable techniques for efficient and effective blocking of multiple databases. Before starting his PhD in 2014, he received his PG.Dip and BSc Honours from the University of Moratuwa, Sri Lanka, in 2013 and 2009, respectively.

Date & time

11am–12pm 11 Aug 2017

Location

Room: N224, Computer Systems seminar space

Internal speakers

Dr Thilina Ranbaduge
