Today, many application domains, such as national censuses, healthcare, business analytics, fraud detection, and national security, require data to be integrated from multiple databases. Often, organizations are not willing or authorized to share the sensitive personal information in their databases with any other organization due to privacy and confidentiality regulations. Linking records across multiple databases held by different organizations without revealing such information is the subject of an emerging research discipline known as privacy-preserving record linkage (PPRL). PPRL facilitates the linkage of databases while ensuring that the privacy of the individuals in these databases is protected.
In a multidatabase (MD) context, PPRL is significantly challenged by the intrinsic exponential growth, with the number of databases, in the number of potential record pair comparisons to be conducted. Blocking is commonly used in PPRL to make the linkage of large databases more scalable. The aim of blocking is to remove, before detailed comparison, those record pairs that correspond to non-matches (pairs that refer to different entities). Various blocking techniques have been proposed to scale the PPRL process for two databases. However, many of these techniques are not suitable for blocking multiple databases. This limitation creates a need to develop blocking techniques for the MD linkage context, as real-world applications increasingly require records from more than two databases to be linked.
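As a rough illustration of why blocking matters when several databases are linked, the following minimal Python sketch counts the candidate record tuples with and without blocking. It is not the technique developed in this research and it includes no privacy protection; the blocking key (first letter of surname plus birth year), the field names, and the toy records are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product
from math import prod


def naive_comparisons(databases):
    """Number of candidate tuples with one record taken from each database."""
    return prod(len(db) for db in databases)


def blocking_key(record):
    # Hypothetical blocking key: first letter of surname plus birth year.
    return (record["surname"][0].upper(), record["birth_year"])


def build_blocks(databases):
    """Map each blocking key to one list of records per database."""
    blocks = defaultdict(lambda: [[] for _ in databases])
    for i, db in enumerate(databases):
        for rec in db:
            blocks[blocking_key(rec)][i].append(rec)
    return blocks


def candidate_tuples(databases):
    """Yield record tuples whose blocking key occurs in every database."""
    for per_db in build_blocks(databases).values():
        if all(per_db):  # skip blocks that are empty in some database
            yield from product(*per_db)


# Toy data (assumed for illustration only).
db_a = [{"surname": "Smith", "birth_year": 1980},
        {"surname": "Jones", "birth_year": 1975}]
db_b = [{"surname": "Smyth", "birth_year": 1980},
        {"surname": "Brown", "birth_year": 1990}]
db_c = [{"surname": "Smith", "birth_year": 1980},
        {"surname": "Jones", "birth_year": 1975}]
databases = [db_a, db_b, db_c]

print("without blocking:", naive_comparisons(databases))
print("with blocking:   ", len(list(candidate_tuples(databases))))
```

With d databases of n records each, the unblocked count is n to the power d, which is the growth described above. This sketch only keeps blocks that are populated in every database; identifying matches within subgroups of databases, as considered in this research, would require relaxing that condition.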
This research is the first to perform an extensive study on private blocking for multidatabase privacy-preserving record linkage (MD-PPRL). In this research, I considered several challenges related to the blocking step of MD-PPRL: (1) how to perform scalable and efficient blocking as the number of databases and their sizes increase, (2) how to identify records that need to be compared for different subgroups across all the databases that need to be linked, (3) how to remove redundant record comparisons effectively without sacrificing the effectiveness of the linkage, and (4) how to combine different techniques to create an efficient and effective blocking workflow for an MD linkage. In this seminar, I will describe the contributions that address each of the challenges above and will conclude with a perspective on the future of MD-PPRL.