Developing a Data Matching Markup Language (DMML)


Research areas


Markup languages based on XML have been developed in many areas. In data mining, for example, the "Predictive Model Markup Language" (PMML, see link below) has recently gained interest from both commercial data mining software providers and the data mining research community (for example through the PMML package for the R statistical language). Data matching, also known as entity resolution or record linkage, is the process of identifying which records in two databases refer to the same real-world entity. So far, no overall framework for data matching has been developed that allows the different steps of the data matching process to be specified in an implementation-independent way, for example using an XML-based markup language.


The objective of this project is to analyse the requirements of all steps of the data matching process, and to develop an XML-based markup language for data matching (possibly named the "Data Matching Markup Language").
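
To make the idea of an implementation-independent specification more concrete, the following is a minimal sketch of what a DMML fragment and a program reading it might look like. Everything in it is an assumption made for illustration: the element and attribute names (dmml, blocking, comparison, classification, field, weight, threshold), the choice of steps (blocking, field comparison, threshold classification), and the helper functions load_spec and match are all hypothetical and are not defined by the project or by any existing standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical DMML fragment: every element and attribute name below is
# invented for illustration; the project has not defined a schema.
DMML_EXAMPLE = """
<dmml version="0.1">
  <blocking field="postcode"/>
  <comparison field="surname" weight="2.0"/>
  <comparison field="given_name" weight="1.0"/>
  <classification threshold="2.5"/>
</dmml>
"""


def load_spec(xml_text):
    """Parse the hypothetical DMML fragment into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "blocking_field": root.find("blocking").get("field"),
        "comparisons": [
            (c.get("field"), float(c.get("weight")))
            for c in root.findall("comparison")
        ],
        "threshold": float(root.find("classification").get("threshold")),
    }


def match(spec, records_a, records_b):
    """Return index pairs (i, j) classified as matches under the spec."""
    matches = []
    for i, rec_a in enumerate(records_a):
        for j, rec_b in enumerate(records_b):
            # Blocking: only compare records that share the blocking value.
            if rec_a[spec["blocking_field"]] != rec_b[spec["blocking_field"]]:
                continue
            # Comparison: sum the weights of the fields that agree exactly.
            score = sum(
                weight
                for field, weight in spec["comparisons"]
                if rec_a[field] == rec_b[field]
            )
            # Classification: a simple threshold on the summed score.
            if score >= spec["threshold"]:
                matches.append((i, j))
    return matches
```

In this sketch the XML only describes what should be done, and the Python code is one possible interpreter for it; a different tool could read the same fragment and carry out the same steps in its own way. A real DMML would need to cover far more than exact comparisons and a single threshold.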


Knowledge of XML is essential, and having attended a course on databases and/or data mining is highly desirable.

Background Literature

For papers about data matching, please consult Peter's publication page given below.


A student working on this project will become familiar with many aspects of data matching and will gain in-depth knowledge about markup languages and how to specify them.

Updated: 1 January 2018 / Responsible Officer: Dean, CECS / Page Contact: CECS Marketing