The Australian National University

SATOMGI


Module J: Data Mining and Matching

Lecturer

Module Description

Data mining is data analysis performed on very large databases with an emphasis on identifying and extracting novel, potentially useful, and understandable patterns and associations. Data mining is a multi-disciplinary field which uses a combination of machine learning, statistical analysis, modelling techniques, visualisation and database technology.

In many cases information from several data sources needs to be matched, linked and aggregated in order to allow more detailed data analysis or mining. Similarly, detecting and removing duplicate records that relate to the same entity within one data set is of importance, as data quality affects any subsequent analysis or mining. The aim of such linkages is to match and aggregate all records relating to the same entity.
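As a rough illustration of the deduplication idea described above, the following minimal sketch groups records that agree exactly on a few normalised fields, which is one simple form of deterministic matching. The function names (`normalise`, `deduplicate`), the choice of key fields and the sample records are all hypothetical and are not taken from the module materials. Real matching usually also has to cope with typing errors and missing values.

```python
from collections import defaultdict


def normalise(value):
    """Lower-case and collapse whitespace so trivial formatting
    differences do not prevent a match."""
    return " ".join(value.lower().split())


def deduplicate(records, key_fields):
    """Group records that agree exactly on the normalised key fields.

    Each returned group is assumed to describe one real-world entity.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalise(record[field]) for field in key_fields)
        groups[key].append(record)
    return list(groups.values())


# Hypothetical sample data: the first two records describe the same person.
records = [
    {"name": "Ann  Lee", "dob": "1980-02-01", "city": "Canberra"},
    {"name": "ann lee", "dob": "1980-02-01", "city": "Sydney"},
    {"name": "Bob Ng", "dob": "1975-07-30", "city": "Perth"},
]

entities = deduplicate(records, key_fields=("name", "dob"))
for group in entities:
    print(len(group), "record(s):", [r["name"] for r in group])
```

Exact matching on normalised keys is easy to reason about but misses records with spelling variations. Probabilistic linkage, listed in the learning outcomes, addresses that case by weighing partial agreement across fields.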

Learning outcomes

On completion of this module, participants should have gained an understanding of the basic concepts and techniques used in data mining and data matching, including:
  • the data mining process, how data mining is defined, application areas, disciplines involved, and the major challenges in data mining;
  • data issues relevant to data mining (size, complexity, types and formats);
  • data warehousing, data cleaning and pre-processing;
  • unsupervised learning techniques such as cluster analysis and association rule mining (including the basic methods used);
  • supervised learning techniques (classification and prediction), including the basics of decision tree induction and how to measure classifier accuracy;
  • schema integration, data matching (deterministic and probabilistic linkage), the importance of data cleaning, deduplication and geocoding.

Assumed knowledge

  • Basic understanding of spreadsheets and databases (tables, attributes, records).
  • Experience working with Windows-based computer systems.

Schedule

Courses will be scheduled according to demand. Please contact Debbie Pioch to discuss.

More Information

Debbie Pioch
T: +61 2 6125 8020
E: debbie.pioch@anu.edu.au