Diagnostic Data: Machine learning with cell abundance datasets for disease detection

External Member

Dr Dan Andrews, Jubilee Joint Fellow, Dan.Andrews@anu.edu.au


Applying genome data in a hospital setting can transform the treatment of many patients by allowing genetic diagnosis of their disease or condition. Yet many genetic diseases are complex and do not arise from a single mutation in a patient's genome. With the rapid growth of personal genomic datasets, sophisticated pattern-recognition tools are required to identify the genetic causes of these complex diseases. We will use high-throughput cellular data to better understand the changes that occur in human disease. Projects will have scope to expose students to the full range of data activities, from data pre-processing, normalisation and dimensionality reduction through to training neural networks that use this cellular data to distinguish healthy individuals from patients with a disease. Project data will include data from genetically identical mice and from hospital patients. We will also investigate methods that may reveal which aspects of the data allow a diagnosis. Because dimensionality-reduced cellular datasets are readily viewed as images, we will use them to identify cellular populations that indicate the presence of disease.
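The idea of viewing dimensionality-reduced cellular data as images can be illustrated with a minimal sketch. It assumes each sample is a matrix of cells by markers, uses PCA (via NumPy's SVD) as a simple stand-in for whichever dimensionality reduction a project adopts (UMAP or t-SNE are common alternatives), and bins the 2-D coordinates into a fixed 32 x 32 grid. The function names, grid size, plotting extent and synthetic data are all hypothetical illustrations, not the project's method.

```python
import numpy as np

def fit_pca(pooled, n_components=2):
    """Fit PCA on cells pooled from all samples (rows = cells, columns = markers)."""
    mean = pooled.mean(axis=0)
    # Right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(cells, mean, components):
    """Project one sample's cells onto the shared principal axes."""
    return (cells - mean) @ components.T

def density_image(coords, bins=32, extent=((-4.0, 4.0), (-4.0, 4.0))):
    """Bin 2-D cell coordinates into a fixed grid and normalise to proportions.

    Cells falling outside the (hypothetical) extent are dropped by histogram2d.
    """
    hist, _, _ = np.histogram2d(coords[:, 0], coords[:, 1], bins=bins, range=extent)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Synthetic example: the "patient" sample has an expanded population of cells.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(5000, 10))
patient = rng.normal(0.0, 1.0, size=(5000, 10))
patient[:500, :3] += 3.0

# Fit one shared embedding so that every sample's image uses the same axes.
mean, comps = fit_pca(np.vstack([healthy, patient]))
img_h = density_image(project(healthy, mean, comps))
img_p = density_image(project(patient, mean, comps))
print(img_h.shape)  # (32, 32)
```

Fitting the embedding once on pooled cells, and fixing the bin extent, keeps pixel positions comparable between individuals, which is what would let an image classifier (or a simple difference map such as `img_p - img_h`) point to regions of cell space associated with disease.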


·         Programmatically normalise and manipulate large amounts of cytometry data to produce a uniform corpus of experimental data that can be queried and used to generate derived data.

·         Pursue deep learning approaches to predict disease or healthy status.
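The first goal, building a uniform corpus from many cytometry files, can be sketched as follows. This is a minimal illustration, assuming each sample arrives as an intensity matrix plus its channel names, that samples may list channels in different orders, and that the widely used arcsinh transform is applied (a cofactor of 5 is a common convention for mass cytometry; around 150 is often used for fluorescence flow cytometry). The marker panel, function names and synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical shared marker panel; a real panel would come from the study design.
PANEL = ["CD3", "CD4", "CD8", "CD19", "CD45"]

def align_channels(values, channel_names, panel=PANEL):
    """Reorder a sample's columns to the shared panel; fail loudly if a marker is missing."""
    index = {name: i for i, name in enumerate(channel_names)}
    missing = [m for m in panel if m not in index]
    if missing:
        raise ValueError(f"sample lacks markers: {missing}")
    return values[:, [index[m] for m in panel]]

def arcsinh_transform(values, cofactor=5.0):
    """Variance-stabilising arcsinh transform commonly applied to cytometry intensities."""
    return np.arcsinh(values / cofactor)

def build_corpus(samples, cofactor=5.0):
    """Stack aligned, transformed samples into one matrix plus a per-cell sample label."""
    blocks, labels = [], []
    for sample_id, (values, channel_names) in samples.items():
        blocks.append(arcsinh_transform(align_channels(values, channel_names), cofactor))
        labels.extend([sample_id] * values.shape[0])
    return np.vstack(blocks), np.array(labels)

# Synthetic example: two samples exported with channels in different orders.
rng = np.random.default_rng(1)
samples = {
    "mouse_01": (rng.gamma(2.0, 50.0, size=(1000, 5)), ["CD3", "CD4", "CD8", "CD19", "CD45"]),
    "mouse_02": (rng.gamma(2.0, 50.0, size=(800, 5)), ["CD45", "CD19", "CD8", "CD4", "CD3"]),
}
X, sample_of_cell = build_corpus(samples)
print(X.shape)  # (1800, 5)
mouse_02_cells = X[sample_of_cell == "mouse_02"]  # a simple query by sample
```

Raising an error on a missing marker, rather than silently filling it, is one way to surface the data-consistency problems that real multi-batch cytometry data tends to contain.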


Python or R programming skills and experience in data science and/or machine learning are required. Experience with, or interest in, biological datasets and biological questions is essential.

Both an Honours and a PhD scholarship may be available for exceptional candidates.

In the first instance, please contact Dan.Andrews@anu.edu.au to discuss the scope and potential for developing a project tailored to your interests and intended trajectory.


·         Experience with real scientific data and the challenges involved in deriving metadata and ensuring data consistency

·         Practical experience with computational pipelines, potentially in a high-performance computing environment

·         Potential to contribute to open-source software projects

·         Experience applying deep learning to scientific datasets


Machine learning, Deep learning, Data science, Bioinformatics, Python, R

Updated:  10 August 2021/Responsible Officer:  Dean, CECS/Page Contact:  CECS Marketing