Computational Automation of Biomarker Discovery with Flow Cytometry Data and Machine Learning

External Member

Dr Dan Andrews, Jubilee Joint Fellow,


The search for biological indicators of disease progression and response to clinical intervention is at the forefront of biomedical research. Flow cytometry is gaining popularity as a way to obtain large amounts of information about the types and numbers of cells present in humans with disease. A primary use of this data is to identify potential disease biomarkers. The most informative biomarker candidates are explored using custom-built kits run on sophisticated cytometry machines, which detect proteins by laser fluorescence and allow high-throughput assay of up to 250 additional protein markers at a time. The result is a large, complex dataset that can be difficult to decipher: many of the >250 potential biomarkers are variably correlated, may be co-expressed, or are not associated with clinical outcomes. The goal of this project is to use machine learning techniques to identify additional protein markers to add to routine panels, improving our ability to predict clinical outcomes.



·         Develop an open-source R (or Python) software package to automate selection of potential biomarkers from screening data

·         Perform exploratory data analysis to identify redundant markers: correlation analysis, dimensionality reduction, and training of machine learning models per candidate biomarker

·         Use machine learning techniques to guide integration of this data into a single dataset

·         Assess integration quality by comparison to bespoke, laboratory-generated data


Python or R programming and experience in data science and/or machine learning are required. Experience with or interest in biological datasets and biological questions is essential.

Potentially, both an Honours and a PhD scholarship are available for exceptional candidates.

In the first instance, please make contact with to discuss scope and potential for developing a project tailored to your interests and intended trajectory.

Co-supervisors: Ben Quah and Dillon Hammill (JCSMR, ANU)

Background Literature

Frank, R., Hargreaves, R. Clinical biomarkers in drug discovery and development. Nat Rev Drug Discov 2, 566–580 (2003).


·         Experience with real biological data and potential to develop an open-source analysis tool in a fast-moving scientific area

·         Coding experience in R or Python for feature selection and data integration

·         Experience applying machine learning techniques to scientific datasets


Machine learning, Data science, Bioinformatics

Updated:  10 August 2021/Responsible Officer:  Dean, CECS/Page Contact:  CECS Marketing