Datasets in the physical sciences (materials, chemistry, physics and nanotechnology) are often expensive to procure and are developed on a case-by-case basis in response to individual needs. There are, however, some similarities in the scientific problem being addressed, the instruments used to generate the data, and/or the feature extraction methods used to characterise it. Human intuition allows researchers to use knowledge gained on one project to inform how they approach another, and it is highly desirable to extract more value in the same way from the machine learning methods used in materials informatics, cheminformatics or nanoinformatics. How can the outputs and outcomes of one machine learning study inform the development of the next, and can we save time by transferring knowledge of the aspects that are shared between the two? Transfer learning is an area of machine learning that can enable this type of capability, but it has yet to be explored in this context. In general, transfer learning aims to achieve high performance on a target task by transferring knowledge from a source task. Here the focus is on sequentially sharing knowledge or representations between targets using domain adaptation, where the targets have different data sources (even if the feature attributes are the same). Information about the structure of a similar model trained on a different source is transferred, but the new model is still trained on its own data. This could allow researchers to save time and resources without sacrificing the individual needs of a given project, and without unnecessarily adapting their data generation and characterisation (which would increase the cost).
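The idea that a new model reuses what a source model learned while still being fitted on its own data can be sketched with a deliberately simple technique. This technique is swapped in here for illustration and is not necessarily what the project will use: ridge regression whose penalty shrinks the target weights towards the source model's weights rather than towards zero (often called biased regularisation). The sketch assumes NumPy, a linear model and synthetic data. The function name `fit_ridge_to_prior` and all of the data are hypothetical.

```python
import numpy as np


def fit_ridge_to_prior(X, y, lam, w_prior):
    """Closed-form ridge regression shrunk towards w_prior instead of zero.

    Minimises ||X w - y||^2 + lam * ||w - w_prior||^2, whose solution is
    w = (X^T X + lam I)^{-1} (X^T y + lam w_prior).
    With w_prior = 0 this is ordinary ridge regression (no transfer).
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    b = X.T @ y + lam * w_prior
    return np.linalg.solve(A, b)


rng = np.random.default_rng(0)
n_features = 5

# Hypothetical source and target tasks: same features, slightly different
# underlying relationship (e.g. the same instrument on a new material system).
w_true_source = rng.normal(size=n_features)
w_true_target = w_true_source + 0.1 * rng.normal(size=n_features)

# Plentiful source data (already collected).
X_source = rng.normal(size=(500, n_features))
y_source = X_source @ w_true_source + 0.1 * rng.normal(size=500)

# Scarce target data (an expensive new experiment).
X_target = rng.normal(size=(8, n_features))
y_target = X_target @ w_true_target + 0.1 * rng.normal(size=8)

# 1. Train the source model (plain ridge, shrinking towards zero).
w_source = fit_ridge_to_prior(X_source, y_source, lam=1.0,
                              w_prior=np.zeros(n_features))

# 2. Train on the target data only, shrinking towards the source weights
#    (transfer) or towards zero (from scratch) for comparison.
w_transfer = fit_ridge_to_prior(X_target, y_target, lam=5.0, w_prior=w_source)
w_scratch = fit_ridge_to_prior(X_target, y_target, lam=5.0,
                               w_prior=np.zeros(n_features))

print("parameter error, transfer:", np.linalg.norm(w_transfer - w_true_target))
print("parameter error, scratch: ", np.linalg.norm(w_scratch - w_true_target))
```

Setting `w_prior` to zero recovers an ordinary model trained from scratch, so the two target fits differ only in what is transferred. The parameter `lam` controls how strongly the target model is pulled towards the source model.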
In this project you will explore, design and undertake transfer learning on a series of nanotechnology datasets to determine to what extent knowledge from models trained on one dataset is transferable to another. This covers data cleaning protocols, feature engineering and selection, hyper-parameter selection, model structure and any aspect of “pre-training”. Different unsupervised and supervised methods will be addressed using tabular data, and appropriate methods for evaluating “transferability” will be developed and tested. The project will be based on (and compatible with) the standard scikit-learn, PyTorch, TensorFlow and Keras platforms, and will be thoroughly tested on an extensive selection of models to demonstrate its utility.
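One possible starting point for evaluating “transferability” compares a transferred model against a from-scratch baseline on the same repeated target train/test splits. This is an assumption, not a method prescribed by the project. The sketch below reports the mean relative reduction in test error, together with the fraction of splits on which transfer helped. A negative mean gain flags negative transfer, where knowledge from the source hurts performance on the target. The function `transfer_gain` is hypothetical and uses only the Python standard library.

```python
from statistics import mean


def transfer_gain(errors_scratch, errors_transfer):
    """Paired relative improvement from transfer over repeated target splits.

    errors_scratch[i] and errors_transfer[i] are the test errors of the
    from-scratch and transferred models on the same split i. Returns
    (mean of (e_scratch - e_transfer) / e_scratch, fraction of splits where
    transfer lowered the error). A negative mean suggests negative transfer.
    """
    if not errors_scratch or len(errors_scratch) != len(errors_transfer):
        raise ValueError("need equal-length, non-empty paired error lists")
    if any(e <= 0 for e in errors_scratch):
        raise ValueError("baseline errors must be positive")
    gains = [(s - t) / s for s, t in zip(errors_scratch, errors_transfer)]
    win_rate = sum(g > 0 for g in gains) / len(gains)
    return mean(gains), win_rate


# Made-up RMSE values on five paired splits, for illustration only.
gain, win_rate = transfer_gain([0.80, 0.75, 0.90, 0.85, 0.78],
                               [0.60, 0.70, 0.95, 0.65, 0.62])
print(f"mean relative gain: {gain:.3f}, transfer helped on {win_rate:.0%} of splits")
```

Pairing the errors by split keeps the comparison fair when target datasets are small and individual splits vary a lot. Repeating the evaluation for several target training-set sizes would show where transfer stops paying off.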
To investigate the transferability of methods and models in the context of tabular data from the physical sciences, and to develop a Python module that others can use to undertake transfer learning on similar data and evaluate its success.
Python programming and experience in data science and machine learning are essential (e.g. COMP3720, COMP4660, COMP4670, COMP6670, COMP8420). Familiarity with platforms such as scikit-learn, PyTorch, TensorFlow and Keras is desirable.
This is a 24cp project.
machine learning, materials informatics, transfer learning, nanotechnology