Applying machine learning to the physical sciences can be challenging because the data sets are typically small, with high dimensionality and high variance, whereas most machine learning methods were designed for large data sets with a small number of well-behaved features. To overcome this mismatch, it is possible to increase the number of data points (instances) using “data augmentation” methods, or to decrease the number of features using “dimension reduction”. In this project you will test different numerical ways of expanding the size of a data set, and a range of dimension reduction methods, to make a data set more amenable to machine learning. You will test and compare your results using some simple supervised machine learning models. All programming will be done in Python, using tools available in the scikit-learn package. Data sets will be provided.
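To give a flavour of the workflow described above, here is a minimal sketch, not the project's prescribed method. It assumes synthetic data from scikit-learn's `make_classification` in place of the provided data sets, and a hypothetical helper `jitter_augment` that implements Gaussian-noise jittering as one simple augmentation technique. PCA and logistic regression stand in for the dimension reduction method and the supervised model; the sample sizes, noise scale, and number of components are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def jitter_augment(X, y, n_copies=3, noise_scale=0.05, seed=0):
    """Hypothetical helper: append noisy copies of each instance.

    Noise is Gaussian, scaled per feature by noise_scale times that
    feature's standard deviation. Labels are copied unchanged.
    """
    rng = np.random.default_rng(seed)
    feature_std = X.std(axis=0)
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        noise = rng.normal(0.0, 1.0, size=X.shape) * noise_scale * feature_std
        X_parts.append(X + noise)
        y_parts.append(y)
    return np.vstack(X_parts), np.concatenate(y_parts)


# Assumed stand-in for a small, high-dimensional scientific data set.
X, y = make_classification(
    n_samples=60, n_features=200, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Augment the training split only, so the test set stays untouched.
X_aug, y_aug = jitter_augment(X_train, y_train)

# Baseline: all features, original training instances.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# Alternative: augmented instances, features reduced to 10 principal components.
reduced = make_pipeline(
    StandardScaler(), PCA(n_components=10), LogisticRegression(max_iter=1000)
)

baseline.fit(X_train, y_train)
reduced.fit(X_aug, y_aug)

scores = {
    "baseline": baseline.score(X_test, y_test),
    "augmented + PCA": reduced.score(X_test, y_test),
}
for name, score in scores.items():
    print(f"{name}: test accuracy = {score:.2f}")
```

Augmenting only the training split matters here: noisy copies of test instances in the training data would leak information and inflate the measured accuracy. Whether augmentation or dimension reduction actually helps depends on the data set, which is what the comparison is meant to reveal.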
Python programming skills and an interest in data science and machine learning are essential.
data science, data augmentation, dimension reduction, materials, Python, machine learning, data engineering