Machine Learning-Based Performance Prediction of Big Data Applications on Future Hardware




Building new hardware, especially a next-generation microprocessor, is costly. Hardware manufacturers such as Intel and AMD rely on detailed architectural simulation to estimate the performance of hardware that has yet to be built. Unfortunately, simulation is extremely slow, and emerging Big Data workloads that process large amounts of data slow it down even further.

An alternative to simulation is analytical modeling, but the complexity and scale of modern processors severely limit its use. In this project, we will explore machine learning approaches to predict the performance of Big Data applications on unseen future hardware. We first profile applications on existing processors, using hardware performance counters to collect a feature set that characterizes their behavior. We then explore different machine learning models to predict the performance of those applications on unseen hardware. Our primary focus is to understand how the performance of Big Data applications scales as we build new generations of more powerful machines.
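
As a rough illustration of what predicting scaling on a larger machine can mean, the sketch below fits the simplest possible model, an Amdahl-style curve T(n) = serial + parallel / n, to runtimes measured at small core counts and extrapolates to a core count that was not measured. This is a stand-in for illustration only, not the method the project prescribes; the measurements are made up, and the names fit_amdahl and predict_runtime are hypothetical.

```python
# Hypothetical measurements: (core count, runtime in seconds).
# These numbers are invented for illustration, not real results.
measurements = [(1, 100.0), (2, 55.0), (4, 32.5), (8, 21.25)]


def fit_amdahl(points):
    """Least-squares fit of T(n) = serial + parallel / n.

    The model is linear in x = 1/n, so ordinary simple linear
    regression on (1/n, T) recovers both coefficients.
    """
    xs = [1.0 / n for n, _ in points]
    ys = [t for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    parallel = sxy / sxx
    serial = mean_y - parallel * mean_x
    return serial, parallel


def predict_runtime(serial, parallel, cores):
    """Predicted runtime on a machine with the given core count."""
    return serial + parallel / cores


serial, parallel = fit_amdahl(measurements)
# Extrapolate to a core count that was never measured.
predicted_16 = predict_runtime(serial, parallel, 16)
print(f"serial={serial:.2f}s parallel={parallel:.2f}s T(16)={predicted_16:.3f}s")
```

A model this simple ignores memory bandwidth, cache behavior, and I/O, which is exactly where features from performance counters and richer machine learning models are expected to help.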


The main goal of this project is to predict the performance of Big Data applications on hardware that does not yet exist. We expect the student to perform the following tasks:

(1) Choose a representative set of Big Data applications and learn to execute the chosen applications on several existing machines with Intel and AMD processors;
(2) Collect features that characterize each application using hardware performance counters;
(3) Build a machine learning model to predict the performance of applications on future hardware with more cores and more powerful processors;
(4) Demonstrate the prediction accuracy of the newly built model.
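
One way to approach task (4), sketched here under stated assumptions, is leave-one-machine-out evaluation: hold out one existing machine, fit the model on the others, and treat the held-out machine as the "future" hardware. The samples below are invented, and the baseline model (runtime inversely proportional to cores times clock frequency) is a deliberately crude placeholder for whatever model the student builds; the names fit_baseline, predict_baseline, and leave_one_machine_out are hypothetical.

```python
# Hypothetical samples: one application measured on four machines.
# All values are invented for illustration.
samples = [
    {"machine": "A", "cores": 4, "ghz": 2.0, "runtime": 50.0},
    {"machine": "B", "cores": 8, "ghz": 2.0, "runtime": 25.0},
    {"machine": "C", "cores": 8, "ghz": 2.5, "runtime": 20.0},
    {"machine": "D", "cores": 16, "ghz": 2.5, "runtime": 12.5},
]


def fit_baseline(train):
    """Placeholder model: runtime = k / (cores * ghz); estimate k as a mean."""
    return sum(s["runtime"] * s["cores"] * s["ghz"] for s in train) / len(train)


def predict_baseline(k, sample):
    """Predict runtime for a machine from its core count and clock only."""
    return k / (sample["cores"] * sample["ghz"])


def leave_one_machine_out(data):
    """Return the absolute relative error for each held-out machine."""
    errors = {}
    for held_out in data:
        train = [s for s in data if s["machine"] != held_out["machine"]]
        k = fit_baseline(train)
        predicted = predict_baseline(k, held_out)
        actual = held_out["runtime"]
        errors[held_out["machine"]] = abs(predicted - actual) / actual
    return errors


errors = leave_one_machine_out(samples)
# Mean absolute percentage error across all held-out machines.
mape = sum(errors.values()) / len(errors)
for machine, err in errors.items():
    print(f"machine {machine}: {err:.1%} error")
print(f"MAPE: {mape:.2%}")
```

Holding out the largest machine (D in this toy data) is the closest analogue to predicting future hardware, since the model then has to extrapolate beyond every configuration it was trained on.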


The nature of this project is very rewarding. The student will gain experience in setting up modern Big Data workloads. The student will also learn experimental design and performance modeling and evaluation.


Big Data applications

Machine learning

Scalable computing

Parallel programming



Updated: 1 June 2019 / Responsible Officer: Dean, CECS / Page Contact: CECS Marketing