Using Synthetic Benchmarks to Improve OpenCL Performance Prediction




The Architecture Independent Workload Characterization (AIWC) tool [1] characterizes OpenCL kernels according to a set of architecture-independent features, which count target characteristics and are collected during program execution in a simulator. The associated metrics are broadly divided into four classes:

  • parallelism: such as number of work items and load imbalance;
  • compute: such as the diversity of instructions;
  • memory: such as working memory footprint and entropy measurements which affect cache utilization; and
  • control: such as branch entropy.
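
To make the entropy-based metrics above concrete, the following minimal sketch computes the Shannon entropy of an observed event stream, such as branch outcomes or accessed memory addresses. It is an illustration only and not AIWC's implementation: the function name `shannon_entropy` and the toy traces are assumptions introduced here, and AIWC's exact entropy definitions may differ.

```python
import math
from collections import Counter

def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of events."""
    counts = Counter(events)
    total = sum(counts.values())
    # Sum p * log2(1/p) over every distinct event observed.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical traces, invented purely for illustration.
always_taken = [True] * 8                      # one outcome only: fully predictable
alternating = [True, False] * 4                # taken half of the time
addresses = [0x1000, 0x1004, 0x1000, 0x1008]   # toy memory-address trace

print(shannon_entropy(always_taken))  # 0.0
print(shannon_entropy(alternating))   # 1.0
print(shannon_entropy(addresses))     # 1.5
```

Lower entropy indicates a more predictable stream. A branch that always goes the same way scores zero, while a wider spread of outcomes or addresses scores higher.
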
AIWC features have been combined with performance measurements from the Extended OpenDwarfs Benchmark Suite [2] to train an accurate performance prediction model for a wide range of accelerator devices [3]. However, this model may not be representative of the full range of OpenCL programs likely to be run on modern architectures.
CLgen [4] is an open-source application for generating runnable programs using deep learning. Representative features of OpenCL programs are learned from large volumes of program fragments gathered from GitHub, producing a model that is capable of creating realistic, novel many-core OpenCL programs. The tool is capable of generating ~100 synthesized kernels per second.


This project aims to improve the existing predictive model using AIWC features and execution time measurements from synthetic kernels generated using CLgen. The emphasis of this work is on identifying poor predictions from the model when using synthesized kernels. These outliers in the feature space will then be included in the training set, thus improving the predictive model. Alternative machine-learning techniques may also be used to improve performance prediction on available data.
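
The outlier-driven augmentation described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the project's method: a simple k-nearest-neighbour predictor stands in for the model from [3], the relative-error threshold of 0.5 is arbitrary, and the feature vectors and timings are invented. The names `predict_knn` and `find_outliers` are hypothetical.

```python
import math

def predict_knn(train, features, k=3):
    """Stand-in predictor (assumption): mean time of the k nearest training kernels."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    return sum(time for _, time in nearest) / len(nearest)

def find_outliers(train, candidates, threshold=0.5):
    """Return the candidates whose relative prediction error exceeds threshold."""
    outliers = []
    for features, measured in candidates:
        predicted = predict_knn(train, features)
        rel_error = abs(predicted - measured) / measured
        if rel_error > threshold:
            outliers.append((features, measured))
    return outliers

# Hypothetical data: (feature vector, measured execution time in ms).
train = [((1.0, 0.2), 10.0), ((1.1, 0.3), 11.0), ((0.9, 0.1), 9.0)]
synthetic = [((1.0, 0.25), 10.5), ((5.0, 0.9), 80.0)]

# Poorly predicted synthetic kernels are added to the training set.
outliers = find_outliers(train, synthetic)
train.extend(outliers)
print(len(outliers), len(train))  # 1 4
```

Here only the second synthetic kernel is poorly predicted (relative error 0.875), so it alone joins the training set. In practice the loop would be repeated with a retrained model after each augmentation.
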

This work will benefit the HPC community by producing the first benchmark suite that uses a data-driven methodology to augment conventional benchmarks with synthetic outliers, with the aim that the final curated suite is fully representative of OpenCL codes found in the wild.



Background Literature

[1] Johnston, B. and Milthorpe, J. (2018) AIWC: OpenCL-Based Architecture-Independent Workload Characterization 
[2] Johnston, B. and Milthorpe, J. (2018) Dwarfs on Accelerators: Enhancing OpenCL Benchmarking for Heterogeneous Computing Architectures
[3] Johnston, B., Falzon, G. and Milthorpe, J. (2018) OpenCL Performance Prediction using Architecture-Independent Features
[4] Cummins, C., Petoumenos, P., Wang, Z. and Leather, H. (2017) Synthesizing benchmarks for predictive modeling



Updated:  10 August 2021/Responsible Officer:  Dean, CECS/Page Contact:  CECS Marketing