Office of Technology Transfer – University of Michigan

Collaborative Single Kernel Execution Across Heterogeneous Devices

Technology #6360

In heterogeneous computing, a computer system is equipped with multiple types of processors, each designed to execute specific kinds of tasks efficiently. Traditionally, central processing units (CPUs) handle non-data-parallel work such as sequential code and the management of data transfers. Graphics processing units (GPUs) have a large number of cores, so they handle data-parallel work. Unfortunately, this division of labor can 1) leave CPU performance under-utilized, 2) waste energy when workloads are transferred between processors, and 3) create unnecessary competition between CPU and GPU usage. The Single Kernel Multiple Devices (SKMD) system is a framework that manages collaborative execution of a task across asymmetric CPUs and GPUs to improve computational performance.


In SKMD, a developer writes a single data-parallel kernel in OpenCL, and the system transparently distributes its execution across multiple CPUs and GPUs. A code transformation methodology distributes data and merges results seamlessly and efficiently, regardless of the kernel's data access pattern. An advanced partitioning algorithm that considers both device performance and data transfer cost balances the workload across processors. On average, SKMD improves computational performance by 29% compared with execution on a single processor.
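The listing does not spell out how SKMD partitions or merges work. As a rough illustration only, the Python sketch below splits one kernel's work-groups between a CPU and a GPU using a simple linear cost model (compute time plus transfer time per work-group), runs each share separately, and merges the results. The `Device` model, its cost figures, the greedy earliest-finish heuristic, and the `vector_add_kernel` stand-in are all hypothetical. They are not SKMD's actual algorithm or API.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device model: fixed per-work-group compute and transfer costs (ms)."""
    name: str
    ms_per_group: float            # assumed compute time per work-group
    transfer_ms_per_group: float   # assumed host<->device copy time per work-group

    def cost(self, groups: int) -> float:
        # Linear model: total time grows with the number of work-groups assigned.
        return groups * (self.ms_per_group + self.transfer_ms_per_group)


def partition(devices: list[Device], total_groups: int) -> list[int]:
    """Greedy earliest-finish assignment of identical work-groups to devices."""
    counts = [0] * len(devices)
    for _ in range(total_groups):
        # Give the next work-group to whichever device would finish it soonest.
        best = min(range(len(devices)), key=lambda i: devices[i].cost(counts[i] + 1))
        counts[best] += 1
    return counts


def to_ranges(counts: list[int]) -> list[tuple[int, int]]:
    """Convert per-device counts into contiguous [start, end) work-group ranges."""
    ranges, start = [], 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges


def vector_add_kernel(a, b, out, group_range, group_size):
    """Hypothetical stand-in for one kernel launch limited to a sub-range of work-groups."""
    first, last = group_range
    for i in range(first * group_size, min(last * group_size, len(out))):
        out[i] = a[i] + b[i]


def run_collaboratively(devices, a, b, group_size):
    n = len(a)
    total_groups = (n + group_size - 1) // group_size  # ceil(n / group_size)
    ranges = to_ranges(partition(devices, total_groups))
    partials = []
    for rng in ranges:
        local = [0] * n  # each device writes into its own copy of the output buffer
        vector_add_kernel(a, b, local, rng, group_size)
        partials.append((rng, local))
    # Merge: copy back only the slice each device was responsible for.
    out = [0] * n
    for (first, last), local in partials:
        lo, hi = first * group_size, min(last * group_size, n)
        out[lo:hi] = local[lo:hi]
    return out, ranges


# Usage: a CPU with no transfer cost (it shares host memory, by assumption)
# and a faster GPU that pays a per-work-group copy cost.
devices = [Device("cpu", 4.0, 0.0), Device("gpu", 1.0, 1.0)]
a = list(range(1000))
b = [2 * x for x in range(1000)]
out, ranges = run_collaboratively(devices, a, b, group_size=64)
print(ranges)   # [(0, 5), (5, 16)]
print(out[:3])  # [0, 3, 6]
```

Here the GPU receives 11 of the 16 work-groups despite its transfer cost, and the CPU takes the remaining 5, so both finish at about the same time. The contiguous-slice merge only works because each output element is written by exactly one work-group. The source says SKMD's code transformation handles arbitrary data access patterns, which this toy merge does not attempt.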

Applications

  • Enhancing graphics performance.
  • Facilitating large data-set analysis.
  • Assisting context-aware computing.

Advantages

  • Enables easier programming of GPUs.
  • Removes the need for applications to launch multiple kernels.
  • Allows for collaboration between asymmetric processors.
  • Considers data transfer cost and device performance.
  • Reduces energy usage.
  • Improves portability of OpenCL/CUDA code.