KLAP

Dynamic parallelism on GPUs simplifies the programming of many classes of applications that generate parallelizable work not known prior to execution. However, modern GPU architectures do not support dynamic parallelism efficiently due to the high kernel launch overhead, the limited number of simultaneous kernels, and the limited depth of dynamic calls a device can support.

We propose Kernel Launch Aggregation and Promotion (KLAP), a set of compiler techniques that improve the performance of kernels that use dynamic parallelism. More details are in the paper.
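To make the problem concrete, the sketch below shows the dynamic-parallelism pattern KLAP targets: each parent thread discovers a run-time-dependent amount of work and launches its own small child kernel, so a single parent grid can issue thousands of launches. All names (`parent`, `child`, `counts`, `ptrs`) are illustrative, not from the paper; compiling requires relocatable device code (`nvcc -rdc=true`).

```cuda
// Hypothetical sketch of fine-grained dynamic parallelism (not KLAP's code).
__global__ void child(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;  // placeholder per-element work
}

__global__ void parent(int **ptrs, const int *counts) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int n = counts[t];  // amount of work known only at run time
    if (n > 0) {
        // One child launch per parent thread: many tiny grids, so the
        // per-launch overhead dominates. KLAP's aggregation transforms
        // such code to combine these into fewer, larger launches, and
        // its promotion hoists launches higher in the call hierarchy.
        child<<<(n + 255) / 256, 256>>>(ptrs[t], n);
    }
}
```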

Cheng Li