Automatic μBenchmark Generation to Compute “Lower-bound” Latency and Inform Optimizations of Deep Learning Models on GPUs.
An open-source, framework- and hardware-agnostic, extensible, and customizable distributed platform for evaluating and profiling ML models across datasets, frameworks, and systems.
Leveraging NVIDIA’s Tensor Cores to express collective communication operations as matrix multiplications, and exploring the benefits in program simplicity, efficiency, and performance.
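As a rough illustration of the idea behind this project (a sketch, not the project's actual Tensor Core implementation), common collectives such as reduce and all-reduce can be phrased as matrix multiplications, which is what lets them map onto GEMM hardware:

```python
import numpy as np

# Sketch: n "ranks" each contribute a vector of length d. Stacking the
# contributions into an n x d matrix X, a sum-reduce is a (1 x n) all-ones
# row vector times X, and an all-reduce (every rank receives the sum) is an
# (n x n) all-ones matrix times X. On Tensor Core hardware each of these
# would correspond to a single GEMM; n and d here are illustrative.
n, d = 4, 8
X = np.arange(n * d, dtype=np.float32).reshape(n, d)

reduce_result = np.ones((1, n), dtype=np.float32) @ X  # sum across ranks
allreduce_result = np.ones((n, n), dtype=np.float32) @ X  # sum, replicated to all ranks

assert np.allclose(reduce_result[0], X.sum(axis=0))
assert np.allclose(allreduce_result, np.tile(X.sum(axis=0), (n, 1)))
```

Broadcast works analogously: an (n x 1) all-ones column vector times the root's (1 x d) row replicates it to every rank.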
An extensible and customizable GPU benchmarking framework.
A Scalable Project Submission System for Parallel Programming Courses.
Kernel Launch Aggregation and Promotion (KLAP), a set of compiler techniques that improve the performance of GPU kernels that use dynamic parallelism.
Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments.