
Wen-mei Hwu

The Design and Implementation of a Scalable DL Benchmarking Platform (Best Paper Award)
DLSpec: A Deep Learning Task Exchange Specification
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs (Best Paper Award)
DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs
MLModelScope: Evaluate and Introspect Cognitive Pipelines
Accelerating Reduction and Scan Using Tensor Core Units
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments
Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects (Best Paper Award)
Accelerating Reduction Using Tensor Core Units
SCOPE: C3SR Systems Characterization and Benchmarking Framework
RAI: A Scalable Project Submission System for Parallel Programming Courses
KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism
