Cheng Li

Senior Software Engineer

Databricks

About Me

I am a senior software engineer at Databricks GenAI. My work has focused on understanding and optimizing the inference and training of Deep Learning (DL) models, particularly Transformers (LLMs). Before that, I was a senior researcher at Microsoft, where I worked on improving the performance and usability of transformer models in production (e.g. GitHub Copilot and DALL·E-2), building systematic profiling and optimization stacks for DL, and integrating SOTA system technologies into Microsoft DeepSpeed, an open-source DL optimization software suite that enables unprecedented scale and speed for training and inference.

I recently developed and open-sourced llm-analysis: Latency and Memory Analysis of Transformer Models for Training and Inference. Check it out!

Interests

Deep Learning and Transformers (LLMs)
System Optimization and Engineering for Deep Learning
GPU and Parallel Computing

Education

PhD in Computer Science, 2020
University of Illinois Urbana-Champaign
MS in Computer Science and Engineering, 2015
University of Michigan
BS in Computer Engineering, 2013
University of Michigan
BS in Electrical Engineering, 2013
Shanghai Jiao Tong University

Experience

Senior Software Engineer

Databricks

Aug 2023 – Present Bellevue, WA

Senior Researcher

Microsoft

Aug 2020 – Aug 2023 Bellevue, WA

Research Intern

Alibaba Group

May 2019 – Aug 2019 Sunnyvale, CA

Teaching Assistant for the 9th Programming and Tuning Massively Parallel Systems + Artificial Intelligence summer school (PUMPS+AI)

BSC, UPC and UIUC

Jul 2018 – Jul 2018 Barcelona, Spain

Research Intern

IBM TJ Watson Research Center

May 2018 – Aug 2018 Yorktown Heights, NY

Research Intern

IBM TJ Watson Research Center

May 2017 – Aug 2017 Yorktown Heights, NY

Head Teaching Assistant for ECE408/CS483: Applied Parallel Programming

UIUC

Aug 2016 – Dec 2016 Champaign, IL

Publications

Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He. ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation. In arXiv, 2023.

Zhewei Yao, Cheng Li, Xiaoxia Wu, Stephen Youn, Yuxiong He. A Comprehensive Study on Post-Training Quantization for Large Language Models. In arXiv, 2023.

Cheng Li, Xiaoxia Wu, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He. Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases. ICML, 2023.

Syed Zawad, Cheng Li, Zhewei Yao, Elton Zheng, Yuxiong He, Feng Yan. DySR: Adaptive Super-Resolution via Algorithm and System Co-design. ICLR, 2023.

Zhewei Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng Li, Yuxiong He. Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers. In arXiv, 2022.

Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He. DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. Supercomputing (SC), 2022.

Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu. The Design and Implementation of a Scalable DL Benchmarking Platform (Best Paper Award). IEEE CLOUD, 2020.

Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu. DLSpec: A Deep Learning Task Exchange Specification. USENIX OpML, 2020.

Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu. Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs. IPDPS, 2020.

Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, Wen-mei Hwu. XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs (Best Paper Award). IPDPS, 2020.

Talks & Posters

SC 2019 - Across-Stack Profiling and Characterization of State-of-the-Art Machine Learning Models on GPUs

Nov 18, 2019 3:30 PM Denver, CO

Tutorial at IISWC 2019 - Challenges and Solutions for End-to-End and Across Stack ML Benchmarking

Nov 3, 2019 3:30 PM Orlando, FL

HotChips 2019 - MLModelScope: Evaluate and Profile ML Models at Scale and Across Stack

Aug 19, 2019 3:30 PM Palo Alto, California

IEEE Services 2019 - MLModelScope: Evaluate and Introspect Cognitive Pipelines

Jul 10, 2019 3:30 PM Milan, Italy

Tutorial at ISCA 2019 - Benchmarking Deep Learning Systems

Jun 22, 2019 3:30 PM Phoenix, AZ

Languages

Python, C/C++, CUDA, Go, JavaScript, Bash

Chinese, English