I am a senior software engineer at Databricks GenAI. My work focuses on understanding and optimizing the inference and training of deep learning (DL) models, particularly Transformers (LLMs). Before that, I was a senior researcher at Microsoft, where I worked on improving the performance and usability of Transformer models in production (e.g., GitHub Copilot, DALL·E 2), building systematic profiling and optimization stacks for DL, and integrating state-of-the-art systems technologies into Microsoft DeepSpeed, an open-source DL optimization software suite that enables unprecedented scale and speed for training and inference.
I recently developed and open-sourced llm-analysis, a tool for latency and memory analysis of Transformer models in training and inference. Check it out!
PhD in Computer Science, 2020
University of Illinois Urbana-Champaign
MS in Computer Science and Engineering, 2015
University of Michigan
BS in Computer Engineering, 2013
University of Michigan
BS in Electrical Engineering, 2013
Shanghai Jiao Tong University
Python, C/C++, CUDA, Go, JavaScript, Bash
Chinese, English