I am a senior software engineer at Databricks GenAI. My work has focused on optimizing the training and inference of Deep Learning (DL) models, particularly Large Language Models (LLMs) and Large Multimodal Models (LMMs).
At Databricks, I have worked on building DBRX and optimizing its training performance (three months of training on 3072 H100 GPUs), aggressively optimizing memory usage, computation, and communication to achieve SOTA training efficiency. Refer to Building DBRX-class Custom LLMs with Mosaic AI Training for more details. Currently, I am optimizing the inference performance of Llama 3 and DBRX.
Before joining Databricks, I was a senior researcher at Microsoft, where I worked on improving LLM/LMM performance and usability in production (GitHub Copilot, DALL·E 2, etc.), creating SOTA AI system technologies, and building up Microsoft DeepSpeed, an open-source library that enables unprecedented scale and speed for training and inference.
I developed and open-sourced llm-analysis: Latency and Memory Analysis of Transformer Models for Training and Inference. It helps with planning resources for training and inference and suggests optimization opportunities. Check it out!
PhD in Computer Science, 2020
University of Illinois Urbana-Champaign
MS in Computer Science and Engineering, 2015
University of Michigan
BS in Computer Engineering, 2013
University of Michigan
BS in Electrical Engineering, 2013
Shanghai Jiao Tong University