I am a Member of Technical Staff at Black Forest Labs, specializing in optimizing the training and inference efficiency of Large Language Models (LLMs) and Large Vision Models (LVMs).
Previously, I worked at Databricks Mosaic AI, where I played a key role in developing the DBRX model by optimizing memory utilization, computational efficiency, and communication strategies during training to achieve state-of-the-art performance (Building DBRX-class Custom LLMs with Mosaic AI Training). I collaborated with NVIDIA to resolve FP8 training challenges in TransformerEngine, enabling FP8 training for Mosaic AI models. I also led the technical effort to optimize inference for the Llama and DBRX models.
Prior to Databricks, I was part of Microsoft DeepSpeed, where I improved the performance and usability of large models in production systems such as GitHub Copilot and DALL·E 2. My work included developing cutting-edge AI system technologies and helping scale Microsoft DeepSpeed into a leading AI framework.
I created llm-analysis, an open-source tool that analyzes latency and memory usage in transformer models to help with resource planning and optimization. Check it out!
PhD in Computer Science, University of Illinois Urbana-Champaign, 2020
MS in Computer Science and Engineering, University of Michigan, 2015
BS in Computer Engineering, University of Michigan, 2013
BS in Electrical Engineering, Shanghai Jiao Tong University, 2013