
A Comprehensive Study on Post-Training Quantization for Large Language Models

Post-training quantization (PTQ) has recently been shown to be a promising method to reduce memory consumption and/or compute cost for large language models. However, a comprehensive study about the effect of different quantization schemes, different …
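For context, a minimal sketch of what post-training weight quantization can look like is given below: symmetric per-tensor int8 quantization with absmax scaling. This is a generic illustration under assumed conventions, not the specific scheme studied in the paper, and the function names are hypothetical.

```python
import numpy as np

def quantize_absmax_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: scale by the max magnitude."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure the reconstruction error.
w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_absmax_int8(w)
w_hat = dequantize(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```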

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

Large-scale transformer models have become the de facto architectures for various machine learning applications, e.g., CV and NLP. However, these large models also introduce prohibitive training costs. To mitigate this issue, we propose a novel …

SCOPE: C3SR Systems Characterization and Benchmarking Framework

This report presents the design of the Scope infrastructure for extensible and portable benchmarking. Improvements in high-performance computing systems rely on coordination across different levels of system abstraction. Developing and defining …