Post-training quantization (PTQ) has recently been shown to be a promising method for reducing memory consumption and/or compute cost for large language models. However, a comprehensive study of the effect of different quantization schemes, different …
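For readers unfamiliar with PTQ, the sketch below shows the basic idea of per-tensor symmetric INT8 quantization of a trained weight matrix: map floats to 8-bit integers via a single scale, then dequantize approximately at inference. This is a generic illustration only; the function names and the per-tensor symmetric scheme are assumptions for exposition, not the specific schemes studied in the abstract above.

```python
import numpy as np

def symmetric_int8_quantize(weights: np.ndarray):
    """Quantize a float weight tensor to int8 with one per-tensor scale (illustrative)."""
    # Choose the scale so the largest-magnitude weight maps to the int8 limit (127).
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 codes and the scale."""
    return q.astype(np.float32) * scale

# Usage: measure the round-trip error on a random weight matrix.
w = np.random.randn(256, 256).astype(np.float32)
q, s = symmetric_int8_quantize(w)
w_hat = dequantize(q, s)
print("max abs quantization error:", np.max(np.abs(w - w_hat)))
```

Storing `q` (1 byte per weight) instead of `w` (4 bytes) gives the memory reduction PTQ targets; finer-grained schemes (per-channel or group-wise scales) trade a little extra metadata for lower error.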
Large-scale transformer models have become the de facto architectures for various machine learning applications, e.g., computer vision (CV) and natural language processing (NLP). However, these large models also introduce prohibitive training costs. To mitigate this issue, we propose a novel …
This report presents the design of the Scope infrastructure for extensible and portable benchmarking. Improvements in high-performance computing systems rely on coordination across different levels of system abstraction. Developing and defining …