Code for ``DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization''

1. Requirements:

Python 3.10, PyTorch >= 2.0
Install PyTorch with CUDA support from https://pytorch.org/get-started/locally/ first; it is a prerequisite for the fast-hadamard-transform package.

pip install -r requirement.txt

Install fast-hadamard-transform:

cd third-part
git clone https://github.com/Dao-AILab/fast-hadamard-transform.git
cd fast-hadamard-transform
pip install .

Install lm-eval:

git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .
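
After the installation steps above, a quick check can confirm that the environment matches the stated requirements. The following is a minimal sketch and not part of the repository: the helper names (missing_modules, check_environment) are hypothetical, and the import names torch, fast_hadamard_transform, and lm_eval are assumed to be the ones the packages above install.

```python
import importlib.util
import sys

# Import names assumed for the packages installed above (not taken from this repo).
REQUIRED_MODULES = ["torch", "fast_hadamard_transform", "lm_eval"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Collect human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.10 expected, found {found}")
    for name in missing_modules(REQUIRED_MODULES):
        problems.append(f"module not importable: {name}")
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
            # fast-hadamard-transform needs a CUDA-enabled PyTorch build.
            if not torch.cuda.is_available():
                problems.append("PyTorch is installed but CUDA is not available")
        except ImportError as exc:
            problems.append(f"PyTorch failed to import: {exc}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment looks OK")
```

Running the script prints one line per detected problem, or a single confirmation line when nothing is missing.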

The ./fake_quant folder contains the code for fusing the calibrated rotation matrix and running the quantization tests. Detailed usage is described in the Readme.md file in that directory.
The ./calibrater folder contains the code for obtaining the calibration set and calibrating the rotation matrix. Detailed usage is described in the Readme.md file in that directory.
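
To illustrate what fusing a rotation matrix means, the sketch below shows the general idea in pure Python: for an orthogonal matrix R, rotating the activations by R and pre-multiplying the weights by R^T leaves the layer output unchanged, while the rotation spreads an outlier channel across all channels. This is not the code in ./fake_quant. A fixed normalized Hadamard matrix stands in for the calibrated rotation as an assumption, and all names and values (hadamard, matmul, x, W) are made up for the example.

```python
import math


def hadamard(n):
    """Sylvester construction (n must be a power of two), scaled to be orthogonal."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Toy activation row with one outlier channel, and a toy 4x3 weight matrix.
x = [[0.1, -0.2, 8.0, 0.3]]
W = [[0.5, -1.0, 0.2],
     [0.3, 0.7, -0.4],
     [-0.6, 0.1, 0.9],
     [0.8, -0.2, 0.4]]

R = hadamard(4)                    # stand-in for a calibrated rotation (assumption)

y_ref = matmul(x, W)               # original layer output
x_rot = matmul(x, R)               # rotated activations
W_fused = matmul(transpose(R), W)  # rotation fused into the weights
y_rot = matmul(x_rot, W_fused)     # equals y_ref up to floating-point error

max_before = max(abs(v) for v in x[0])
max_after = max(abs(v) for v in x_rot[0])
print("outputs match:", all(abs(a - b) < 1e-9 for a, b in zip(y_ref[0], y_rot[0])))
print("largest |activation| before/after rotation:", max_before, max_after)
```

Because the rotated activations have a smaller peak magnitude, they are easier to quantize, and the fused weights absorb the inverse rotation so no extra computation is needed for this layer at inference time.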

The ./NPU_DartQuant folder contains the NPU runtime code; its usage is essentially the same as that of the GPU version.
