
# quantize

You can also use the GGUF-my-repo space on Hugging Face to build your own quants without any setup.

Note: It is synced from llama.cpp main every 6 hours.
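To quantize locally instead, the `llama-quantize` tool can be run directly on an F16/F32 GGUF file. A minimal sketch, with placeholder paths that you should adjust for your own model:

```bash
# Quantize an F16 GGUF model to 4-bit Q4_K_M (paths below are placeholders)
./llama-quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
```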

## Llama 2 7B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.35                  |
| Q3_K_S       | 3.50                  |
| Q3_K_M       | 3.91                  |
| Q3_K_L       | 4.27                  |
| Q4_K_S       | 4.58                  |
| Q4_K_M       | 4.84                  |
| Q5_K_S       | 5.52                  |
| Q5_K_M       | 5.68                  |
| Q6_K         | 6.56                  |

## Llama 2 13B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.34                  |
| Q3_K_S       | 3.48                  |
| Q3_K_M       | 3.89                  |
| Q3_K_L       | 4.26                  |
| Q4_K_S       | 4.56                  |
| Q4_K_M       | 4.83                  |
| Q5_K_S       | 5.51                  |
| Q5_K_M       | 5.67                  |
| Q6_K         | 6.56                  |

## Llama 2 70B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.40                  |
| Q3_K_S       | 3.47                  |
| Q3_K_M       | 3.85                  |
| Q3_K_L       | 4.19                  |
| Q4_K_S       | 4.53                  |
| Q4_K_M       | 4.80                  |
| Q5_K_S       | 5.50                  |
| Q5_K_M       | 5.65                  |
| Q6_K         | 6.56                  |
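
These figures give a rough file-size estimate: size in bytes ≈ parameter count × BPW / 8, plus a small amount of metadata. For example, a model with roughly 7 billion parameters quantized to Q4_K_M (4.84 BPW) works out to about 7×10⁹ × 4.84 / 8 ≈ 4.2 GB.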