
quantize

TODO

Llama 2 7B

Quantization | Bits per Weight (BPW)
-- | --
Q2_K | 3.35
Q3_K_S | 3.50
Q3_K_M | 3.91
Q3_K_L | 4.27
Q4_K_S | 4.58
Q4_K_M | 4.84
Q5_K_S | 5.52
Q5_K_M | 5.68
Q6_K | 6.56

Llama 2 13B

Quantization | Bits per Weight (BPW)
-- | --
Q2_K | 3.34
Q3_K_S | 3.48
Q3_K_M | 3.89
Q3_K_L | 4.26
Q4_K_S | 4.56
Q4_K_M | 4.83
Q5_K_S | 5.51
Q5_K_M | 5.67
Q6_K | 6.56

Llama 2 70B

Quantization | Bits per Weight (BPW)
-- | --
Q2_K | 3.40
Q3_K_S | 3.47
Q3_K_M | 3.85
Q3_K_L | 4.19
Q4_K_S | 4.53
Q4_K_M | 4.80
Q5_K_S | 5.50
Q5_K_M | 5.65
Q6_K | 6.56
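
A bits-per-weight figure can be turned into a rough storage estimate by multiplying it by the number of weights and dividing by 8. The sketch below is not part of the quantize tool. The helper `estimate_size_bytes` is a hypothetical name, the BPW values are copied from the Llama 2 7B table, and the parameter count is a nominal 7 billion used purely as an assumption. Real file sizes may differ, since this README does not state how the BPW figures were measured.

```python
# Bits-per-weight values copied from the Llama 2 7B table in this README.
LLAMA2_7B_BPW = {
    "Q2_K": 3.35,
    "Q3_K_S": 3.50,
    "Q3_K_M": 3.91,
    "Q3_K_L": 4.27,
    "Q4_K_S": 4.58,
    "Q4_K_M": 4.84,
    "Q5_K_S": 5.52,
    "Q5_K_M": 5.68,
    "Q6_K": 6.56,
}


def estimate_size_bytes(n_params: int, bpw: float) -> float:
    """Approximate bytes needed to store n_params weights at bpw bits each.

    Hypothetical helper for illustration: bits = n_params * bpw, bytes = bits / 8.
    """
    return n_params * bpw / 8


if __name__ == "__main__":
    # Assumption: a nominal "7B" parameter count, not an exact model figure.
    n_params = 7_000_000_000
    for name, bpw in LLAMA2_7B_BPW.items():
        gib = estimate_size_bytes(n_params, bpw) / 2**30
        print(f"{name:7s} {bpw:5.2f} BPW  ~{gib:5.2f} GiB")
```

The same function works for the 13B and 70B tables by swapping in their BPW values and a matching parameter count.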