2 years ago · ffe88a36a9
--- a/README.md
+++ b/README.md
@@ -597,6 +597,11 @@ Several quantization methods are supported. They differ in the resulting model d
 
 |   13B | ms/tok @ 8th |      - |     73 |     82 |     98 |    105 |    128 |
 |   13B | bits/weight  |   16.0 |    4.5 |    5.0 |    5.5 |    6.0 |    8.5 |
 
+- [k-quants](https://github.com/ggerganov/llama.cpp/pull/1684)
+- recent k-quants improvements
+  - [#2707](https://github.com/ggerganov/llama.cpp/pull/2707)
+  - [#2807](https://github.com/ggerganov/llama.cpp/pull/2807)
+
 ### Perplexity (measuring model quality)
 
 You can use the `perplexity` example to measure perplexity over a given prompt (lower perplexity is better).
--- a/examples/perplexity/README.md
+++ b/examples/perplexity/README.md
@@ -1,3 +1,21 @@
 
 # perplexity
 
 TODO
+
+## Llama 2 70B Scorechart
+Quantization | Model size (GiB) | Perplexity | Delta to fp16
+-- | -- | -- | --
+Q4_0 | 36.20 | 3.5550 | 3.61%
+Q4_1 | 40.20 | 3.5125 | 2.37%
+Q5_0 | 44.20 | 3.4744 | 1.26%
+Q2_K | 27.27 | 3.7339 | 8.82%
+Q3_K_S | 27.86 | 3.7019 | 7.89%
+Q3_K_M | 30.83 | 3.5932 | 4.72%
+Q3_K_L | 33.67 | 3.5617 | 3.80%
+Q4_K_S | 36.39 | 3.4852 | 1.57%
+Q4_K_M | 38.54 | 3.4725 | 1.20%
+Q5_K_S | 44.20 | 3.4483 | 0.50%
+Q5_K_M | 45.41 | 3.4451 | 0.40%
+Q6_K | 52.70 | 3.4367 | 0.16%
+fp16 | 128.5 | 3.4313 | -
+
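
The "Delta to fp16" column in the scorechart above appears consistent with the relative increase in perplexity over the fp16 baseline. The commit does not state the formula, so the following Python sketch assumes that definition; `delta_to_fp16` is a hypothetical helper name, and the perplexity values are copied from the table.

```python
# Perplexity values copied from the Llama 2 70B scorechart.
FP16_PPL = 3.4313
PERPLEXITY = {
    "Q4_0": 3.5550,
    "Q4_K_M": 3.4725,
    "Q6_K": 3.4367,
}


def delta_to_fp16(ppl: float, baseline: float = FP16_PPL) -> float:
    """Relative perplexity increase over the baseline, in percent.

    Assumed definition: (ppl - baseline) / baseline * 100.
    """
    return (ppl - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, ppl in PERPLEXITY.items():
        print(f"{name}: {delta_to_fp16(ppl):.2f}%")
```

Under this assumption the three sample rows come out within rounding of the listed deltas, which suggests the column is a plain relative difference rather than an absolute one.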
			
--- a/examples/quantize/README.md
+++ b/examples/quantize/README.md
@@ -1,3 +1,44 @@
 
 # quantize
 
 TODO
+
+## Llama 2 7B
+
+Quantization | Bits per Weight (BPW)
+-- | --
+Q2_K | 3.35
+Q3_K_S | 3.50
+Q3_K_M | 3.91
+Q3_K_L | 4.27
+Q4_K_S | 4.58
+Q4_K_M | 4.84
+Q5_K_S | 5.52
+Q5_K_M | 5.68
+Q6_K | 6.56
+
+## Llama 2 13B
+Quantization | Bits per Weight (BPW)
+-- | --
+Q2_K | 3.34
+Q3_K_S | 3.48
+Q3_K_M | 3.89
+Q3_K_L | 4.26
+Q4_K_S | 4.56
+Q4_K_M | 4.83
+Q5_K_S | 5.51
+Q5_K_M | 5.67
+Q6_K | 6.56
+
+# Llama 2 70B
+
+Quantization | Bits per Weight (BPW)
+-- | --
+Q2_K | 3.40
+Q3_K_S | 3.47
+Q3_K_M | 3.85
+Q3_K_L | 4.19
+Q4_K_S | 4.53
+Q4_K_M | 4.80
+Q5_K_S | 5.50
+Q5_K_M | 5.65
+Q6_K | 6.56
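
The 70B bits-per-weight figures can be roughly cross-checked against the model sizes in the perplexity scorechart from the same commit. The Python sketch below assumes that fp16 stores 16.0 bits per weight (as the root README table lists) and that file size scales linearly with bits per weight. `estimate_bpw` is a hypothetical helper, not part of llama.cpp, and the result is only approximate because the fp16 size is rounded to 128.5 GiB and the files also contain metadata.

```python
FP16_BITS = 16.0       # bits per weight for fp16 (from the root README table)
FP16_SIZE_GIB = 128.5  # fp16 row of the Llama 2 70B scorechart

# Model sizes in GiB copied from the Llama 2 70B scorechart.
SIZE_GIB = {
    "Q2_K": 27.27,
    "Q4_K_M": 38.54,
    "Q6_K": 52.70,
}


def estimate_bpw(size_gib: float, fp16_size_gib: float = FP16_SIZE_GIB) -> float:
    """Estimate bits per weight by scaling the fp16 width by the size ratio."""
    return FP16_BITS * size_gib / fp16_size_gib


if __name__ == "__main__":
    for name, size in SIZE_GIB.items():
        print(f"{name}: ~{estimate_bpw(size):.2f} bits/weight")
```

For these three rows the estimate lands within about 0.01 of the listed values. Other rows can differ slightly in the last digit because of the rounding noted above.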