
llama-bench : clarify benchmarked parts of the computation (#16823)

Georgi Gerganov, 2 months ago
parent commit
a8ca18b4b8
1 file changed with 4 additions and 1 deletion
      tools/llama-bench/README.md

+ 4 - 1
tools/llama-bench/README.md

@@ -82,6 +82,9 @@ Using the `-d <n>` option, each test can be run at a specified context depth, pr
 
 For a description of the other options, see the [main example](../main/README.md).
 
+> [!NOTE]
+> The measurements reported by `llama-bench` do not include the time spent on tokenization or sampling.
+
 ## Examples
 
 ### Text generation with different models
@@ -131,7 +134,7 @@ $ ./llama-bench -n 0 -n 16 -p 64 -t 1,2,4,8,16,32
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         16 | pp 64      |     33.52 ± 0.03 |
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         16 | tg 16      |     15.32 ± 0.05 |
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | pp 64      |     59.00 ± 1.11 |
-| llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | tg 16      |     16.41 ± 0.79 ||
+| llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | tg 16      |     16.41 ± 0.79 |
 
 ### Different numbers of layers offloaded to the GPU