2 years ago · 25d43e0eb5
--- a/Makefile
+++ b/Makefile
@@ -253,11 +253,6 @@ ifdef LLAMA_CUDA_KQUANTS_ITER
 else
 	NVCCFLAGS += -DK_QUANTS_PER_ITERATION=2
 endif
-ifdef LLAMA_CUDA_MMQ_Y
-	NVCCFLAGS += -DGGML_CUDA_MMQ_Y=$(LLAMA_CUDA_MMQ_Y)
-else
-	NVCCFLAGS += -DGGML_CUDA_MMQ_Y=64
-endif # LLAMA_CUDA_MMQ_Y
 #ifdef LLAMA_CUDA_CUBLAS
 #	NVCCFLAGS += -DGGML_CUDA_CUBLAS
 #endif # LLAMA_CUDA_CUBLAS
--- a/README.md
+++ b/README.md
@@ -406,7 +406,6 @@ Building the program with BLAS support may lead to some performance improvements
 --->
   | Option                  | Legal values           | Default | Description |
   |-------------------------|------------------------|---------|-------------|
-  | LLAMA_CUDA_MMQ_Y        | Positive integer >= 32 |      64 | Tile size in y direction when using the custom CUDA kernels for prompt processing. Higher values can be faster depending on the amount of shared memory available. Power of 2 heavily recommended. |
   | LLAMA_CUDA_FORCE_DMMV   | Boolean                |   false | Force the use of dequantization + matrix vector multiplication kernels instead of using kernels that do matrix vector multiplication on quantized data. By default the decision is made based on compute capability (MMVQ for 6.1/Pascal/GTX 1000 or higher). Does not affect k-quants. |
   | LLAMA_CUDA_DMMV_X       | Positive integer >= 32 |      32 | Number of values in x direction processed by the CUDA dequantization + matrix vector multiplication kernel per iteration. Increasing this value can improve performance on fast GPUs. Power of 2 heavily recommended. Does not affect k-quants. |
   | LLAMA_CUDA_MMV_Y        | Positive integer       |       1 | Block size in y direction for the CUDA mul mat vec kernels. Increasing this value can improve performance on fast GPUs. Power of 2 recommended. Does not affect k-quants. |
--- a/ggml-cuda.cu
+++ b/ggml-cuda.cu