Stephan Walter | 36d19a603b | Remove Q4_3 which is no better than Q5 (#1218) | 2 years ago
Georgi Gerganov | 574406dc7e | ggml : add Q5_0 and Q5_1 quantization (#1187) | 2 years ago
Georgi Gerganov | 7a32fcb3b2 | ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) | 2 years ago
slaren | 50cb666b8a | Improve cuBLAS performance by using a memory pool (#1094) | 2 years ago
slaren | 2005469ea1 | Add Q4_3 support to cuBLAS (#1086) | 2 years ago
slaren | 02d6988121 | Improve cuBLAS performance by dequantizing on the GPU (#1065) | 2 years ago