Johannes Gäßler | 42b53d192f | CUDA: revise q8_1 data layout for mul_mat_q (#7824) | 1 year ago
Johannes Gäßler | 7d1a378b8f | CUDA: refactor mmq, dmmv, mmvq (#7716) | 1 year ago
Johannes Gäßler | 9b596417af | CUDA: quantized KV support for FA vec (#7527) | 1 year ago
Georgi Gerganov | e84b71c2c6 | ggml : drop support for QK_K=64 (#7473) | 1 year ago
Johannes Gäßler | fcf6538ba6 | CUDA: fix unused warning in mmq.cu (#7442) | 1 year ago
Johannes Gäßler | d8ee902227 | CUDA: deduplicate mmq code (#7397) | 1 year ago
agray3 | bc4bba364f | Introduction of CUDA Graphs to LLama.cpp (#6766) | 1 year ago
slaren | ae1f211ce2 | cuda : refactor into multiple files (#6269) | 1 year ago