| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Georgi Gerganov | e84b71c2c6 | ggml : drop support for QK_K=64 (#7473) | 1 year ago |
| agray3 | bc4bba364f | Introduction of CUDA Graphs to LLama.cpp (#6766) | 1 year ago |
| DAN™ | e00b4a8f81 | Fix more int overflow during quant (PPL/CUDA). (#6563) | 1 year ago |
| slaren | 0d56246f4b | ggml : group all experts in a single ggml_mul_mat_id (#6505) | 1 year ago |
| Carolinabanana | 5dc9dd7152 | llama : add Command R Plus support (#6491) | 1 year ago |
| Kawrakow | 55c1b2a3bb | IQ1_M: 1.75 bpw quantization (#6302) | 1 year ago |
| slaren | ae1f211ce2 | cuda : refactor into multiple files (#6269) | 1 year ago |