| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Georgi Gerganov | e84b71c2c6 | ggml : drop support for QK_K=64 (#7473) | 1 year ago |
| agray3 | bc4bba364f | Introduction of CUDA Graphs to LLama.cpp (#6766) | 1 year ago |
| DAN™ | e00b4a8f81 | Fix more int overflow during quant (PPL/CUDA). (#6563) | 1 year ago |
| slaren | 0d56246f4b | ggml : group all experts in a single ggml_mul_mat_id (#6505) | 1 year ago |
| Carolinabanana | 5dc9dd7152 | llama : add Command R Plus support (#6491) | 1 year ago |
| Kawrakow | 55c1b2a3bb | IQ1_M: 1.75 bpw quantization (#6302) | 1 year ago |
| slaren | ae1f211ce2 | cuda : refactor into multiple files (#6269) | 1 year ago |