cturan/llama.cpp

mirror of https://github.com/cturan/llama.cpp

Author	SHA1	Message	Date
Johannes Gäßler	76d66ee0be	CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)	1 year ago
slaren	08a0c02060	ggml : mul_mat_id use the same tensor for all the experts (#6387)	1 year ago
slaren	ae1f211ce2	cuda : refactor into multiple files (#6269)	1 year ago