cturan/llama.cpp

Mirrored from https://github.com/cturan/llama.cpp

Author	SHA1 Message	Commit date
Johannes Gäßler	76d66ee0be CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)	1 year ago
Johannes Gäßler	133d99c599 CUDA: deduplicate FlashAttention code (#7352)	1 year ago
Georgi Gerganov	9cb317f77e ggml : full ALiBi support (#7192)	1 year ago
Georgi Gerganov	9c67c2773d ggml : add Flash Attention (#5021)	1 year ago
DAN™	e00b4a8f81 Fix more int overflow during quant (PPL/CUDA). (#6563)	1 year ago
slaren	ae1f211ce2 cuda : refactor into multiple files (#6269)	1 year ago