Commit      Author            Message                                                      Date
750f60c03e  Johannes Gäßler   CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)   1 year ago
9b596417af  Johannes Gäßler   CUDA: quantized KV support for FA vec (#7527)                1 year ago
133d99c599  Johannes Gäßler   CUDA: deduplicate FlashAttention code (#7352)                1 year ago
0fc1e820a9  Johannes Gäßler   CUDA: faster large batch FA without tensor cores (#7314)    1 year ago
dc685be466  Johannes Gäßler   CUDA: add FP32 FlashAttention vector kernel (#7188)         1 year ago
9cb317f77e  Georgi Gerganov   ggml : full ALiBi support (#7192)                            1 year ago
a743d76a01  Johannes Gäßler   CUDA: generalize FP16 fattn vec kernel (#7061)               1 year ago
1613ef8d8e  Johannes Gäßler   CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019)   1 year ago
9c67c2773d  Georgi Gerganov   ggml : add Flash Attention (#5021)                           1 year ago