Commit      Author            Message                                                      Date
750f60c03e  Johannes Gäßler   CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)   1 year ago
9b596417af  Johannes Gäßler   CUDA: quantized KV support for FA vec (#7527)                1 year ago
133d99c599  Johannes Gäßler   CUDA: deduplicate FlashAttention code (#7352)                1 year ago
0fc1e820a9  Johannes Gäßler   CUDA: faster large batch FA without tensor cores (#7314)    1 year ago
dc685be466  Johannes Gäßler   CUDA: add FP32 FlashAttention vector kernel (#7188)         1 year ago
9cb317f77e  Georgi Gerganov   ggml : full ALiBi support (#7192)                            1 year ago
a743d76a01  Johannes Gäßler   CUDA: generalize FP16 fattn vec kernel (#7061)               1 year ago
1613ef8d8e  Johannes Gäßler   CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019)   1 year ago
9c67c2773d  Georgi Gerganov   ggml : add Flash Attention (#5021)                           1 year ago