Johannes Gäßler | 1f0dabda8d | CUDA: use tensor cores for MMQ (#7676) | 1 year ago
Johannes Gäßler | 7d1a378b8f | CUDA: refactor mmq, dmmv, mmvq (#7716) | 1 year ago
Djip007 | 852aafb163 | update HIP_UMA #7399 (#7414) | 1 year ago
Johannes Gäßler | 133d99c599 | CUDA: deduplicate FlashAttention code (#7352) | 1 year ago
Engininja2 | d233b507cd | cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263) | 1 year ago
Johannes Gäßler | dc685be466 | CUDA: add FP32 FlashAttention vector kernel (#7188) | 1 year ago
Johannes Gäßler | a743d76a01 | CUDA: generalize FP16 fattn vec kernel (#7061) | 1 year ago
agray3 | bc4bba364f | Introduction of CUDA Graphs to LLama.cpp (#6766) | 1 year ago
Johannes Gäßler | 1613ef8d8e | CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) | 1 year ago
Georgi Gerganov | 9c67c2773d | ggml : add Flash Attention (#5021) | 1 year ago
Carolinabanana | 5dc9dd7152 | llama : add Command R Plus support (#6491) | 1 year ago
Georgi Gerganov | d48ccf3ad4 | sync : ggml (#6351) | 1 year ago
slaren | ae1f211ce2 | cuda : refactor into multiple files (#6269) | 1 year ago