Georgi Gerganov | ef797db357 | metal : disable fast math in all quantize kernels (#14528) | 6 months ago
Georgi Gerganov | 67d1ef23c6 | batch : add optional for sequential equal split (#14511) | 6 months ago
Georgi Gerganov | 7b50f7c025 | graph : prepare for 4D mask (#14515) | 6 months ago
Georgi Gerganov | c79184d2d1 | batch : add n_used count (#14512) | 6 months ago
luyhcsu | 499a8f5a78 | CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002) | 6 months ago
Sigbjørn Skjæret | 28657a8229 | ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445) | 6 months ago
lhez | bee28421be | opencl : broadcast for soft_max (#14510) | 6 months ago
Jeff Bolz | 2b72bedec1 | vulkan: support mixed/deepseekR1 FA head sizes (#14509) | 6 months ago
Johannes Gäßler | c8c4495b8d | ggml: backward pass for split swiglu (#14483) | 6 months ago
Nicolò Scipione | 7b63a71a6b | Fix conditional enabling following arch checks for ggml-sycl (#14504) | 6 months ago
Xuan-Son Nguyen | 0c2ee38ab7 | convert : correct gemma 3n conversion (#14450) | 6 months ago
Georgi Gerganov | a70c8a0c4b | kv-cache : use ggml_set_rows (#14285) | 6 months ago
Georgi Gerganov | 9067487c44 | ggml : fix FA mask dim 2 and 3 (#14505) | 6 months ago
Georgi Gerganov | d4cdd9c1c3 | ggml : remove kompute backend (#14501) | 6 months ago
Aman Gupta | 55c2646b45 | CUDA: add dynamic shared mem to softmax, refactor general usage (#14497) | 6 months ago
Sigbjørn Skjæret | e75ba4c043 | gguf-py : add support for chat template jinja files (#14508) | 6 months ago
compilade | 5d46babdc2 | llama : initial Mamba-2 support (#9126) | 6 months ago
Georgi Gerganov | e17991c466 | sync : ggml | 6 months ago
Daniel Bevenius | c46944aa25 | ggml : add version function to get lib version (ggml/1286) | 6 months ago
Rotem Dan | f3ed38d793 | Set RPATH to "@loader_path" / "$ORIGIN" to ensure executables and dynamic libraries search for dependencies in their origin directory. (#14309) | 6 months ago
Aman Gupta | 55a1c5a5fd | CUDA: add softmax broadcast (#14475) | 6 months ago
Johannes Gäßler | 12a81af45f | CUDA: broadcasting for FlashAttention mask (#14500) | 6 months ago
Jeff Bolz | 8875523eb3 | vulkan: support softmax/FA batch and broadcast (#14449) | 6 months ago
Georgi Gerganov | ec68e84c32 | ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (#14435) | 6 months ago
zhouwg | 307e79d33d | opencl : fix possible buffer overflow in dump_tensor (#14490) | 6 months ago
Georgi Gerganov | d7f5f4e578 | simple-chat : fix context-exceeded condition (#14494) | 6 months ago
Eric Zhang | c8a4e470f6 | opencl : skip empty nodes on cgraph compute (#14491) | 6 months ago
lhez | 603e43dc91 | opencl : update upscale to support align corners (#14488) | 6 months ago
Sigbjørn Skjæret | 611ba4b264 | ci : add OpenCL to labeler workflow (#14496) | 6 months ago
Eric Zhang | 85841e121d | github : add OpenCL backend to issue templates (#14492) | 6 months ago