Daniel Bevenius | cd3069dfcb | kv-cache : log (debug) all streams in find_slot (#15176) | 5 months ago
Sigbjørn Skjæret | 50e81bdf5d | convert : fix merge conflicts (#15229) | 5 months ago
Daniel Bevenius | 1ebbaddff2 | perplexity : update comments/error msg to use decode [no ci] (#15227) | 5 months ago
Julien Denize | a3a7874272 | convert : improve Mistral models integration (#14737) | 5 months ago
Charles Xu | 002cb1bb33 | kleidiai: fix unsigned overflow bug (#15150) | 5 months ago
David Zhao | 79c1160b07 | cuda: refactored ssm_scan and use CUB (#13291) | 5 months ago
Aman Gupta | 34c9d765bf | CUDA: add attention sinks for tile and wmma (#15178) | 5 months ago
compilade | e54d41befc | gguf-py : add Numpy MXFP4 de/quantization support (#15111) | 5 months ago
Johannes Gäßler | 4850b52aed | server-bench: external OAI servers, sqlite (#15179) | 5 months ago
AN Long | cd6983d56d | ggml : fix field name when new ggml_backend (#14944) | 5 months ago
Olivier Chafik | 6c7e9a5440 | vendor: sync minja (#15161) | 5 months ago
Johannes Gäßler | 1425f587a8 | CUDA: attention sinks for mma FlashAttention (#15157) | 5 months ago
lhez | aaa3d07ae7 | opencl: support sink in `soft_max` (attn sinks) (#15152) | 5 months ago
Xuan-Son Nguyen | 50aa938901 | convert : support non-mxfp4 HF model (#15153) | 5 months ago
Jeff Bolz | c4f53563df | vulkan: support fattn sinks (#15126) | 5 months ago
Jeff Bolz | a0552c8bee | vulkan: Add env var to disable host visible vidmem (#15109) | 5 months ago
RunningLeon | 99acbc9921 | llama : Support intern-s1 (#14875) | 5 months ago
uvos | 7ad67ba9fe | HIP: add cmake option to enable compiler output of kernel resource usage metrics (#15103) | 5 months ago
Christian Kastner | 9a96389544 | ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) | 5 months ago
Johannes Gäßler | 1d72c84188 | CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (#15131) | 5 months ago
Johannes Gäßler | 20638e4f16 | scripts: fix crash when --tool is not set (#15133) | 5 months ago
Daniel Bevenius | 36d3f00e14 | requirements : fix PyTorch uint64 compatibility (#15134) | 5 months ago
Reese Levine | 5fd160bbd9 | ggml: Add basic SET_ROWS support in WebGPU (#15137) | 5 months ago
rmatif | 756cfea826 | fix profiling crash (#15072) | 5 months ago
lhez | e725a1a982 | opencl: add `swiglu_oai` and `add_id` (#15121) | 5 months ago
Sachin Desai | 3db4da56a5 | chat : support Granite model reasoning and tool call (#14864) | 5 months ago
Juk Armstrong | 476aa3fd57 | Fixed name `-override-tensors` to `-override-tensor` (#15129) | 5 months ago
Diego Devesa | 0d8831543c | ggml : fix fallback to CPU for ununsupported ops (#15118) | 5 months ago
Sigbjørn Skjæret | 65c797c4fa | chat : fix yandex chat template (#15116) | 5 months ago
stevenkuang | 25726898e8 | chat : fix hunyuan auto-detection (#15114) | 5 months ago