Sigbjørn Skjæret
|
bc5182272c
ci : add copilot-setup-steps.yml (#15214)
|
5 months ago |
Tak-RS
|
e71d48e326
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others) (#15188)
|
5 months ago |
uvos
|
b0493156fa
HIP: disable sync warp shuffel operators from clr amd_warp_sync_functions.h (#15273)
|
5 months ago |
Romain Biessy
|
f4586ee598
sycl: Fix and disable more configurations of mul_mat (#15151)
|
5 months ago |
rmatif
|
60a7658810
opencl: allow mixed f16/f32 `add` (#15140)
|
5 months ago |
Aman Gupta
|
efe3a90996
CUDA cmake: add `-lineinfo` for easier debug (#15260)
|
5 months ago |
Chenguang Li
|
bbd57b7eaf
CANN: GGML_OP_CPY optimization (#15070)
|
5 months ago |
R0CKSTAR
|
25ff6f7659
musa: fix failures in test-backend-ops for mul_mat_id op (#15236)
|
5 months ago |
hipudding
|
be48528b06
CANN: Add broadcast for softmax and FA (#15208)
|
5 months ago |
rainred
|
cf9e5648a7
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. (#14750)
|
5 months ago |
Xuan-Son Nguyen
|
fba5c0d680
chat : hotfix gpt-oss jinja raising an exception (#15243)
|
5 months ago |
Xuan-Son Nguyen
|
53d0a12658
server : allow specifying reasoning_format in HTTP request (#15238)
|
5 months ago |
Zagaj
|
27093afe78
readme : update infra list (#15234)
|
5 months ago |
Georgi Gerganov
|
228f724d9c
kv-cache : fix seq_rm with seq_id == -1 (#15226)
|
5 months ago |
Daniel Bevenius
|
cd3069dfcb
kv-cache : log (debug) all streams in find_slot (#15176)
|
5 months ago |
Sigbjørn Skjæret
|
50e81bdf5d
convert : fix merge conflicts (#15229)
|
5 months ago |
Daniel Bevenius
|
1ebbaddff2
perplexity : update comments/error msg to use decode [no ci] (#15227)
|
5 months ago |
Julien Denize
|
a3a7874272
convert : improve Mistral models integration (#14737)
|
5 months ago |
Charles Xu
|
002cb1bb33
kleidiai: fix unsigned overflow bug (#15150)
|
5 months ago |
David Zhao
|
79c1160b07
cuda: refactored ssm_scan and use CUB (#13291)
|
5 months ago |
Aman Gupta
|
34c9d765bf
CUDA: add attention sinks for tile and wmma (#15178)
|
5 months ago |
compilade
|
e54d41befc
gguf-py : add Numpy MXFP4 de/quantization support (#15111)
|
5 months ago |
Johannes Gäßler
|
4850b52aed
server-bench: external OAI servers, sqlite (#15179)
|
5 months ago |
AN Long
|
cd6983d56d
ggml : fix field name when new ggml_backend (#14944)
|
5 months ago |
Olivier Chafik
|
6c7e9a5440
vendor: sync minja (#15161)
|
5 months ago |
Johannes Gäßler
|
1425f587a8
CUDA: attention sinks for mma FlashAttention (#15157)
|
5 months ago |
lhez
|
aaa3d07ae7
opencl: support sink in `soft_max` (attn sinks) (#15152)
|
5 months ago |
Xuan-Son Nguyen
|
50aa938901
convert : support non-mxfp4 HF model (#15153)
|
5 months ago |
Jeff Bolz
|
c4f53563df
vulkan: support fattn sinks (#15126)
|
5 months ago |
Jeff Bolz
|
a0552c8bee
vulkan: Add env var to disable host visible vidmem (#15109)
|
5 months ago |