Xuan-Son Nguyen
|
9e79b0116e
convert: allow using quantized Mistral weight (#17889)
|
1 month ago |
Neo Zhang Jianyu
|
2e9eab80c2
fix softmax for iGPU (#17838)
|
1 month ago |
Aldehir Rojas
|
2fbe3b7bb7
common : add parser for ministral/mistral large 3/devstral 2 (#17713)
|
1 month ago |
Sigbjørn Skjæret
|
63391852b0
docs : update cpu and cuda ops (#17890)
|
1 month ago |
Gabe Goodhart
|
086a63e3a5
metal: SSM kernel improvements (#17876)
|
1 month ago |
Piotr Wilkin (ilintar)
|
b63509262a
Add DIAG for CUDA (#17873)
|
1 month ago |
Johannes Gäßler
|
48f47565a7
docs: clarify that CPU support should be first (#17886)
|
1 month ago |
Gabe Goodhart
|
02e409a5be
ggml : Provide macos-specific backtrace printing to avoid terminal death (#17869)
|
1 month ago |
Georgi Gerganov
|
6b82eb7883
metal : print node names for debugging (#17882)
|
1 month ago |
Sigbjørn Skjæret
|
86a3f0fad8
ggml : allow fill node alloc inplace (#17870)
|
1 month ago |
Rhys-T
|
63908b631a
cmake: fix Mach-O current version number (#17877)
|
1 month ago |
Sigbjørn Skjæret
|
42b12b5608
model : nit, DeepSeek V1 MoE is 16B and GigaChat is 20B (#12652)
|
1 month ago |
Xuan-Son Nguyen
|
4e842d5120
console: allow using arrow left/right, home/end keys and history mode (#17836)
|
1 month ago |
Chenguang Li
|
ca709e427b
CANN: add support for partial RoPE and Vision mode (#17543)
|
1 month ago |
Johannes Gäßler
|
0cdce38a97
CUDA: fix FP16 overflow in tile FA kernel (#17875)
|
1 month ago |
Aldehir Rojas
|
e39502e74b
llama : add token matching support to llama-grammar (#17816)
|
1 month ago |
philip-essential
|
1d2a1ab73d
model : support Rnj-1 (#17811)
|
1 month ago |
Sigbjørn Skjæret
|
c8554b66e0
graph : use fill instead of scale_bias in grouped expert selection (#17867)
|
1 month ago |
Daniel Bevenius
|
2fa51c19b0
model-conversion : add token ids to prompt token output [no ci] (#17863)
|
1 month ago |
Xuan-Son Nguyen
|
951520ddb0
server: delegate result_state creation to server_task (#17835)
|
1 month ago |
Neo Zhang
|
68522c678d
ci : support bfloat16 SYCL release package (#17855)
|
1 month ago |
Xuan-Son Nguyen
|
f896d2c34f
server: improve speed of speculative decoding (#17808)
|
1 month ago |
Piotr Wilkin (ilintar)
|
e4e9c4329c
Make graph_max_nodes vary by ubatch size (#17794)
|
1 month ago |
hksdpc255
|
636fc17a37
Fix Kimi-K2 tool-call parsing issues (#17376)
|
1 month ago |
Jay Zenith
|
51e0c2d917
cuda : add FILL op support (#17851)
|
1 month ago |
Xuan-Son Nguyen
|
37a4f63244
server : add development documentation (#17760)
|
1 month ago |
Georgi Gerganov
|
2bc96931d2
server : make cache_reuse configurable per request (#17858)
|
1 month ago |
wsbagnsv1
|
5814b4dce1
cuda: optimize SOLVE_TRI using registers and FMAF (#17703)
|
1 month ago |
ixgbe
|
79d61896d3
ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (#17784)
|
1 month ago |
Xuan-Son Nguyen
|
4d3726278b
model: add llama 4 scaling for mistral-large (deepseek arch) (#17744)
|
1 month ago |