cturan/llama.cpp

Author	SHA1 Message	Date
Xuan-Son Nguyen	9e79b0116e convert: allow using quantized Mistral weight (#17889)	1 month ago
Neo Zhang Jianyu	2e9eab80c2 fix softmax for iGPU (#17838)	1 month ago
Aldehir Rojas	2fbe3b7bb7 common : add parser for ministral/mistral large 3/devstral 2 (#17713)	1 month ago
Sigbjørn Skjæret	63391852b0 docs : update cpu and cuda ops (#17890)	1 month ago
Gabe Goodhart	086a63e3a5 metal: SSM kernel improvements (#17876)	1 month ago
Piotr Wilkin (ilintar)	b63509262a Add DIAG for CUDA (#17873)	1 month ago
Johannes Gäßler	48f47565a7 docs: clarify that CPU support should be first (#17886)	1 month ago
Gabe Goodhart	02e409a5be ggml : Provide macos-specific backtrace printing to avoid terminal death (#17869)	1 month ago
Georgi Gerganov	6b82eb7883 metal : print node names for debugging (#17882)	1 month ago
Sigbjørn Skjæret	86a3f0fad8 ggml : allow fill node alloc inplace (#17870)	1 month ago
Rhys-T	63908b631a cmake: fix Mach-O current version number (#17877)	1 month ago
Sigbjørn Skjæret	42b12b5608 model : nit, DeepSeek V1 MoE is 16B and GigaChat is 20B (#12652)	1 month ago
Xuan-Son Nguyen	4e842d5120 console: allow using arrow left/right, home/end keys and history mode (#17836)	1 month ago
Chenguang Li	ca709e427b CANN: add support for partial RoPE and Vision mode (#17543)	1 month ago
Johannes Gäßler	0cdce38a97 CUDA: fix FP16 overflow in tile FA kernel (#17875)	1 month ago
Aldehir Rojas	e39502e74b llama : add token matching support to llama-grammar (#17816)	1 month ago
philip-essential	1d2a1ab73d model : support Rnj-1 (#17811)	1 month ago
Sigbjørn Skjæret	c8554b66e0 graph : use fill instead of scale_bias in grouped expert selection (#17867)	1 month ago
Daniel Bevenius	2fa51c19b0 model-conversion : add token ids to prompt token output [no ci] (#17863)	1 month ago
Xuan-Son Nguyen	951520ddb0 server: delegate result_state creation to server_task (#17835)	1 month ago
Neo Zhang	68522c678d ci : support bfloat16 SYCL release package (#17855)	1 month ago
Xuan-Son Nguyen	f896d2c34f server: improve speed of speculative decoding (#17808)	1 month ago
Piotr Wilkin (ilintar)	e4e9c4329c Make graph_max_nodes vary by ubatch size (#17794)	1 month ago
hksdpc255	636fc17a37 Fix Kimi-K2 tool-call parsing issues (#17376)	1 month ago
Jay Zenith	51e0c2d917 cuda : add FILL op support (#17851)	1 month ago
Xuan-Son Nguyen	37a4f63244 server : add development documentation (#17760)	1 month ago
Georgi Gerganov	2bc96931d2 server : make cache_reuse configurable per request (#17858)	1 month ago
wsbagnsv1	5814b4dce1 cuda: optimize SOLVE_TRI using registers and FMAF (#17703)	1 month ago
ixgbe	79d61896d3 ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (#17784)	1 month ago
Xuan-Son Nguyen	4d3726278b model: add llama 4 scaling for mistral-large (deepseek arch) (#17744)	1 month ago


Commit History
