Aman Gupta | bcb43163ae | ggml-cpu: Use tiled FA for prompt-processing (#19012) | 3 days ago
Georgi Gerganov | d9c6ce46f7 | kv-cache : support V-less cache (#19067) | 3 days ago
Sigbjørn Skjæret | 70d860824a | convert : fix Gemma3N, GraniteMoe and Ernie4.5Moe (#19084) | 3 days ago
Georgi Gerganov | 080b161995 | completion : fix prompt cache for recurrent models (#19045) | 3 days ago
Molly Sophia | 1243f93a2d | readme: update RWKV7 model links (#19061) | 3 days ago
Jakkala Mahesh | 24bc238303 | llama: fix integer type consistency in split helpers (#18894) | 3 days ago
Daniel Bevenius | 16639ba217 | common : use two decimal places for float arg help messages (#19048) | 3 days ago
Bartowski | 9981c30130 | convert : fix conversion for inheriting models that were bypassing modify_tensors (#19064) | 4 days ago
Johannes Gäßler | e9fd8dcab4 | llama-fit-params: keep explicit --ctx-size 0 (#19070) | 4 days ago
Johannes Gäßler | 4e5b83b226 | GGUF: check that tensor size is representable (#19072) | 4 days ago
Xuan-Son Nguyen | bb02f74c61 | chat: fix language input for translategemma (#19052) | 4 days ago
Johannes Gäßler | 8f91ca54ec | CUDA: re-use MLA K data for V in MMA FA (#19057) | 4 days ago
Aman Gupta | 81ab64f3c8 | ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934) | 4 days ago
nullname | 8af1f5f430 | ggml-hexagon: flash-attn opt (#19025) | 4 days ago
Georgi Gerganov | 557515be1e | graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898) | 5 days ago
Neo Zhang | cb6caca191 | [SYCL] use malloc to support both iGPU and dGPU in same time (#18992) | 5 days ago
Xuan-Son Nguyen | b5b8fa1c8b | chat : fix translategemma crash on common_chat_format_example (#19019) | 5 days ago
Daniel Bevenius | a14b960bc7 | model-conversion : use BUILD_DIR variable in all scripts (#19015) | 5 days ago
Alberto Cabrera Pérez | 091a46cb8d | ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (#18860) | 5 days ago
Aldehir Rojas | a3e812811d | cli : load parser definition (#19031) | 6 days ago
Xuan-Son Nguyen | 51fa458a92 | server : support preserving reasoning_content in assistant message (#18994) | 6 days ago
Georgi Gerganov | a5eaa1d6a3 | mla : make the V tensor a view of K (#18986) | 6 days ago
Johannes Gäßler | e2baf02162 | CUDA: fix alignment check for FA (#19023) | 6 days ago
Aman Gupta | e34d6d03b2 | convert_hf_to_gguf.py: refactor modify_tensors to call super (#18866) | 6 days ago
lhez | 9c96465f99 | opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (#18970) | 6 days ago
Xuan-Son Nguyen | 4e595b250a | server: do not log certain endpoints (avoid log spam) (#19028) | 6 days ago
Georgi Gerganov | 0e4ebeb057 | quant : manual overrides of tensor types take precedence (#18952) | 6 days ago
Aaron Teo | 8b30840703 | release: update github api (#19022) | 6 days ago
Xuan-Son Nguyen | 9eb5bfec1a | mtmd : update docs to use llama_model_n_embd_inp (#18999) | 6 days ago
손희준 | c6926d1d95 | server: Reorder methods in `server-task.cpp` (#19016) | 6 days ago