cturan/llama.cpp

Author	SHA1 Message	Date
Aman Gupta	bcb43163ae ggml-cpu: Use tiled FA for prompt-processing (#19012)	3 days ago
Georgi Gerganov	d9c6ce46f7 kv-cache : support V-less cache (#19067)	3 days ago
Sigbjørn Skjæret	70d860824a convert : fix Gemma3N, GraniteMoe and Ernie4.5Moe (#19084)	3 days ago
Georgi Gerganov	080b161995 completion : fix prompt cache for recurrent models (#19045)	3 days ago
Molly Sophia	1243f93a2d readme: update RWKV7 model links (#19061)	3 days ago
Jakkala Mahesh	24bc238303 llama: fix integer type consistency in split helpers (#18894)	3 days ago
Daniel Bevenius	16639ba217 common : use two decimal places for float arg help messages (#19048)	3 days ago
Bartowski	9981c30130 convert : fix conversion for inheriting models that were bypassing modify_tensors (#19064)	4 days ago
Johannes Gäßler	e9fd8dcab4 llama-fit-params: keep explicit --ctx-size 0 (#19070)	4 days ago
Johannes Gäßler	4e5b83b226 GGUF: check that tensor size is representable (#19072)	4 days ago
Xuan-Son Nguyen	bb02f74c61 chat: fix language input for translategemma (#19052)	4 days ago
Johannes Gäßler	8f91ca54ec CUDA: re-use MLA K data for V in MMA FA (#19057)	4 days ago
Aman Gupta	81ab64f3c8 ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934)	4 days ago
nullname	8af1f5f430 ggml-hexagon: flash-attn opt (#19025)	4 days ago
Georgi Gerganov	557515be1e graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898)	5 days ago
Neo Zhang	cb6caca191 [SYCL] use malloc to support both iGPU and dGPU in same time (#18992)	5 days ago
Xuan-Son Nguyen	b5b8fa1c8b chat : fix translategemma crash on common_chat_format_example (#19019)	5 days ago
Daniel Bevenius	a14b960bc7 model-conversion : use BUILD_DIR variable in all scripts (#19015)	5 days ago
Alberto Cabrera Pérez	091a46cb8d ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (#18860)	5 days ago
Aldehir Rojas	a3e812811d cli : load parser definition (#19031)	6 days ago
Xuan-Son Nguyen	51fa458a92 server : support preserving reasoning_content in assistant message (#18994)	6 days ago
Georgi Gerganov	a5eaa1d6a3 mla : make the V tensor a view of K (#18986)	6 days ago
Johannes Gäßler	e2baf02162 CUDA: fix alignment check for FA (#19023)	6 days ago
Aman Gupta	e34d6d03b2 convert_hf_to_gguf.py: refactor modify_tensors to call super (#18866)	6 days ago
lhez	9c96465f99 opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (#18970)	6 days ago
Xuan-Son Nguyen	4e595b250a server: do not log certain endpoints (avoid log spam) (#19028)	6 days ago
Georgi Gerganov	0e4ebeb057 quant : manual overrides of tensor types take precedence (#18952)	6 days ago
Aaron Teo	8b30840703 release: update github api (#19022)	6 days ago
Xuan-Son Nguyen	9eb5bfec1a mtmd : update docs to use llama_model_n_embd_inp (#18999)	6 days ago
손희준	c6926d1d95 server: Reorder methods in `server-task.cpp` (#19016)	6 days ago

