cturan/llama.cpp

Author	SHA1 Message	Date
ccbinn	0440bfd160 metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/macOS (#19088)	1 week ago
Sigbjørn Skjæret	0bf5636938 convert : yield Gemma3N custom_map tensors directly (#19091)	1 week ago
Aman Gupta	bcb43163ae ggml-cpu: Use tiled FA for prompt-processing (#19012)	1 week ago
Georgi Gerganov	d9c6ce46f7 kv-cache : support V-less cache (#19067)	1 week ago
Sigbjørn Skjæret	70d860824a convert : fix Gemma3N, GraniteMoe and Ernie4.5Moe (#19084)	1 week ago
Georgi Gerganov	080b161995 completion : fix prompt cache for recurrent models (#19045)	1 week ago
Molly Sophia	1243f93a2d readme: update RWKV7 model links (#19061)	1 week ago
Jakkala Mahesh	24bc238303 llama: fix integer type consistency in split helpers (#18894)	1 week ago
Daniel Bevenius	16639ba217 common : use two decimal places for float arg help messages (#19048)	1 week ago
Bartowski	9981c30130 convert : fix conversion for inheriting models that were bypassing modify_tensors (#19064)	1 week ago
Johannes Gäßler	e9fd8dcab4 llama-fit-params: keep explicit --ctx-size 0 (#19070)	1 week ago
Johannes Gäßler	4e5b83b226 GGUF: check that tensor size is representable (#19072)	1 week ago
Xuan-Son Nguyen	bb02f74c61 chat: fix language input for translategemma (#19052)	1 week ago
Johannes Gäßler	8f91ca54ec CUDA: re-use MLA K data for V in MMA FA (#19057)	1 week ago
Aman Gupta	81ab64f3c8 ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934)	1 week ago
nullname	8af1f5f430 ggml-hexagon: flash-attn opt (#19025)	1 week ago
Georgi Gerganov	557515be1e graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898)	1 week ago
Neo Zhang	cb6caca191 [SYCL] use malloc to support both iGPU and dGPU in same time (#18992)	1 week ago
Xuan-Son Nguyen	b5b8fa1c8b chat : fix translategemma crash on common_chat_format_example (#19019)	1 week ago
Daniel Bevenius	a14b960bc7 model-conversion : use BUILD_DIR variable in all scripts (#19015)	1 week ago
Alberto Cabrera Pérez	091a46cb8d ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (#18860)	1 week ago
Aldehir Rojas	a3e812811d cli : load parser definition (#19031)	1 week ago
Xuan-Son Nguyen	51fa458a92 server : support preserving reasoning_content in assistant message (#18994)	1 week ago
Georgi Gerganov	a5eaa1d6a3 mla : make the V tensor a view of K (#18986)	1 week ago
Johannes Gäßler	e2baf02162 CUDA: fix alignment check for FA (#19023)	1 week ago
Aman Gupta	e34d6d03b2 convert_hf_to_gguf.py: refactor modify_tensors to call super (#18866)	1 week ago
lhez	9c96465f99 opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (#18970)	1 week ago
Xuan-Son Nguyen	4e595b250a server: do not log certain endpoints (avoid log spam) (#19028)	1 week ago
Georgi Gerganov	0e4ebeb057 quant : manual overrides of tensor types take precedence (#18952)	1 week ago
Aaron Teo	8b30840703 release: update github api (#19022)	1 week ago

Commit History