cturan/llama.cpp

Author	SHA1 Message	Date
Johannes Gäßler	e9fd8dcab4 llama-fit-params: keep explicit --ctx-size 0 (#19070)	5 days ago
Johannes Gäßler	4e5b83b226 GGUF: check that tensor size is representable (#19072)	5 days ago
Xuan-Son Nguyen	bb02f74c61 chat: fix language input for translategemma (#19052)	5 days ago
Johannes Gäßler	8f91ca54ec CUDA: re-use MLA K data for V in MMA FA (#19057)	5 days ago
Aman Gupta	81ab64f3c8 ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934)	6 days ago
nullname	8af1f5f430 ggml-hexagon: flash-attn opt (#19025)	6 days ago
Georgi Gerganov	557515be1e graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898)	6 days ago
Neo Zhang	cb6caca191 [SYCL] use malloc to support both iGPU and dGPU in same time (#18992)	6 days ago
Xuan-Son Nguyen	b5b8fa1c8b chat : fix translategemma crash on common_chat_format_example (#19019)	6 days ago
Daniel Bevenius	a14b960bc7 model-conversion : use BUILD_DIR variable in all scripts (#19015)	6 days ago
Alberto Cabrera Pérez	091a46cb8d ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (#18860)	6 days ago
Aldehir Rojas	a3e812811d cli : load parser definition (#19031)	1 week ago
Xuan-Son Nguyen	51fa458a92 server : support preserving reasoning_content in assistant message (#18994)	1 week ago
Georgi Gerganov	a5eaa1d6a3 mla : make the V tensor a view of K (#18986)	1 week ago
Johannes Gäßler	e2baf02162 CUDA: fix alignment check for FA (#19023)	1 week ago
Aman Gupta	e34d6d03b2 convert_hf_to_gguf.py: refactor modify_tensors to call super (#18866)	1 week ago
lhez	9c96465f99 opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (#18970)	1 week ago
Xuan-Son Nguyen	4e595b250a server: do not log certain endpoints (avoid log spam) (#19028)	1 week ago
Georgi Gerganov	0e4ebeb057 quant : manual overrides of tensor types take precedence (#18952)	1 week ago
Aaron Teo	8b30840703 release: update github api (#19022)	1 week ago
Xuan-Son Nguyen	9eb5bfec1a mtmd : update docs to use llama_model_n_embd_inp (#18999)	1 week ago
손희준	c6926d1d95 server: Reorder methods in `server-task.cpp` (#19016)	1 week ago
Aman Gupta	b70d251076 CUDA: add gqa_ratio 4 for GLM 4.7 flash (#18953)	1 week ago
shaofeiqi	5516b9c16a opencl: add TRI op support (#18979)	1 week ago
Aleksei Nikiforov	94242a62c0 ggml-zdnn : mark zDNN buffers as non-host (#18967)	1 week ago
Pádraic Slattery	6b99a223e3 ci : update GitHub Actions versions [no ci] (#18935)	1 week ago
Mariusz Woloszyn	77078e80e5 convert : add Devstral-2 (Ministral3ForCausalLM) arch (#18972)	1 week ago
Piotr Wilkin (ilintar)	c301172f66 jinja: support none|string (#18995)	1 week ago
Hendrik Erz	3802d3c78f fix: Use `tabular-nums` for chat message statistics (#18915)	1 week ago
Daniel Bevenius	9da3dcd753 llama : clarify nemotron-h.cpp comment about RoPE [no ci] (#18997)	1 week ago
