Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Aman Gupta | bcb43163ae | ggml-cpu: Use tiled FA for prompt-processing (#19012) | 3 days ago |
| Georgi Gerganov | d9c6ce46f7 | kv-cache : support V-less cache (#19067) | 3 days ago |
| Sigbjørn Skjæret | 70d860824a | convert : fix Gemma3N, GraniteMoe and Ernie4.5Moe (#19084) | 3 days ago |
| Georgi Gerganov | 080b161995 | completion : fix prompt cache for recurrent models (#19045) | 3 days ago |
| Molly Sophia | 1243f93a2d | readme: update RWKV7 model links (#19061) | 3 days ago |
| Jakkala Mahesh | 24bc238303 | llama: fix integer type consistency in split helpers (#18894) | 3 days ago |
| Daniel Bevenius | 16639ba217 | common : use two decimal places for float arg help messages (#19048) | 3 days ago |
| Bartowski | 9981c30130 | convert : fix conversion for inheriting models that were bypassing modify_tensors (#19064) | 4 days ago |
| Johannes Gäßler | e9fd8dcab4 | llama-fit-params: keep explicit --ctx-size 0 (#19070) | 4 days ago |
| Johannes Gäßler | 4e5b83b226 | GGUF: check that tensor size is representable (#19072) | 4 days ago |
| Xuan-Son Nguyen | bb02f74c61 | chat: fix language input for translategemma (#19052) | 4 days ago |
| Johannes Gäßler | 8f91ca54ec | CUDA: re-use MLA K data for V in MMA FA (#19057) | 4 days ago |
| Aman Gupta | 81ab64f3c8 | ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934) | 4 days ago |
| nullname | 8af1f5f430 | ggml-hexagon: flash-attn opt (#19025) | 4 days ago |
| Georgi Gerganov | 557515be1e | graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898) | 5 days ago |
| Neo Zhang | cb6caca191 | [SYCL] use malloc to support both iGPU and dGPU in same time (#18992) | 5 days ago |
| Xuan-Son Nguyen | b5b8fa1c8b | chat : fix translategemma crash on common_chat_format_example (#19019) | 5 days ago |
| Daniel Bevenius | a14b960bc7 | model-conversion : use BUILD_DIR variable in all scripts (#19015) | 5 days ago |
| Alberto Cabrera Pérez | 091a46cb8d | ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (#18860) | 5 days ago |
| Aldehir Rojas | a3e812811d | cli : load parser definition (#19031) | 6 days ago |
| Xuan-Son Nguyen | 51fa458a92 | server : support preserving reasoning_content in assistant message (#18994) | 6 days ago |
| Georgi Gerganov | a5eaa1d6a3 | mla : make the V tensor a view of K (#18986) | 6 days ago |
| Johannes Gäßler | e2baf02162 | CUDA: fix alignment check for FA (#19023) | 6 days ago |
| Aman Gupta | e34d6d03b2 | convert_hf_to_gguf.py: refactor modify_tensors to call super (#18866) | 6 days ago |
| lhez | 9c96465f99 | opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (#18970) | 6 days ago |
| Xuan-Son Nguyen | 4e595b250a | server: do not log certain endpoints (avoid log spam) (#19028) | 6 days ago |
| Georgi Gerganov | 0e4ebeb057 | quant : manual overrides of tensor types take precedence (#18952) | 6 days ago |
| Aaron Teo | 8b30840703 | release: update github api (#19022) | 6 days ago |
| Xuan-Son Nguyen | 9eb5bfec1a | mtmd : update docs to use llama_model_n_embd_inp (#18999) | 6 days ago |
| 손희준 | c6926d1d95 | server: Reorder methods in `server-task.cpp` (#19016) | 6 days ago |