Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Johannes Gäßler | e9fd8dcab4 | llama-fit-params: keep explicit --ctx-size 0 (#19070) | 5 days ago |
| Johannes Gäßler | 4e5b83b226 | GGUF: check that tensor size is representable (#19072) | 5 days ago |
| Xuan-Son Nguyen | bb02f74c61 | chat: fix language input for translategemma (#19052) | 5 days ago |
| Johannes Gäßler | 8f91ca54ec | CUDA: re-use MLA K data for V in MMA FA (#19057) | 5 days ago |
| Aman Gupta | 81ab64f3c8 | ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934) | 6 days ago |
| nullname | 8af1f5f430 | ggml-hexagon: flash-attn opt (#19025) | 6 days ago |
| Georgi Gerganov | 557515be1e | graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898) | 6 days ago |
| Neo Zhang | cb6caca191 | [SYCL] use malloc to support both iGPU and dGPU in same time (#18992) | 6 days ago |
| Xuan-Son Nguyen | b5b8fa1c8b | chat : fix translategemma crash on common_chat_format_example (#19019) | 6 days ago |
| Daniel Bevenius | a14b960bc7 | model-conversion : use BUILD_DIR variable in all scripts (#19015) | 6 days ago |
| Alberto Cabrera Pérez | 091a46cb8d | ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (#18860) | 6 days ago |
| Aldehir Rojas | a3e812811d | cli : load parser definition (#19031) | 1 week ago |
| Xuan-Son Nguyen | 51fa458a92 | server : support preserving reasoning_content in assistant message (#18994) | 1 week ago |
| Georgi Gerganov | a5eaa1d6a3 | mla : make the V tensor a view of K (#18986) | 1 week ago |
| Johannes Gäßler | e2baf02162 | CUDA: fix alignment check for FA (#19023) | 1 week ago |
| Aman Gupta | e34d6d03b2 | convert_hf_to_gguf.py: refactor modify_tensors to call super (#18866) | 1 week ago |
| lhez | 9c96465f99 | opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (#18970) | 1 week ago |
| Xuan-Son Nguyen | 4e595b250a | server: do not log certain endpoints (avoid log spam) (#19028) | 1 week ago |
| Georgi Gerganov | 0e4ebeb057 | quant : manual overrides of tensor types take precedence (#18952) | 1 week ago |
| Aaron Teo | 8b30840703 | release: update github api (#19022) | 1 week ago |
| Xuan-Son Nguyen | 9eb5bfec1a | mtmd : update docs to use llama_model_n_embd_inp (#18999) | 1 week ago |
| 손희준 | c6926d1d95 | server: Reorder methods in `server-task.cpp` (#19016) | 1 week ago |
| Aman Gupta | b70d251076 | CUDA: add gqa_ratio 4 for GLM 4.7 flash (#18953) | 1 week ago |
| shaofeiqi | 5516b9c16a | opencl: add TRI op support (#18979) | 1 week ago |
| Aleksei Nikiforov | 94242a62c0 | ggml-zdnn : mark zDNN buffers as non-host (#18967) | 1 week ago |
| Pádraic Slattery | 6b99a223e3 | ci : update GitHub Actions versions [no ci] (#18935) | 1 week ago |
| Mariusz Woloszyn | 77078e80e5 | convert : add Devstral-2 (Ministral3ForCausalLM) arch (#18972) | 1 week ago |
| Piotr Wilkin (ilintar) | c301172f66 | jinja: support none\|string (#18995) | 1 week ago |
| Hendrik Erz | 3802d3c78f | fix: Use `tabular-nums` for chat message statistics (#18915) | 1 week ago |
| Daniel Bevenius | 9da3dcd753 | llama : clarify nemotron-h.cpp comment about RoPE [no ci] (#18997) | 1 week ago |