Commit History

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Georgi Gerganov | b8595b16e6 | mtmd : fix embedding size for image input (#17123) | 2 months ago |
| Ruben Ortlam | 392e09a608 | vulkan: fix memory allocations (#17122) | 2 months ago |
| compilade | 802cef44bf | convert : parse safetensors directly (#15667) | 2 months ago |
| compilade | 1c07c0c68c | convert : handle compressed-tensors quant method (#17069) | 2 months ago |
| Georgi Gerganov | cb1adf8851 | server : handle failures to restore host cache (#17078) | 2 months ago |
| Georgi Gerganov | ef1d826997 | benches : add folder with benchmarks (#16931) | 2 months ago |
| Eric Curtin | 86fde91e62 | Switch to using Ubuntu 25.10 vulkan/mesa (#16497) | 2 months ago |
| Ruben Ortlam | 7f3e9d339c | vulkan: iGPU memory reporting fix (#17110) | 2 months ago |
| Ruben Ortlam | 8a3519b708 | vulkan: fix mmq out of bounds reads (#17108) | 2 months ago |
| Jeff Bolz | 80a6cf6347 | vulkan: fuse mul_mat_id + mul (#17095) | 2 months ago |
| Georgi Gerganov | 0750a59903 | metal : retain src and dst buffers during async ops (#17101) | 2 months ago |
| Xuan-Son Nguyen | aa3b7a90b4 | arg: add --cache-list argument to list cached models (#17073) | 2 months ago |
| chansikpark | 333f2595a3 | webui: fix keyboard shortcuts for new chat & edit chat title (#17007) | 2 months ago |
| Jeff Bolz | 53d7d21e61 | vulkan: Use spec constants for conv2d s/d/p and kernel W/H (#16978) | 2 months ago |
| Aidan | eeee367de5 | server: fix correct time_ms calculation in prompt_progress (#17093) | 2 months ago |
| Aman Gupta | 64fe17fbb8 | Revert "CUDA: add expert reduce kernel (#16857)" (#17100) | 2 months ago |
| Aman Gupta | c1b187688d | CUDA: skip fusion for repeating adds in bias (#17080) | 2 months ago |
| SavicStefan | b8a5cfd11a | vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (#16636) | 2 months ago |
| Aleksei Nikiforov | 08416ebe7f | ggml: disable vxe for cross-compilation by default (#16966) | 2 months ago |
| Jeff Bolz | b4e335d8dc | vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) | 2 months ago |
| Jeff Bolz | d6fe40fa00 | vulkan: Fix test-thread-safety crashes (#17024) | 2 months ago |
| Johannes Gäßler | e14e842e87 | CUDA: fix MMQ stream-k fixup ne1 indices (#17089) | 2 months ago |
| Reese Levine | 647b960bd8 | ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) | 2 months ago |
| bssrdf | 299f5d782c | CUDA: properly handle nb00=nb02 case for cpy (#17081) | 2 months ago |
| Acly | ac76d36201 | vulkan : refactor buffer handling in vk_op_f32 (#16840) | 2 months ago |
| Johannes Gäßler | 6515610506 | CUDA: fix should_use_mmvf for ne11 == 1 (#17085) | 2 months ago |
| Georgi Gerganov | 7956bb4d7f | bench : cache the llama_context state at computed depth (#16944) | 2 months ago |
| Sigbjørn Skjæret | 9008027aa3 | hparams : add n_embd_inp() to support extended embed (#16928) | 2 months ago |
| Georgi Gerganov | 16bcc1259d | kv-cache : pad the cache size to 256 for performance (#17046) | 2 months ago |
| Adrien Gallouët | 9eb9a1331d | Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)" (#17084) | 2 months ago |