Commit History

Author SHA1 Message Date
  Reese Levine 15bff84bf5 ggml webgpu: initial flashattention implementation (#18610) 3 weeks ago
  Jeff Bolz 2524c26164 vulkan: fix push constant size for quantize_q8_1 (#18687) 3 weeks ago
  Jeff Bolz cb14b06995 vulkan: optimize ssm_scan (#18630) 3 weeks ago
  Adrien Gallouët 55abc39355 vendor : update cpp-httplib to 0.30.0 (#18660) 3 weeks ago
  Georgi Gerganov f2f6c88067 scripts : support chaining commands in pr2wt.sh (#18671) 3 weeks ago
  도로로도로또 945bf10627 metal : add MoE kernel specialization for ne20=5 (#18667) 3 weeks ago
  Johannes Gäßler 64848deb18 llama-fit-params: free memory target per device (#18679) 3 weeks ago
  Doctor Shotgun 9a5724dee2 ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535) 3 weeks ago
  Daniel Bevenius 9c142e3a2a model-conversion : add warn about transformers mismatch (#18691) 3 weeks ago
  Daniel Bevenius df7fb92170 model-conversion : remove -st targets for converted model (#18689) 3 weeks ago
  Julius Tischbein 2038101bd9 llama : add `use_direct_io` flag for model loading (#18166) 3 weeks ago
  shaofeiqi 568371a726 opencl: add FILL op support (#18682) 3 weeks ago
  Sigbjørn Skjæret 5b8844ae53 scripts : fix repos cloned with .git extension (#18669) 3 weeks ago
  Sigbjørn Skjæret 7e16fef085 convert : more variants of rope_theta config entries (#18668) 3 weeks ago
  Oliver Walsh f5245b5e4e cuda : fix build on cuda 12.8 (#18672) 3 weeks ago
  R ae9f8df778 fix(docker): add missing libglvnd libraries to Vulkan image (#18664) 3 weeks ago
  Adrien Gallouët 56d2fed2b3 tools : remove llama-run (#18661) 3 weeks ago
  Georgi Gerganov 56426673cb scripts : add pr2wt.sh (#18644) 3 weeks ago
  Daniel Bevenius bb77764c2d convert : clarify sentence-transformers-dense-modules help [no ci] (#18662) 3 weeks ago
  Sigbjørn Skjæret 9dfa8ee950 ci : run cann build unconditionally [no ci] (#18659) 3 weeks ago
  Jeff Bolz ca4a8370bc vulkan: reject ops when a tensor is too large to allocate (#18646) 3 weeks ago
  virajwad 03023296cf vulkan: Warptile tuning for Intel Xe2/Xe3 (#18178) 3 weeks ago
  Eve 8c77a04cc7 vulkan: more mul mat optimizations (#18533) 3 weeks ago
  Daniel Bevenius ffba4f29e6 examples : add debug utility/example (#18464) 3 weeks ago
  hipudding 3333951d86 CANN: Fix rename for get_env (#18652) 3 weeks ago
  Raul Torres 193ee38a1b CANN: Rename `get_env` to `get_env_as_lowercase` (#18624) 3 weeks ago
  Max Krasnyansky 95ea9e0861 Hexagon add support for f16/f32 flash attention, scale, set-rows and improve f16/32 matmul (#18611) 3 weeks ago
  Tarek Dakhran ccbc84a537 mtmd: mtmd_audio_streaming_istft (#18645) 3 weeks ago
  Johannes Gäßler 68b4d516c3 llama-params-fit: fix last devices with low VRAM (#18494) 3 weeks ago
  Aadeshveer Singh 24af22fc36 ggml : optimize cuda ssm_scan using warp-level reduction (#18505) 3 weeks ago