Commit History

Author SHA1 Message Date
  Oliver Simons 00681dfc16 CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#15872) 4 months ago
  Daniel Bevenius e7b6d83b52 tests : filter out no-ops from coverage report (#15900) 4 months ago
  Jeff Bolz 4f63cd705c vulkan: Fix OOB accesses in soft_max_back (#15861) 4 months ago
  Aman Gupta a972faebed CUDA: Add mul_mat_id support for the mmf kernel (#15767) 4 months ago
  Georgi Gerganov f28d4f4ac9 metal : refactor + optimize (#15857) 4 months ago
  Xuan-Son Nguyen 9fcb29f22f ggml: allow casting between f32 and i32 (#15783) 4 months ago
  Jeff Bolz d413dca003 tests: large sizes for get_rows (#15687) 4 months ago
  Jeff Bolz 3976dfbe00 vulkan: support im2col_3d (#15795) 4 months ago
  Jeff Bolz c97b5e5854 vulkan: Support pad_ext (#15794) 4 months ago
  Daniel Bevenius 3a550b5ca4 tests : add --list-ops and --show-coverage options (#15745) 4 months ago
  leejet 0a1b3982cd ggml: add ops for WAN video model (cuda && cpu) (#15669) 4 months ago
  rmatif 86076f92de OpenCL: add fused group_norm/norm, mul, add (#15314) 4 months ago
  Eve 44b1efa41a tests: add performance test for mul mat id (#15543) 4 months ago
  Georgi Gerganov 1d8d83deaa metal : improve `MUL_MAT_ID` (#15541) 4 months ago
  Jeff Bolz 34bdbbd7c2 vulkan: Remove splitting for mul_mat_id (#15568) 4 months ago
  Jeff Bolz 886b97a5d6 tests: Generate unique input values for count_equal (#15487) 4 months ago
  Jeff Bolz c9a24fb932 vulkan: Support FA with any multiple of 8 head sizes (#15537) 4 months ago
  Jeff Bolz 611f419cff vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (#15281) 4 months ago
  Acly 0a9b43e507 vulkan : support ggml_mean (#15393) 4 months ago
  rmatif 92f7f0a53c ggml: add `conv3d` op (#15182) 4 months ago
  Jeff Bolz 96452a3fa4 vulkan: Reuse conversion results in prealloc_y (#15410) 4 months ago
  Jeff Bolz de5627910d vulkan: Optimize argsort (#15354) 5 months ago
  Jeff Bolz 1fe00296f5 vulkan: fuse adds (#15252) 5 months ago
  Jeff Bolz 2e2b22ba66 vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (#15334) 5 months ago
  Georgi Gerganov 5edf1592fd vulkan : fix out-of-bounds access in argmax kernel (#15342) 5 months ago
  Jonathan Graehl 5cdb27e091 finetune: SGD optimizer, more CLI args (#13873) 5 months ago
  Oliver Simons 6028bf7435 CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (#15132) 5 months ago
  Georgi Gerganov fd1234cb46 llama : add gpt-oss (#15091) 5 months ago
  Jeff Bolz ec0b18802c vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015) 5 months ago
  Sigbjørn Skjæret 138b288b59 cuda : add softcap fusion (#14907) 5 months ago