Commit History

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Oliver Simons | 36f0132464 | CUDA: Factor out and re-use `block_reduce` function (#18785) | 1 week ago |
| Jeff Bolz | 2bbe4c2cf8 | vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (#18678) | 2 weeks ago |
| Aman Gupta | b137718878 | test-backend-ops: fix mxfp4 tests on blackwell (#18736) | 2 weeks ago |
| Jeff Bolz | f1768d8f03 | vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 (#18582) | 3 weeks ago |
| Jeff Bolz | b37124d2d2 | vulkan: handle quantize_q8_1 overflowing the max workgroup count (#18515) | 3 weeks ago |
| Chenguang Li | 67e3f6f601 | CANN: add operator fusion support for ADD + RMS_NORM (#17512) | 3 weeks ago |
| Daniel Bevenius | d3dce4e0a5 | sampling : add support for backend sampling (#17004) | 3 weeks ago |
| Jeff Bolz | be47fb9285 | vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295) | 3 weeks ago |
| Jeff Bolz | b96b82fc85 | vulkan: Support UPSCALE w/antialias (#18327) | 1 month ago |
| Jeff Bolz | 10dc500bdb | vulkan: handle rope with large number of rows (#18306) | 1 month ago |
| Jeff Bolz | e3b35ddf1c | vulkan: Extend rope fusions to allow mrope (#18264) | 1 month ago |
| Jeff Bolz | fd05c51cec | vulkan: fix im2col overflowing maxworkgroupcount (#18180) | 1 month ago |
| Jeff Bolz | b365c3ff01 | vulkan/cuda: fix topk_moe with exp_probs_b (#18071) | 1 month ago |
| Jeff Bolz | 52ab19df63 | tests: Avoid floating point precision false positives in SUM (#17471) | 1 month ago |
| Jeff Bolz | 5182dd64cd | test-backend-ops: improve msvc build time (#18209) | 1 month ago |
| Xuan-Son Nguyen | 8ea958d4d9 | model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106) | 1 month ago |
| Jeff Bolz | 303f8615e9 | vulkan: Multi-pass softmax for large number of cols (#17892) | 1 month ago |
| Jeff Bolz | 07a10c1090 | vulkan: Allow non-pow2 n_experts in topk_moe (#17872) | 1 month ago |
| Piotr Wilkin (ilintar) | 53ecd4fdb9 | SOLVE_TRI extension to more dimensions (#17793) | 1 month ago |
| Georgi Gerganov | 4dff236a52 | ggml : remove GGML_KQ_MASK_PAD constant (#17910) | 1 month ago |
| Gabe Goodhart | 086a63e3a5 | metal: SSM kernel improvements (#17876) | 1 month ago |
| Piotr Wilkin (ilintar) | b63509262a | Add DIAG for CUDA (#17873) | 1 month ago |
| Phylliida Dev | 09c7c50e64 | ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985) | 1 month ago |
| Jeff Bolz | c6c5e85979 | vulkan: support solve_tri with larger N/K values (#17781) | 1 month ago |
| Jeff Bolz | a0f3897d53 | vulkan: fix top_k bug when there are ties in the input (#17659) | 1 month ago |
| Acly | e15cd06a94 | vulkan : support conv-2d with large output size (#17685) | 1 month ago |
| Piotr Wilkin (ilintar) | 96fe9badfc | Add support for CUMSUM and TRI for CUDA. (#17584) | 1 month ago |
| Reese Levine | 7ca5991d2b | ggml webgpu: add support for emscripten builds (#17184) | 1 month ago |
| Tarek Dakhran | 2ba719519d | model: LFM2-VL fixes (#17577) | 1 month ago |
| Jeff Bolz | 59d8d4e963 | vulkan: improve topk perf for large k, fix overflow in unit tests (#17582) | 2 months ago |