Commit History

Author SHA1 Message Date
  Aman Gupta 64fe17fbb8 Revert "CUDA: add expert reduce kernel (#16857)" (#17100) 2 months ago
  Aman Gupta c1b187688d CUDA: skip fusion for repeating adds in bias (#17080) 2 months ago
  Jeff Bolz b4e335d8dc vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) 2 months ago
  bssrdf 299f5d782c CUDA: properly handle nb00=nb02 case for cpy (#17081) 2 months ago
  Johannes Gäßler aa374175c3 CUDA: fix crash on uneven context without FA (#16988) 2 months ago
  bssrdf 230d1169e5 improve CUDA cpy memory bandwidth when copying transposed tensor (#16841) 2 months ago
  Shagun Bera a2054e3a8f test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (#16936) 2 months ago
  Georgi Gerganov 2f966b8ed8 clip : use FA (#16837) 2 months ago
  Aman Gupta 4146d6a1a6 CUDA: add expert reduce kernel (#16857) 2 months ago
  Ruben Ortlam d2a2673dd1 vulkan: fix shmem overrun in mmq id shader (#16873) 2 months ago
  JJJYmmm d261223d24 model: add support for qwen3vl series (#16780) 2 months ago
  Sigbjørn Skjæret 229bf68628 cuda : fix argsort with 64k+ rows (#16849) 2 months ago
  Jeff Bolz b9ce940177 vulkan: Fuse rope+set_rows (#16769) 2 months ago
  Acly 10640e31aa ggml : fix interpolate with align-corners and ne=1 (#16700) 2 months ago
  Aman Gupta 75cbdd3fce test-backend-ops: print failed tests at the end (#16785) 2 months ago
  Aman Gupta 75d33b9302 CUDA: support for weight clamp in top-k norm (#16702) 2 months ago
  leejet bbac6a26b2 ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch (#16744) 2 months ago
  Aman Gupta f77c13b91f CUDA: General GEMV fusion (#16715) 2 months ago
  leejet 55945d2ef5 ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742) 2 months ago
  Aman Gupta 03792ad936 CUDA: topk-moe: add optional parameter for gpt-oss (#16649) 2 months ago
  safranowith 2330de7b84 SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16613) 3 months ago
  Ilia Ilmer 9ad4f1931e metal : add `CONV_TRANSPOSE_2D` (#16542) 3 months ago
  lhez 0cb7a0683b opencl: add q8_0 mm support (#16469) 3 months ago
  Sam/Samuel f4ce81c45e metal: optimise `GGML_OP_SUM` (#16559) 3 months ago
  Aman Gupta 48e2fa9fb7 CUDA: add fp kernel for larger batch size MoE (#16512) 3 months ago
  Georgi Gerganov e60f241eac metal : FA support F32 K and V and head size = 32 (#16531) 3 months ago
  Georgi Gerganov 0a319bb75e metal : add support for non-padded FA KV (#16148) 3 months ago
  Georgi Gerganov 1d6092fc72 tests : add -INF blocks to the KQ mask in the FA tests (#16380) 3 months ago
  Reese Levine ef07a40906 ggml webgpu: add support for soft_max, optimize rms_norm (#16357) 3 months ago
  Reese Levine 8d78cd2613 ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187) 3 months ago