Commit History

Author SHA1 Message Date
  Shawn Gu 81387858f1 opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602) 3 months ago
  lhez 0cb7a0683b opencl: add q8_0 mm support (#16469) 3 months ago
  Aman Gupta 120bf7046d CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (#16577) 3 months ago
  lhez 5016b72862 opencl: fix build targeting CL 2 (#16554) 3 months ago
  lhez 7c156df414 opencl: support pad_ext (#15888) 4 months ago
  lhez d1c84a662d opencl: support ne3 in get_rows (#15866) 4 months ago
  Sigbjørn Skjæret 3ecb2f671a ggml : implement set_rows with i32 index (#16159) 4 months ago
  lhez 51f5a45fbe opencl: fix concat crash on win arm64 with Adreno (#15944) 4 months ago
  lhez c4510dc937 opencl: initial `q8_0` mv support (#15732) 4 months ago
  Shawn Gu 3edd87cd05 opencl: optimize mxfp4 kernels (#16037) 4 months ago
  Jeff Bolz c0b45097c3 rename optimize_graph to graph_optimize (#16082) 4 months ago
  Jeff Bolz e68aa10d8f vulkan: sort graph to allow more parallel execution (#15850) 5 months ago
  leejet 0a1b3982cd ggml: add ops for WAN video model (cuda && cpu) (#15669) 5 months ago
  rmatif 820bc98531 opencl: add hs=40 to FA (#15758) 5 months ago
  rmatif 97669e4073 opencl: add attn sinks support for FA kernels (#15706) 5 months ago
  rmatif 86076f92de OpenCL: add fused group_norm/norm, mul, add (#15314) 5 months ago
  lhez f7207b0415 opencl: fix support ops condition for `rms_norm` (#15560) 5 months ago
  lhez fb22dd07a6 opencl: mark `argsort` unsupported if cols exceed workgroup limit (#15375) 5 months ago
  rmatif 912ff8c119 OpenCL: add initial FA support (#14987) 5 months ago
  lhez e2c1bfff53 opencl: add initial mxfp4 support via mv (#15270) 5 months ago
  rmatif 60a7658810 opencl: allow mixed f16/f32 `add` (#15140) 5 months ago
  AN Long cd6983d56d ggml : fix field name when new ggml_backend (#14944) 6 months ago
  lhez aaa3d07ae7 opencl: support sink in `soft_max` (attn sinks) (#15152) 6 months ago
  rmatif 756cfea826 fix profiling crash (#15072) 6 months ago
  lhez e725a1a982 opencl: add `swiglu_oai` and `add_id` (#15121) 6 months ago
  Georgi Gerganov fd1234cb46 llama : add gpt-oss (#15091) 6 months ago
  lhez 5c0eb5ef54 opencl: fix adreno compiler detection logic (#15029) 6 months ago
  lhez 1c872f71fb opencl: add f16 for `add`, `sub`, `mul`, `div` (#14984) 6 months ago
  lhez 6e6725459a opencl: add `mul_mat_f32_f32_l4_lm` and `mul_mat_f16_f32_l4_lm` (#14809) 6 months ago
  lhez ce111d39d6 opencl: add fused `rms_norm_mul` (#14841) 6 months ago