Shawn Gu
|
81387858f1
opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)
|
3 months ago |
lhez
|
0cb7a0683b
opencl: add q8_0 mm support (#16469)
|
3 months ago |
Aman Gupta
|
120bf7046d
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (#16577)
|
3 months ago |
lhez
|
5016b72862
opencl: fix build targeting CL 2 (#16554)
|
3 months ago |
lhez
|
7c156df414
opencl: support pad_ext (#15888)
|
4 months ago |
lhez
|
d1c84a662d
opencl: support ne3 in get_rows (#15866)
|
4 months ago |
Sigbjørn Skjæret
|
3ecb2f671a
ggml : implement set_rows with i32 index (#16159)
|
4 months ago |
lhez
|
51f5a45fbe
opencl: fix concat crash on win arm64 with Adreno (#15944)
|
4 months ago |
lhez
|
c4510dc937
opencl: initial `q8_0` mv support (#15732)
|
4 months ago |
Shawn Gu
|
3edd87cd05
opencl: optimize mxfp4 kernels (#16037)
|
4 months ago |
Jeff Bolz
|
c0b45097c3
rename optimize_graph to graph_optimize (#16082)
|
4 months ago |
Jeff Bolz
|
e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
|
5 months ago |
leejet
|
0a1b3982cd
ggml: add ops for WAN video model (cuda && cpu) (#15669)
|
5 months ago |
rmatif
|
820bc98531
opencl: add hs=40 to FA (#15758)
|
5 months ago |
rmatif
|
97669e4073
opencl: add attn sinks support for FA kernels (#15706)
|
5 months ago |
rmatif
|
86076f92de
OpenCL: add fused group_norm/norm, mul, add (#15314)
|
5 months ago |
lhez
|
f7207b0415
opencl: fix support ops condition for `rms_norm` (#15560)
|
5 months ago |
lhez
|
fb22dd07a6
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#15375)
|
5 months ago |
rmatif
|
912ff8c119
OpenCL: add initial FA support (#14987)
|
5 months ago |
lhez
|
e2c1bfff53
opencl: add initial mxfp4 support via mv (#15270)
|
5 months ago |
rmatif
|
60a7658810
opencl: allow mixed f16/f32 `add` (#15140)
|
5 months ago |
AN Long
|
cd6983d56d
ggml : fix field name when new ggml_backend (#14944)
|
6 months ago |
lhez
|
aaa3d07ae7
opencl: support sink in `soft_max` (attn sinks) (#15152)
|
6 months ago |
rmatif
|
756cfea826
fix profiling crash (#15072)
|
6 months ago |
lhez
|
e725a1a982
opencl: add `swiglu_oai` and `add_id` (#15121)
|
6 months ago |
Georgi Gerganov
|
fd1234cb46
llama : add gpt-oss (#15091)
|
6 months ago |
lhez
|
5c0eb5ef54
opencl: fix adreno compiler detection logic (#15029)
|
6 months ago |
lhez
|
1c872f71fb
opencl: add f16 for `add`, `sub`, `mul`, `div` (#14984)
|
6 months ago |
lhez
|
6e6725459a
opencl: add `mul_mat_f32_f32_l4_lm` and `mul_mat_f16_f32_l4_lm` (#14809)
|
6 months ago |
lhez
|
ce111d39d6
opencl: add fused `rms_norm_mul` (#14841)
|
6 months ago |