Jeff Bolz | b4e335d8dc | vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) | 2 months ago |
bssrdf | 299f5d782c | CUDA: properly handle nb00=nb02 case for cpy (#17081) | 2 months ago |
Johannes Gäßler | aa374175c3 | CUDA: fix crash on uneven context without FA (#16988) | 2 months ago |
bssrdf | 230d1169e5 | improve CUDA cpy memory bandwidth when copying transposed tensor (#16841) | 2 months ago |
Shagun Bera | a2054e3a8f | test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (#16936) | 2 months ago |
Georgi Gerganov | 2f966b8ed8 | clip : use FA (#16837) | 2 months ago |
Aman Gupta | 4146d6a1a6 | CUDA: add expert reduce kernel (#16857) | 2 months ago |
Ruben Ortlam | d2a2673dd1 | vulkan: fix shmem overrun in mmq id shader (#16873) | 2 months ago |
JJJYmmm | d261223d24 | model: add support for qwen3vl series (#16780) | 2 months ago |
Sigbjørn Skjæret | 229bf68628 | cuda : fix argsort with 64k+ rows (#16849) | 2 months ago |
Jeff Bolz | b9ce940177 | vulkan: Fuse rope+set_rows (#16769) | 2 months ago |
Acly | 10640e31aa | ggml : fix interpolate with align-corners and ne=1 (#16700) | 2 months ago |
Aman Gupta | 75cbdd3fce | test-backend-ops: print failed tests at the end (#16785) | 2 months ago |
Aman Gupta | 75d33b9302 | CUDA: support for weight clamp in top-k norm (#16702) | 2 months ago |
leejet | bbac6a26b2 | ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch (#16744) | 2 months ago |
Aman Gupta | f77c13b91f | CUDA: General GEMV fusion (#16715) | 2 months ago |
leejet | 55945d2ef5 | ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742) | 2 months ago |
Aman Gupta | 03792ad936 | CUDA: topk-moe: add optional parameter for gpt-oss (#16649) | 2 months ago |
safranowith | 2330de7b84 | SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16613) | 2 months ago |
Ilia Ilmer | 9ad4f1931e | metal : add `CONV_TRANSPOSE_2D` (#16542) | 3 months ago |
lhez | 0cb7a0683b | opencl: add q8_0 mm support (#16469) | 3 months ago |
Sam/Samuel | f4ce81c45e | metal: optimise `GGML_OP_SUM` (#16559) | 3 months ago |
Aman Gupta | 48e2fa9fb7 | CUDA: add fp kernel for larger batch size MoE (#16512) | 3 months ago |
Georgi Gerganov | e60f241eac | metal : FA support F32 K and V and head size = 32 (#16531) | 3 months ago |
Georgi Gerganov | 0a319bb75e | metal : add support for non-padded FA KV (#16148) | 3 months ago |
Georgi Gerganov | 1d6092fc72 | tests : add -INF blocks to the KQ mask in the FA tests (#16380) | 3 months ago |
Reese Levine | ef07a40906 | ggml webgpu: add support for soft_max, optimize rms_norm (#16357) | 3 months ago |
Reese Levine | 8d78cd2613 | ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187) | 3 months ago |
Jeff Bolz | a74a0d69f3 | tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences (#16295) | 3 months ago |
Sigbjørn Skjæret | adc76347d7 | ggml : check cuda and metal argsort limits and add test (#16323) | 3 months ago |