duduta | 73460f6278 | ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805) | 2 months ago
Acly | 1032256ec9 | cuda/vulkan : bicubic interpolation (#17022) | 2 months ago
Ruben Ortlam | 8a3519b708 | vulkan: fix mmq out of bounds reads (#17108) | 2 months ago
Jeff Bolz | 80a6cf6347 | vulkan: fuse mul_mat_id + mul (#17095) | 2 months ago
Aman Gupta | 64fe17fbb8 | Revert "CUDA: add expert reduce kernel (#16857)" (#17100) | 2 months ago
Aman Gupta | c1b187688d | CUDA: skip fusion for repeating adds in bias (#17080) | 2 months ago
Jeff Bolz | b4e335d8dc | vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) | 2 months ago
bssrdf | 299f5d782c | CUDA: properly handle nb00=nb02 case for cpy (#17081) | 2 months ago
Johannes Gäßler | aa374175c3 | CUDA: fix crash on uneven context without FA (#16988) | 2 months ago
bssrdf | 230d1169e5 | improve CUDA cpy memory bandwidth when copying transposed tensor (#16841) | 2 months ago
Shagun Bera | a2054e3a8f | test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (#16936) | 2 months ago
Georgi Gerganov | 2f966b8ed8 | clip : use FA (#16837) | 2 months ago
Aman Gupta | 4146d6a1a6 | CUDA: add expert reduce kernel (#16857) | 2 months ago
Ruben Ortlam | d2a2673dd1 | vulkan: fix shmem overrun in mmq id shader (#16873) | 2 months ago
JJJYmmm | d261223d24 | model: add support for qwen3vl series (#16780) | 2 months ago
Sigbjørn Skjæret | 229bf68628 | cuda : fix argsort with 64k+ rows (#16849) | 2 months ago
Jeff Bolz | b9ce940177 | vulkan: Fuse rope+set_rows (#16769) | 2 months ago
Acly | 10640e31aa | ggml : fix interpolate with align-corners and ne=1 (#16700) | 2 months ago
Aman Gupta | 75cbdd3fce | test-backend-ops: print failed tests at the end (#16785) | 2 months ago
Aman Gupta | 75d33b9302 | CUDA: support for weight clamp in top-k norm (#16702) | 2 months ago
leejet | bbac6a26b2 | ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch (#16744) | 2 months ago
Aman Gupta | f77c13b91f | CUDA: General GEMV fusion (#16715) | 2 months ago
leejet | 55945d2ef5 | ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742) | 2 months ago
Aman Gupta | 03792ad936 | CUDA: topk-moe: add optional parameter for gpt-oss (#16649) | 2 months ago
safranowith | 2330de7b84 | SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16613) | 2 months ago
Ilia Ilmer | 9ad4f1931e | metal : add `CONV_TRANSPOSE_2D` (#16542) | 3 months ago
lhez | 0cb7a0683b | opencl: add q8_0 mm support (#16469) | 3 months ago
Sam/Samuel | f4ce81c45e | metal: optimise `GGML_OP_SUM` (#16559) | 3 months ago
Aman Gupta | 48e2fa9fb7 | CUDA: add fp kernel for larger batch size MoE (#16512) | 3 months ago
Georgi Gerganov | e60f241eac | metal : FA support F32 K and V and head size = 32 (#16531) | 3 months ago