Giuseppe Scrivano | 7d77f07325 | vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC (#17319) | 1 month ago
Jeff Bolz | 1fa4551af0 | vulkan: support larger argsort (#17313) | 1 month ago
Piotr Wilkin (ilintar) | 6fd4f95367 | Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition (#17332) | 1 month ago
Georgi Gerganov | 1a139644a8 | metal : add cumsum (#17305) | 2 months ago
Jeff Bolz | 24dc769f1b | vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (#17287) | 2 months ago
Georgi Gerganov | 45c6ef7307 | metal : support argsort for ne00 > 1024 (#17247) | 2 months ago
Piotr Wilkin (ilintar) | 389ac78b26 | ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063) | 2 months ago
Diego Devesa | 879dec341a | ggml-cpu : use template for argsort (#17222) | 2 months ago
duduta | 73460f6278 | ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805) | 2 months ago
Acly | 1032256ec9 | cuda/vulkan : bicubic interpolation (#17022) | 2 months ago
Ruben Ortlam | 8a3519b708 | vulkan: fix mmq out of bounds reads (#17108) | 2 months ago
Jeff Bolz | 80a6cf6347 | vulkan: fuse mul_mat_id + mul (#17095) | 2 months ago
Aman Gupta | 64fe17fbb8 | Revert "CUDA: add expert reduce kernel (#16857)" (#17100) | 2 months ago
Aman Gupta | c1b187688d | CUDA: skip fusion for repeating adds in bias (#17080) | 2 months ago
Jeff Bolz | b4e335d8dc | vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) | 2 months ago
bssrdf | 299f5d782c | CUDA: properly handle nb00=nb02 case for cpy (#17081) | 2 months ago
Johannes Gäßler | aa374175c3 | CUDA: fix crash on uneven context without FA (#16988) | 2 months ago
bssrdf | 230d1169e5 | improve CUDA cpy memory bandwidth when copying transposed tensor (#16841) | 2 months ago
Shagun Bera | a2054e3a8f | test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (#16936) | 2 months ago
Georgi Gerganov | 2f966b8ed8 | clip : use FA (#16837) | 2 months ago
Aman Gupta | 4146d6a1a6 | CUDA: add expert reduce kernel (#16857) | 2 months ago
Ruben Ortlam | d2a2673dd1 | vulkan: fix shmem overrun in mmq id shader (#16873) | 2 months ago
JJJYmmm | d261223d24 | model: add support for qwen3vl series (#16780) | 2 months ago
Sigbjørn Skjæret | 229bf68628 | cuda : fix argsort with 64k+ rows (#16849) | 2 months ago
Jeff Bolz | b9ce940177 | vulkan: Fuse rope+set_rows (#16769) | 2 months ago
Acly | 10640e31aa | ggml : fix interpolate with align-corners and ne=1 (#16700) | 2 months ago
Aman Gupta | 75cbdd3fce | test-backend-ops: print failed tests at the end (#16785) | 2 months ago
Aman Gupta | 75d33b9302 | CUDA: support for weight clamp in top-k norm (#16702) | 2 months ago
leejet | bbac6a26b2 | ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch (#16744) | 2 months ago
Aman Gupta | f77c13b91f | CUDA: General GEMV fusion (#16715) | 2 months ago