Reese Levine
|
7ca5991d2b
ggml webgpu: add support for emscripten builds (#17184)
|
1 month ago |
Tarek Dakhran
|
2ba719519d
model: LFM2-VL fixes (#17577)
|
1 month ago |
Jeff Bolz
|
59d8d4e963
vulkan: improve topk perf for large k, fix overflow in unit tests (#17582)
|
1 month ago |
Piotr Wilkin (ilintar)
|
cd0e3a7a3b
SOLVE_TRI CUDA kernel for small matrices (#17457)
|
1 month ago |
Jeff Bolz
|
879d673759
vulkan: Implement top-k (#17418)
|
1 month ago |
Georgi Gerganov
|
583cb83416
ggml : add ggml_top_k (#17365)
|
1 month ago |
Jeff Bolz
|
d414db02d3
vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 (#17455)
|
1 month ago |
Sigbjørn Skjæret
|
96ac5a2329
cuda : support non-contiguous i32 to i32 copy (#17326)
|
1 month ago |
Masato Nakasaka
|
3f3a4fb9c3
Revive MUL_MAT_ID to perf testing (#17397)
|
1 month ago |
Giuseppe Scrivano
|
7d77f07325
vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC (#17319)
|
1 month ago |
Jeff Bolz
|
1fa4551af0
vulkan: support larger argsort (#17313)
|
1 month ago |
Piotr Wilkin (ilintar)
|
6fd4f95367
Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition (#17332)
|
1 month ago |
Georgi Gerganov
|
1a139644a8
metal : add cumsum (#17305)
|
2 months ago |
Jeff Bolz
|
24dc769f1b
vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (#17287)
|
2 months ago |
Georgi Gerganov
|
45c6ef7307
metal : support argsort for ne00 > 1024 (#17247)
|
2 months ago |
Piotr Wilkin (ilintar)
|
389ac78b26
ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063)
|
2 months ago |
Diego Devesa
|
879dec341a
ggml-cpu : use template for argsort (#17222)
|
2 months ago |
duduta
|
73460f6278
ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805)
|
2 months ago |
Acly
|
1032256ec9
cuda/vulkan : bicubic interpolation (#17022)
|
2 months ago |
Ruben Ortlam
|
8a3519b708
vulkan: fix mmq out of bounds reads (#17108)
|
2 months ago |
Jeff Bolz
|
80a6cf6347
vulkan: fuse mul_mat_id + mul (#17095)
|
2 months ago |
Aman Gupta
|
64fe17fbb8
Revert "CUDA: add expert reduce kernel (#16857)" (#17100)
|
2 months ago |
Aman Gupta
|
c1b187688d
CUDA: skip fusion for repeating adds in bias (#17080)
|
2 months ago |
Jeff Bolz
|
b4e335d8dc
vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977)
|
2 months ago |
bssrdf
|
299f5d782c
CUDA: properly handle nb00=nb02 case for cpy (#17081)
|
2 months ago |
Johannes Gäßler
|
aa374175c3
CUDA: fix crash on uneven context without FA (#16988)
|
2 months ago |
bssrdf
|
230d1169e5
improve CUDA cpy memory bandwidth when copying transposed tensor (#16841)
|
2 months ago |
Shagun Bera
|
a2054e3a8f
test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (#16936)
|
2 months ago |
Georgi Gerganov
|
2f966b8ed8
clip : use FA (#16837)
|
2 months ago |
Aman Gupta
|
4146d6a1a6
CUDA: add expert reduce kernel (#16857)
|
2 months ago |