Jeff Bolz
|
59d8d4e963
vulkan: improve topk perf for large k, fix overflow in unit tests (#17582)
|
1 month ago |
Jeff Bolz
|
35cf8887e1
vulkan: Implement GGML_OP_TRI (#17503)
|
1 month ago |
Jeff Bolz
|
4abef75f2c
vulkan: Implement SOLVE_TRI (#17486)
|
1 month ago |
Acly
|
b78db3bd50
vulkan : move contiguous checks to device_supports_op (#17490)
|
1 month ago |
Jeff Bolz
|
142df17c9c
vulkan: use a fixed 1KB buffer for the add_rms_fusion opt (#17514)
|
1 month ago |
Jeff Bolz
|
eec1e33a9e
vulkan: allow graph_optimize for prompt processing workloads (#17475)
|
1 month ago |
Jeff Bolz
|
879d673759
vulkan: Implement top-k (#17418)
|
1 month ago |
Jeff Bolz
|
b3b03a7baf
vulkan: Implement GGML_OP_CUMSUM (#17479)
|
1 month ago |
Jeff Bolz
|
d414db02d3
vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 (#17455)
|
1 month ago |
Jeff Bolz
|
3d07caa99b
vulkan: more FA details in vk_perf_logger (#17443)
|
2 months ago |
Jeff Bolz
|
54d83bbe85
vulkan: remove a couple unnecessary switches (#17419)
|
2 months ago |
Jeff Bolz
|
f1ffbba68e
vulkan: disable async for older Intel devices (#17369)
|
2 months ago |
Giuseppe Scrivano
|
7d77f07325
vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC (#17319)
|
2 months ago |
Jeff Bolz
|
1fa4551af0
vulkan: support larger argsort (#17313)
|
2 months ago |
Jeff Bolz
|
2eba631b81
vulkan: Add copy_transpose shader (#17371)
|
2 months ago |
Ruben Ortlam
|
980b7cd17e
vulkan: force full subgroups for flash attention to fix intel subgroup crash (#17356)
|
2 months ago |
Jeff Bolz
|
da95bf2a85
vulkan: support noncontig i32 copy (#17328)
|
2 months ago |
Ruben Ortlam
|
38e2c1b412
vulkan: add log RTE support to fix Nvidia CI (#17320)
|
2 months ago |
Pavels Zaicenkovs
|
dbed61294a
vulkan: add LOG operation support for F32 and F16 (#17183)
|
2 months ago |
Ruben Ortlam
|
80deff3648
vulkan: fix MMQ quantize_y condition (#17301)
|
2 months ago |
Jeff Bolz
|
24dc769f1b
vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (#17287)
|
2 months ago |
Giuseppe Scrivano
|
1568d13c2c
vulkan: implement ABS and NEG (#17245)
|
2 months ago |
Jeff Bolz
|
439342ea0b
vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (#17244)
|
2 months ago |
Jeff Bolz
|
234ae7d7bd
vulkan: skip all-negative-inf blocks in FA (#17186)
|
2 months ago |
Jeff Bolz
|
38eaf32af1
vulkan: change graph_compute to be async and enable get_tensor_async (#17158)
|
2 months ago |
Eve
|
7d019cff74
disable rms norm mul rope for chips with no fp16 rte (#17134)
|
2 months ago |
Ruben Ortlam
|
85234a4b3a
vulkan: fix validation issue introduced by #16868 (#17145)
|
2 months ago |
Acly
|
1032256ec9
cuda/vulkan : bicubic interpolation (#17022)
|
2 months ago |
Ruben Ortlam
|
392e09a608
vulkan: fix memory allocations (#17122)
|
2 months ago |
Ruben Ortlam
|
7f3e9d339c
vulkan: iGPU memory reporting fix (#17110)
|
2 months ago |