Ruben Ortlam
|
0e76501e1d
Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support (#18749)
|
2 weeks ago |
Jeff Bolz
|
2524c26164
vulkan: fix push constant size for quantize_q8_1 (#18687)
|
3 weeks ago |
Jeff Bolz
|
cb14b06995
vulkan: optimize ssm_scan (#18630)
|
3 weeks ago |
Doctor Shotgun
|
9a5724dee2
ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535)
|
3 weeks ago |
Jeff Bolz
|
ca4a8370bc
vulkan: reject ops when a tensor is too large to allocate (#18646)
|
3 weeks ago |
virajwad
|
03023296cf
vulkan: Warptile tuning for Intel Xe2/Xe3 (#18178)
|
3 weeks ago |
Jeff Bolz
|
ea13cba850
vulkan: support buffer_from_host_ptr (#18467)
|
3 weeks ago |
Jeff Bolz
|
b37124d2d2
vulkan: handle quantize_q8_1 overflowing the max workgroup count (#18515)
|
3 weeks ago |
Jeff Bolz
|
18ddaea2ae
vulkan: Optimize GGML_OP_CUMSUM (#18417)
|
4 weeks ago |
Jeff Bolz
|
706e3f93a6
vulkan: Implement mmvq for iq1_s/iq1_m (#18450)
|
4 weeks ago |
Jeff Bolz
|
be47fb9285
vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295)
|
1 month ago |
Jeff Bolz
|
c9ced4910b
vulkan: preprocess mul_mat_id experts and discard workgroups more quickly (#18352)
|
1 month ago |
Jeff Bolz
|
7ac8902133
vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (#18349)
|
1 month ago |
Jeff Bolz
|
9bf20d8ac3
vulkan: Use BK=32 for coopmat2 mul_mat_id (#18332)
|
1 month ago |
Jeff Bolz
|
b96b82fc85
vulkan: Support UPSCALE w/antialias (#18327)
|
1 month ago |
Jeff Bolz
|
10dc500bdb
vulkan: handle rope with large number of rows (#18306)
|
1 month ago |
Jeff Bolz
|
2a9ea2020c
vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (#18302)
|
1 month ago |
Ruben Ortlam
|
7f459c98e7
vulkan: use fewer FA rows for small cache runs (#18280)
|
1 month ago |
Jeff Bolz
|
e3b35ddf1c
vulkan: Extend rope fusions to allow mrope (#18264)
|
1 month ago |
Jeff Bolz
|
e1f15b454f
vulkan: Implement set_tensor_async and the event interfaces (#18047)
|
1 month ago |
Jeff Bolz
|
fd05c51cec
vulkan: fix im2col overflowing maxworkgroupcount (#18180)
|
1 month ago |
Jeff Bolz
|
b365c3ff01
vulkan/cuda: fix topk_moe with exp_probs_b (#18071)
|
1 month ago |
Jeff Bolz
|
cb64222b0c
vulkan: support GGML_UNARY_OP_XIELU (#18062)
|
1 month ago |
Jeff Bolz
|
6eb7081860
vulkan: in graph_optimize, try to group ADD operations (#18060)
|
1 month ago |
Jeff Bolz
|
cdbada8d10
vulkan: Add perf logger mode with concurrency (#17944)
|
1 month ago |
Jeff Bolz
|
36255a2268
vulkan: support get_rows for i32 (#17941)
|
1 month ago |
Jeff Bolz
|
3229a23fa6
vulkan: support GGML_OP_DIAG (#17893)
|
1 month ago |
Jeff Bolz
|
303f8615e9
vulkan: Multi-pass softmax for large number of cols (#17892)
|
1 month ago |
Jeff Bolz
|
07a10c1090
vulkan: Allow non-pow2 n_experts in topk_moe (#17872)
|
1 month ago |
Jeff Bolz
|
db97837385
vulkan: perf_logger improvements (#17672)
|
1 month ago |