Oliver Simons | 36f0132464 | CUDA: Factor out and re-use `block_reduce` function (#18785) | 1 week ago
Jeff Bolz | 2bbe4c2cf8 | vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (#18678) | 2 weeks ago
Aman Gupta | b137718878 | test-backend-ops: fix mxfp4 tests on blackwell (#18736) | 2 weeks ago
Jeff Bolz | f1768d8f03 | vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 (#18582) | 3 weeks ago
Jeff Bolz | b37124d2d2 | vulkan: handle quantize_q8_1 overflowing the max workgroup count (#18515) | 3 weeks ago
Chenguang Li | 67e3f6f601 | CANN: add operator fusion support for ADD + RMS_NORM (#17512) | 3 weeks ago
Daniel Bevenius | d3dce4e0a5 | sampling : add support for backend sampling (#17004) | 3 weeks ago
Jeff Bolz | be47fb9285 | vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295) | 3 weeks ago
Jeff Bolz | b96b82fc85 | vulkan: Support UPSCALE w/antialias (#18327) | 1 month ago
Jeff Bolz | 10dc500bdb | vulkan: handle rope with large number of rows (#18306) | 1 month ago
Jeff Bolz | e3b35ddf1c | vulkan: Extend rope fusions to allow mrope (#18264) | 1 month ago
Jeff Bolz | fd05c51cec | vulkan: fix im2col overflowing maxworkgroupcount (#18180) | 1 month ago
Jeff Bolz | b365c3ff01 | vulkan/cuda: fix topk_moe with exp_probs_b (#18071) | 1 month ago
Jeff Bolz | 52ab19df63 | tests: Avoid floating point precision false positives in SUM (#17471) | 1 month ago
Jeff Bolz | 5182dd64cd | test-backend-ops: improve msvc build time (#18209) | 1 month ago
Xuan-Son Nguyen | 8ea958d4d9 | model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106) | 1 month ago
Jeff Bolz | 303f8615e9 | vulkan: Multi-pass softmax for large number of cols (#17892) | 1 month ago
Jeff Bolz | 07a10c1090 | vulkan: Allow non-pow2 n_experts in topk_moe (#17872) | 1 month ago
Piotr Wilkin (ilintar) | 53ecd4fdb9 | SOLVE_TRI extension to more dimensions (#17793) | 1 month ago
Georgi Gerganov | 4dff236a52 | ggml : remove GGML_KQ_MASK_PAD constant (#17910) | 1 month ago
Gabe Goodhart | 086a63e3a5 | metal: SSM kernel improvements (#17876) | 1 month ago
Piotr Wilkin (ilintar) | b63509262a | Add DIAG for CUDA (#17873) | 1 month ago
Phylliida Dev | 09c7c50e64 | ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985) | 1 month ago
Jeff Bolz | c6c5e85979 | vulkan: support solve_tri with larger N/K values (#17781) | 1 month ago
Jeff Bolz | a0f3897d53 | vulkan: fix top_k bug when there are ties in the input (#17659) | 1 month ago
Acly | e15cd06a94 | vulkan : support conv-2d with large output size (#17685) | 1 month ago
Piotr Wilkin (ilintar) | 96fe9badfc | Add support for CUMSUM and TRI for CUDA. (#17584) | 1 month ago
Reese Levine | 7ca5991d2b | ggml webgpu: add support for emscripten builds (#17184) | 1 month ago
Tarek Dakhran | 2ba719519d | model: LFM2-VL fixes (#17577) | 1 month ago
Jeff Bolz | 59d8d4e963 | vulkan: improve topk perf for large k, fix overflow in unit tests (#17582) | 2 months ago