Jeff Bolz
|
52ab19df63
tests: Avoid floating point precision false positives in SUM (#17471)
|
3 weeks ago |
Jeff Bolz
|
5182dd64cd
test-backend-ops: improve msvc build time (#18209)
|
3 weeks ago |
Xuan-Son Nguyen
|
8ea958d4d9
model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)
|
4 weeks ago |
Jeff Bolz
|
303f8615e9
vulkan: Multi-pass softmax for large number of cols (#17892)
|
1 month ago |
Jeff Bolz
|
07a10c1090
vulkan: Allow non-pow2 n_experts in topk_moe (#17872)
|
1 month ago |
Piotr Wilkin (ilintar)
|
53ecd4fdb9
SOLVE_TRI extension to more dimensions (#17793)
|
1 month ago |
Georgi Gerganov
|
4dff236a52
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
|
1 month ago |
Gabe Goodhart
|
086a63e3a5
metal: SSM kernel improvements (#17876)
|
1 month ago |
Piotr Wilkin (ilintar)
|
b63509262a
Add DIAG for CUDA (#17873)
|
1 month ago |
Phylliida Dev
|
09c7c50e64
ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985)
|
1 month ago |
Jeff Bolz
|
c6c5e85979
vulkan: support solve_tri with larger N/K values (#17781)
|
1 month ago |
Jeff Bolz
|
a0f3897d53
vulkan: fix top_k bug when there are ties in the input (#17659)
|
1 month ago |
Acly
|
e15cd06a94
vulkan : support conv-2d with large output size (#17685)
|
1 month ago |
Piotr Wilkin (ilintar)
|
96fe9badfc
Add support for CUMSUM and TRI for CUDA. (#17584)
|
1 month ago |
Reese Levine
|
7ca5991d2b
ggml webgpu: add support for emscripten builds (#17184)
|
1 month ago |
Tarek Dakhran
|
2ba719519d
model: LFM2-VL fixes (#17577)
|
1 month ago |
Jeff Bolz
|
59d8d4e963
vulkan: improve topk perf for large k, fix overflow in unit tests (#17582)
|
1 month ago |
Piotr Wilkin (ilintar)
|
cd0e3a7a3b
SOLVE_TRI CUDA kernel for small matrices (#17457)
|
1 month ago |
Jeff Bolz
|
879d673759
vulkan: Implement top-k (#17418)
|
1 month ago |
Georgi Gerganov
|
583cb83416
ggml : add ggml_top_k (#17365)
|
1 month ago |
Jeff Bolz
|
d414db02d3
vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 (#17455)
|
1 month ago |
Sigbjørn Skjæret
|
96ac5a2329
cuda : support non-contiguous i32 to i32 copy (#17326)
|
1 month ago |
Masato Nakasaka
|
3f3a4fb9c3
Revive MUL_MAT_ID to perf testing (#17397)
|
1 month ago |
Giuseppe Scrivano
|
7d77f07325
vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC (#17319)
|
1 month ago |
Jeff Bolz
|
1fa4551af0
vulkan: support larger argsort (#17313)
|
1 month ago |
Piotr Wilkin (ilintar)
|
6fd4f95367
Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition (#17332)
|
1 month ago |
Georgi Gerganov
|
1a139644a8
metal : add cumsum (#17305)
|
2 months ago |
Jeff Bolz
|
24dc769f1b
vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (#17287)
|
2 months ago |
Georgi Gerganov
|
45c6ef7307
metal : support argsort for ne00 > 1024 (#17247)
|
2 months ago |
Piotr Wilkin (ilintar)
|
389ac78b26
ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063)
|
2 months ago |