Eve | b3b6d862cf vulkan: matmul gcn tuning (#13016) | 9 months ago
Jeff Bolz | 66168204be vulkan: support noncontiguous rms_norm (#13031) | 9 months ago
Georgi Gerganov | 2f74c354c0 graph : make FA compatible with MLA + add initial Metal kernels (#12953) | 9 months ago
Jeff Bolz | 015022bb53 vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931) | 9 months ago
Diego Devesa | fe92821ea9 ggml : add bilinear upscale support (ggml/1185) | 9 months ago
Jeff Bolz | 0090950f67 vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory (#12833) | 9 months ago
Jeff Bolz | 80b717d493 vulkan: Use unclamped loads for flash attention mask (#12720) | 9 months ago
0cc4m | 6bf28f0111 Vulkan: Tune Vulkan mmq int dot shader for performance (#12767) | 9 months ago
Jeff Bolz | 74d4f5b041 vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (#12630) | 10 months ago
Jeff Bolz | f01bd02376 vulkan: Implement split_k for coopmat2 flash attention. (#12627) | 10 months ago
Jeff Bolz | be0a0f8cae vulkan: Implement grouped query attention in the coopmat2 FA shader (#12559) | 10 months ago
Wagner Bruna | 2bb3597e42 vulkan: fix build when glslc doesn't support coopmat (#12683) | 10 months ago
0cc4m | a8a1f33567 Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135) | 10 months ago
Georgi Gerganov | b4ae50810e metal : improve FA + improve MoE (#12612) | 10 months ago
Jeff Bolz | eddfb43850 vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505) | 10 months ago
stduhpf | 4375415b4a Vulkan: RTE rounding for cpy to quant (#12480) | 10 months ago
Jeff Bolz | c446b2edd2 vulkan: Submit once enough matmul work has been recorded (#12406) | 10 months ago
0cc4m | fd123cfead Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (#12434) | 10 months ago
Molly Sophia | 7dfad387e3 llama: Add support for RWKV v7 architecture (#12412) | 10 months ago
Jeff Bolz | 484a8ab513 vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312) | 10 months ago
Daniele | cf2270e4d3 vulkan: subgroup size tuning (#12087) | 10 months ago
Jeff Bolz | 891c63956d vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (#12273) | 10 months ago
Jeff Bolz | 2f21123c1d vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258) | 10 months ago
cmdr2 | 0cbee131ad cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129) | 11 months ago
William Tambellini | 70680c48e5 ggml : upgrade init_tensor API to return a ggml_status (#11854) | 11 months ago
Rémy O | 438a83926a vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595) | 11 months ago
Jeff Bolz | a82c9e7c23 vulkan: fix assertion when qy_needs_dequant (#12068) | 11 months ago
Judd | c132239bfb add OP sigmoid (#12056) | 11 months ago
Rémy O | 61d4f39dfe vulkan: implement more backpropagation operators (#11914) | 11 months ago
Rémy O | 2eea03d86a vulkan: implement several ops relevant for ggml_opt (#11769) | 11 months ago