Jeff Bolz
|
6a746cf9c4
vulkan: Split large mul_mat_id to fit in shared memory (#14451)
|
6 months ago |
Acly
|
431b2c24f3
ggml-cpu : "align corners" for bilinear upscale/downscale (ggml/1285)
|
6 months ago |
Diego Devesa
|
eb3fa2913e
test-backend-ops : disable llama test (#14461)
|
6 months ago |
Sigbjørn Skjæret
|
a0535ffa0d
ggml : implement REGLU/GEGLU/SWIGLU ops (#14158)
|
6 months ago |
Jeff Bolz
|
bd9c981d72
vulkan: Add fusion support for RMS_NORM+MUL (#14366)
|
6 months ago |
Aman Gupta
|
27208bf657
CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361)
|
6 months ago |
Radoslav Gerganov
|
8d94219a4a
ggml : add ggml_set_rows (#14274)
|
6 months ago |
Georgi Gerganov
|
e8215dbb96
metal : add special-case mat-vec mul for ne00 == 4 (#14385)
|
6 months ago |
Aman Gupta
|
aa064b2eb7
CUDA: add mean operation (#14313)
|
6 months ago |
Aman Gupta
|
c959f462a0
CUDA: add conv_2d_transpose (#14287)
|
7 months ago |
Ervin Áron Tasnádi
|
0d3984424f
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)
|
7 months ago |
Johannes Gäßler
|
10d2af0eaa
llama/ggml: add LLM training support (#10544)
|
8 months ago |
Georgi Gerganov
|
b34443923c
sync : ggml (#13268)
|
8 months ago |
Johannes Gäßler
|
b0ecbd434b
test: non-cont. b in test-backend-ops -o MUL_MAT (#13187)
|
8 months ago |
Johannes Gäßler
|
e1e8e0991f
CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199)
|
8 months ago |
Xuan-Son Nguyen
|
edb18b6e8f
clip : fix pixtral on some GPU backends (#13097)
|
8 months ago |
Johannes Gäßler
|
658987cfc9
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (#13014)
|
9 months ago |
Georgi Gerganov
|
2f74c354c0
graph : make FA compatible with MLA + add initial Metal kernels (#12953)
|
9 months ago |
Jeff Bolz
|
015022bb53
vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931)
|
9 months ago |
Georgi Gerganov
|
1d2b613445
tests : fix init order (#0)
|
9 months ago |
Diego Devesa
|
fe92821ea9
ggml : add bilinear upscale support (ggml/1185)
|
9 months ago |
Jeff Bolz
|
f01bd02376
vulkan: Implement split_k for coopmat2 flash attention. (#12627)
|
9 months ago |
Georgi Gerganov
|
b4ae50810e
metal : improve FA + improve MoE (#12612)
|
9 months ago |
Jeff Bolz
|
9b169a4d4e
vulkan: fix mul_mat_vec failure in backend tests (#12529)
|
9 months ago |
Georgi Gerganov
|
ba932dfb50
ggml : fix quantized cpy op (#12310)
|
10 months ago |
Jeff Bolz
|
eddfb43850
vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505)
|
10 months ago |
Gaurav Garg
|
517b5ddbf0
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (#12183)
|
10 months ago |
Molly Sophia
|
7dfad387e3
llama: Add support for RWKV v7 architecture (#12412)
|
10 months ago |
Jeff Bolz
|
bf69cfe62f
vulkan: fix bug in coopmat1 mul_mat_id (#12316)
|
10 months ago |
cmdr2
|
0cbee131ad
cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)
|
10 months ago |