cmdr2
|
0cbee131ad
cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)
|
10 months ago |
cmdr2
|
87abb7e903
cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
|
10 months ago |
cmdr2
|
f54a4ba11e
Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121)
|
10 months ago |
Diego Devesa
|
d5c63cd7f9
test-backend-ops : add option -p to filter by op params (#12155)
|
10 months ago |
William Tambellini
|
70680c48e5
ggml : upgrade init_tensor API to return a ggml_status (#11854)
|
10 months ago |
Johannes Gäßler
|
5fa07c2f93
CUDA: optimize FA for GQA + large batches (#12014)
|
11 months ago |
Rémy O
|
2eea03d86a
vulkan: implement several ops relevant for ggml_opt (#11769)
|
11 months ago |
Johannes Gäßler
|
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
|
11 months ago |
Akarshan Biswas
|
6e84b0ab8e
SYCL : SOFTMAX F16 mask support and other fixes (#11261)
|
11 months ago |
Johannes Gäßler
|
8137b4bb2b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)
|
11 months ago |
Jeff Bolz
|
564804b79b
tests: fix some mul_mat test gaps (#11375)
|
11 months ago |
Jeff Bolz
|
44e18ef939
vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281)
|
1 year ago |
Jeff Bolz
|
bd38ddea01
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166)
|
1 year ago |
Johannes Gäßler
|
9c8dcefe17
CUDA: backwards pass for misc. ops, add tests (#11257)
|
1 year ago |
Johannes Gäßler
|
432df2d5f9
RoPE: fix back, CUDA support for back + noncont. (#11240)
|
1 year ago |
Molly Sophia
|
ee7136c6d1
llama: add support for QRWKV6 model architecture (#11001)
|
1 year ago |
Jeff Bolz
|
716bd6dec3
vulkan: optimize mul_mat for small values of N (#10991)
|
1 year ago |
Jeff Bolz
|
a813badbbd
vulkan: im2col and matmul optimizations for stable diffusion (#10942)
|
1 year ago |
Georgi Gerganov
|
0006f5a74a
ggml : update ggml_backend_cpu_device_supports_op (#10867)
|
1 year ago |
HimariO
|
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE (#10361)
|
1 year ago |
PAB
|
a8cbab201d
ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037)
|
1 year ago |
PAB
|
c2082d93a8
ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034)
|
1 year ago |
Jeff Bolz
|
2759916d86
vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (#10642)
|
1 year ago |
PAB
|
efb6ae9630
feat: add `GGML_UNARY_OP_ARGMAX` Metal kernel (ggml/1019)
|
1 year ago |
Georgi Gerganov
|
0115df2f65
metal : small-batch mat-mul kernels (#10581)
|
1 year ago |
Georgi Gerganov
|
f0678c5ff4
ggml : fix I8MM Q4_1 scaling factor conversion (#10562)
|
1 year ago |
Jeff Bolz
|
904109ed0d
vulkan: fix group_norm (#10496)
|
1 year ago |
Diego Devesa
|
5931c1f233
ggml : add support for dynamic loading of backends (#10469)
|
1 year ago |
Diego Devesa
|
a5e47592b6
cuda : optimize argmax (#10441)
|
1 year ago |
Johannes Gäßler
|
02e4eaf22f
ggml-opt: fix data corruption (ggml/1022)
|
1 year ago |