Johannes Gäßler fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
template-instances 69c487f4ed CUDA: MMQ code deduplication + iquant support (#8495) 1 year ago
vendors c35e586ea5 musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526) 1 year ago
acc.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
acc.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
arange.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
arange.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
argmax.cu fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
argmax.cuh fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
argsort.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
argsort.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
binbcast.cu 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
binbcast.cuh 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
clamp.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
clamp.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
common.cuh fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
concat.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
concat.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
conv-transpose-1d.cu fde13b3bb9 feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) 1 year ago
conv-transpose-1d.cuh fde13b3bb9 feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) 1 year ago
convert.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
convert.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
count-equal.cu fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
count-equal.cuh fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
cpy.cu 116efee0ee cuda: add q8_0->f32 cpy operation (#9571) 1 year ago
cpy.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
cross-entropy-loss.cu 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
cross-entropy-loss.cuh 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
dequantize.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
diagmask.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
diagmask.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
dmmv.cu 7a11eb3a26 cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800) 1 year ago
dmmv.cuh 7a11eb3a26 cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800) 1 year ago
fattn-common.cuh e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn-tile-f16.cu fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
fattn-tile-f16.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
fattn-tile-f32.cu c35e586ea5 musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526) 1 year ago
fattn-tile-f32.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
fattn-vec-f16.cuh fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
fattn-vec-f32.cuh e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn-wmma-f16.cuh e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn.cu a5b57b08ce CUDA: enable Gemma FA for HIP/Pascal (#9581) 1 year ago
fattn.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
getrows.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
getrows.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
im2col.cu aaa4099925 CUDA: remove bad assert (ggml/972) 1 year ago
im2col.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
mma.cuh 808aba3916 CUDA: optimize and refactor MMQ (#8416) 1 year ago
mmq.cu 5af118efda CUDA: fix --split-mode row race condition (#9413) 1 year ago
mmq.cuh 5af118efda CUDA: fix --split-mode row race condition (#9413) 1 year ago
mmvq.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
mmvq.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
norm.cu 2d5dd7bb3f ggml : add epsilon as a parameter for group_norm (#8818) 1 year ago
norm.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
opt-step-adamw.cu 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
opt-step-adamw.cuh 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
out-prod.cu d13edb17ed ggml : fix builds (#0) 1 year ago
out-prod.cuh 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
pad.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pad.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pool2d.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pool2d.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
quantize.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
quantize.cuh 808aba3916 CUDA: optimize and refactor MMQ (#8416) 1 year ago
rope.cu 06943a69f6 ggml : move rope type enum to ggml.h (#8949) 1 year ago
rope.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
rwkv-wkv.cu 2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454) 1 year ago
rwkv-wkv.cuh 2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454) 1 year ago
scale.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
scale.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
softmax.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
softmax.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
sum.cu 5cb12f6839 CUDA: fix sum.cu compilation for CUDA < 11.7 (#9562) 1 year ago
sum.cuh 202084d31d tests: add gradient tests for all backends (ggml/932) 1 year ago
sumrows.cu 231cff5f6f sync : ggml 1 year ago
sumrows.cuh 231cff5f6f sync : ggml 1 year ago
tsembd.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
tsembd.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
unary.cu 2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454) 1 year ago
unary.cuh 2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454) 1 year ago
upscale.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
upscale.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
vecdotq.cuh 69c487f4ed CUDA: MMQ code deduplication + iquant support (#8495) 1 year ago