Georgi Gerganov 231cff5f6f sync : ggml 1 year ago
template-instances 69c487f4ed CUDA: MMQ code deduplication + iquant support (#8495) 1 year ago
vendors 439b3fc75a cuda : organize vendor-specific headers into vendors directory (#8746) 1 year ago
acc.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
acc.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
arange.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
arange.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
argsort.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
argsort.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
binbcast.cu 231cff5f6f sync : ggml 1 year ago
binbcast.cuh 231cff5f6f sync : ggml 1 year ago
clamp.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
clamp.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
common.cuh 439b3fc75a cuda : organize vendor-specific headers into vendors directory (#8746) 1 year ago
concat.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
concat.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
conv-transpose-1d.cu fde13b3bb9 feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) 1 year ago
conv-transpose-1d.cuh fde13b3bb9 feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) 1 year ago
convert.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
convert.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
cpy.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
cpy.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
cross-entropy-loss.cu 231cff5f6f sync : ggml 1 year ago
cross-entropy-loss.cuh 231cff5f6f sync : ggml 1 year ago
dequantize.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
diagmask.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
diagmask.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
dmmv.cu 7a11eb3a26 cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800) 1 year ago
dmmv.cuh 7a11eb3a26 cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800) 1 year ago
fattn-common.cuh e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn-tile-f16.cu e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn-tile-f16.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
fattn-tile-f32.cu e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn-tile-f32.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
fattn-vec-f16.cuh e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn-vec-f32.cuh e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn-wmma-f16.cuh e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn.cu e11bd856d5 CPU/CUDA: Gemma 2 FlashAttention support (#8542) 1 year ago
fattn.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
getrows.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
getrows.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
im2col.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
im2col.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
mma.cuh 808aba3916 CUDA: optimize and refactor MMQ (#8416) 1 year ago
mmq.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
mmq.cuh 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
mmvq.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
mmvq.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
norm.cu 2d5dd7bb3f ggml : add epsilon as a parameter for group_norm (#8818) 1 year ago
norm.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pad.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pad.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pool2d.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pool2d.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
quantize.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
quantize.cuh 808aba3916 CUDA: optimize and refactor MMQ (#8416) 1 year ago
rope.cu 06943a69f6 ggml : move rope type enum to ggml.h (#8949) 1 year ago
rope.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
scale.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
scale.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
softmax.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
softmax.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
sumrows.cu 231cff5f6f sync : ggml 1 year ago
sumrows.cuh 231cff5f6f sync : ggml 1 year ago
tsembd.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
tsembd.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
unary.cu 231cff5f6f sync : ggml 1 year ago
unary.cuh 231cff5f6f sync : ggml 1 year ago
upscale.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
upscale.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
vecdotq.cuh 69c487f4ed CUDA: MMQ code deduplication + iquant support (#8495) 1 year ago