HimariO ba1cb19cdd llama : add Qwen2VL support + multimodal RoPE (#10361) 1 year ago
..
template-instances 69c487f4ed CUDA: MMQ code deduplication + iquant support (#8495) 1 year ago
vendors 3ad5451f3b Add some minimal optimizations for CDNA (#10498) 1 year ago
CMakeLists.txt ab96610b1e cmake : enable warnings in llama (#10474) 1 year ago
acc.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
acc.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
arange.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
arange.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
argmax.cu a5e47592b6 cuda : optimize argmax (#10441) 1 year ago
argmax.cuh fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
argsort.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
argsort.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
binbcast.cu 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
binbcast.cuh 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
clamp.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
clamp.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
common.cuh 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) 1 year ago
concat.cu 8faa1d4dd4 CUDA: faster non-contiguous concat (#10760) 1 year ago
concat.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
conv-transpose-1d.cu fde13b3bb9 feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) 1 year ago
conv-transpose-1d.cuh fde13b3bb9 feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) 1 year ago
convert.cu 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) 1 year ago
convert.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
count-equal.cu 5b359bb1e3 ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) 1 year ago
count-equal.cuh fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 1 year ago
cpy.cu 116efee0ee cuda: add q8_0->f32 cpy operation (#9571) 1 year ago
cpy.cuh 8c60a8a462 increase cuda_cpy block size (ggml/996) 1 year ago
cross-entropy-loss.cu 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
cross-entropy-loss.cuh 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
dequantize.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
diagmask.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
diagmask.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
fattn-common.cuh ae8de6d50a ggml : build backends as libraries (#10256) 1 year ago
fattn-tile-f16.cu ae8de6d50a ggml : build backends as libraries (#10256) 1 year ago
fattn-tile-f16.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
fattn-tile-f32.cu ae8de6d50a ggml : build backends as libraries (#10256) 1 year ago
fattn-tile-f32.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
fattn-vec-f16.cuh e9e661bd59 CUDA: remove unnecessary warp reduce in FA (ggml/1032) 1 year ago
fattn-vec-f32.cuh e9e661bd59 CUDA: remove unnecessary warp reduce in FA (ggml/1032) 1 year ago
fattn-wmma-f16.cuh ae8de6d50a ggml : build backends as libraries (#10256) 1 year ago
fattn.cu 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) 1 year ago
fattn.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
getrows.cu 2b1f616b20 ggml : reduce hash table reset cost (#8698) 1 year ago
getrows.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
ggml-cuda.cu 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) 1 year ago
im2col.cu 80273a306d CUDA: fix 1D im2col, add tests (ggml/993) 1 year ago
im2col.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
mma.cuh 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) 1 year ago
mmq.cu 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) 1 year ago
mmq.cuh 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) 1 year ago
mmv.cu 26a8406ba9 CUDA: fix shared memory access condition for mmv (#10740) 1 year ago
mmv.cuh c3ea58aca4 CUDA: remove DMMV, consolidate F16 mult mat vec (#10318) 1 year ago
mmvq.cu 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) 1 year ago
mmvq.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
norm.cu 2d5dd7bb3f ggml : add epsilon as a parameter for group_norm (#8818) 1 year ago
norm.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
opt-step-adamw.cu 8a43e940ab ggml: new optimization interface (ggml/988) 1 year ago
opt-step-adamw.cuh 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
out-prod.cu d13edb17ed ggml : fix builds (#0) 1 year ago
out-prod.cuh 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) 1 year ago
pad.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pad.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pool2d.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
pool2d.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
quantize.cu a5e47592b6 cuda : optimize argmax (#10441) 1 year ago
quantize.cuh 808aba3916 CUDA: optimize and refactor MMQ (#8416) 1 year ago
rope.cu ba1cb19cdd llama : add Qwen2VL support + multimodal RoPE (#10361) 1 year ago
rope.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
scale.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
scale.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
softmax.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
softmax.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
sum.cu 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) 1 year ago
sum.cuh 202084d31d tests: add gradient tests for all backends (ggml/932) 1 year ago
sumrows.cu 231cff5f6f sync : ggml 1 year ago
sumrows.cuh 231cff5f6f sync : ggml 1 year ago
tsembd.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
tsembd.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
unary.cu 2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454) 1 year ago
unary.cuh 2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454) 1 year ago
upscale.cu f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
upscale.cuh f3f65429c4 llama : reorganize source code + improve CMake (#8006) 1 year ago
vecdotq.cuh 69c487f4ed CUDA: MMQ code deduplication + iquant support (#8495) 1 year ago
wkv6.cu 3bcd40b3c5 Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) 1 year ago
wkv6.cuh 3bcd40b3c5 Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) 1 year ago