| Name | Last commit | Last commit message | Last updated |
| --- | --- | --- | --- |
| ggml-amx | 60ce97c9d8 | add amx kernel for gemm (#8998) | 1 year ago |
| ggml-cann | 904837e0cb | cann: fix crash when llama-bench is running on multiple cann devices (#9627) | 1 year ago |
| ggml-cuda | 8c60a8a462 | increase cuda_cpy block size (ggml/996) | 1 year ago |
| ggml-sycl | 1db8c84fc6 | fix mul_mat_vec_q and *_vec_q error (#9939) | 1 year ago |
| kompute @ 4565194ed7 | f3f65429c4 | llama : reorganize source code + improve CMake (#8006) | 1 year ago |
| kompute-shaders | 1329c0a75e | kompute: add mul_mat_q4_k shader (#10097) | 1 year ago |
| llamafile | 2f8bd2b901 | llamafile : extend sgemm.cpp support for Q5_0 models (#10010) | 1 year ago |
| vulkan-shaders | 8f275a7c45 | ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) | 1 year ago |
| CMakeLists.txt | e597e50794 | build: fix build error in Windows env with OneAPI setup (#10107) | 1 year ago |
| ggml-aarch64.c | fc83a9e584 | ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029) | 1 year ago |
| ggml-aarch64.h | 370b1f7e7a | ggml : minor naming changes (#8433) | 1 year ago |
| ggml-alloc.c | cd60b88bf7 | ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) | 1 year ago |
| ggml-amx.cpp | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago |
| ggml-backend-impl.h | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago |
| ggml-backend.cpp | c02e5ab2a6 | llama : fix buffer checks for mamba and rwk (#10111) | 1 year ago |
| ggml-blas.cpp | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago |
| ggml-cann.cpp | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago |
| ggml-common.h | 9bc6db28d0 | ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) | 1 year ago |
| ggml-cpu-impl.h | 23e0d70bac | ggml : move common CPU backend impl to new header (#9509) | 1 year ago |
| ggml-cuda.cu | c02e5ab2a6 | llama : fix buffer checks for mamba and rwk (#10111) | 1 year ago |
| ggml-impl.h | 73afe681aa | fix: use `vm_allocate` to allocate CPU backend buffer on macOS (#9875) | 1 year ago |
| ggml-kompute.cpp | 1329c0a75e | kompute: add mul_mat_q4_k shader (#10097) | 1 year ago |
| ggml-metal.m | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago |
| ggml-metal.metal | 668750357e | metal : support permuted matrix multiplicaions (#10033) | 1 year ago |
| ggml-quants.c | 6a0f779484 | ggml : add run-time detection of neon, i8mm and sve (#9331) | 1 year ago |
| ggml-quants.h | 6a0f779484 | ggml : add run-time detection of neon, i8mm and sve (#9331) | 1 year ago |
| ggml-rpc.cpp | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago |
| ggml-sycl.cpp | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago |
| ggml-vulkan.cpp | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago |
| ggml.c | 1804adb0cf | ggml : remove ggml_scratch (#10121) | 1 year ago |