Georgi Gerganov | bf9087f59a | metal : fuse add, mul + add tests (#14596) | 6 months ago
Jeff Bolz | bd9c981d72 | vulkan: Add fusion support for RMS_NORM+MUL (#14366) | 6 months ago
Diego Devesa | b47ab7b8e9 | sched : avoid changing cur_copy when a graph is already allocated (#13922) | 7 months ago
Diego Devesa | 952f3953c1 | ggml : allow CUDA graphs when using pipeline parallelism (#13814) | 7 months ago
Johannes Gäßler | 10d2af0eaa | llama/ggml: add LLM training support (#10544) | 8 months ago
David Huang | 7f323a589f | Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386) | 8 months ago
Johannes Gäßler | 9070365020 | CUDA: fix logic for clearing padding with -ngl 0 (#13320) | 8 months ago
mgroeber9110 | 5bbe6a9fe9 | ggml : portability fixes for VS 2017 (#12150) | 10 months ago
William Tambellini | 70680c48e5 | ggml : upgrade init_tensor API to return a ggml_status (#11854) | 10 months ago
Diego Devesa | 017cc5f446 | ggml-backend : only offload from host buffers (fix) (#11124) | 1 year ago
Diego Devesa | a3d50bc022 | ggml-backend : only offload from host buffers (#11120) | 1 year ago
Daniel Bevenius | db68c93b57 | ggml : improve inputs log sched_print_assignments (ggml/1053) | 1 year ago
Diego Devesa | 7cc2d2c889 | ggml : move AMX to the CPU backend (#10570) | 1 year ago
slaren | 59b9172822 | ggml/sched : do not skip views in pre-assignments | 1 year ago
Johannes Gäßler | 02e4eaf22f | ggml-opt: fix data corruption (ggml/1022) | 1 year ago
Diego Devesa | be5caccef9 | llama : only use default buffer types for the KV cache (#10358) | 1 year ago
Diego Devesa | eda7e1d4f5 | ggml : fix possible buffer use after free in sched reserve (#9930) | 1 year ago
Johannes Gäßler | 8a43e940ab | ggml: new optimization interface (ggml/988) | 1 year ago
Diego Devesa | ae8de6d50a | ggml : build backends as libraries (#10256) | 1 year ago
Diego Devesa | 9f40989351 | ggml : move CPU backend to a separate file (#10144) | 1 year ago
Diego Devesa | c02e5ab2a6 | llama : fix buffer checks for mamba and rwk (#10111) | 1 year ago
Sergio López | 61408e7fad | kompute: add backend registry / device interfaces (#10045) | 1 year ago
Diego Devesa | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago
leo-pony | 6b8447352d | [CANN] Adapt to dynamically loadable backends mechanism (#9970) | 1 year ago
Ouadie EL FAROUKI | 87421a23e8 | [SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705) | 1 year ago
Ma Mingfei | 60ce97c9d8 | add amx kernel for gemm (#8998) | 1 year ago
Diego Devesa | f010b77a37 | vulkan : add backend registry / device interfaces (#9721) | 1 year ago
Gilad S. | 2194200278 | fix: allocating CPU buffer with size `0` (#9917) | 1 year ago
Gilad S. | 73afe681aa | fix: use `vm_allocate` to allocate CPU backend buffer on macOS (#9875) | 1 year ago
Diego Devesa | 96776405a1 | ggml : move more prints to the ggml log system (#9839) | 1 year ago