e789095502  llama: print memory breakdown on exit (#15860)  (Johannes Gäßler, 3 months ago)
c0b45097c3  rename optimize_graph to graph_optimize (#16082)  (Jeff Bolz, 4 months ago)
e68aa10d8f  vulkan: sort graph to allow more parallel execution (#15850)  (Jeff Bolz, 4 months ago)
5d804a4938  ggml-backend: raise GGML_MAX_SPLIT_INPUTS (#15722)  (Johannes Gäßler, 4 months ago)
9777032dcc  llama : separate compute buffer reserve from fattn check (#15696)  (Diego Devesa, 4 months ago)
e81b8e4b7f  llama: use FA + max. GPU layers by default (#15434)  (Johannes Gäßler, 4 months ago)
54a241f505  sched : fix possible use of wrong ids tensor when offloading moe prompt processing (#15488)  (Diego Devesa, 5 months ago)
5682a3745f  sched : copy only the used experts when offloading prompt processing (#15346)  (Diego Devesa, 5 months ago)
0d8831543c  ggml : fix fallback to CPU for ununsupported ops (#15118)  (Diego Devesa, 5 months ago)
c12bbde372  sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855)  (Diego Devesa, 6 months ago)
bf9087f59a  metal : fuse add, mul + add tests (#14596)  (Georgi Gerganov, 6 months ago)
bd9c981d72  vulkan: Add fusion support for RMS_NORM+MUL (#14366)  (Jeff Bolz, 6 months ago)
b47ab7b8e9  sched : avoid changing cur_copy when a graph is already allocated (#13922)  (Diego Devesa, 7 months ago)
952f3953c1  ggml : allow CUDA graphs when using pipeline parallelism (#13814)  (Diego Devesa, 7 months ago)
10d2af0eaa  llama/ggml: add LLM training support (#10544)  (Johannes Gäßler, 8 months ago)
7f323a589f  Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)  (David Huang, 8 months ago)
9070365020  CUDA: fix logic for clearing padding with -ngl 0 (#13320)  (Johannes Gäßler, 8 months ago)
5bbe6a9fe9  ggml : portability fixes for VS 2017 (#12150)  (mgroeber9110, 10 months ago)
70680c48e5  ggml : upgrade init_tensor API to return a ggml_status (#11854)  (William Tambellini, 10 months ago)
017cc5f446  ggml-backend : only offload from host buffers (fix) (#11124)  (Diego Devesa, 1 year ago)
a3d50bc022  ggml-backend : only offload from host buffers (#11120)  (Diego Devesa, 1 year ago)
db68c93b57  ggml : improve inputs log sched_print_assignments (ggml/1053)  (Daniel Bevenius, 1 year ago)
7cc2d2c889  ggml : move AMX to the CPU backend (#10570)  (Diego Devesa, 1 year ago)
59b9172822  ggml/sched : do not skip views in pre-assignments  (slaren, 1 year ago)
02e4eaf22f  ggml-opt: fix data corruption (ggml/1022)  (Johannes Gäßler, 1 year ago)
be5caccef9  llama : only use default buffer types for the KV cache (#10358)  (Diego Devesa, 1 year ago)
eda7e1d4f5  ggml : fix possible buffer use after free in sched reserve (#9930)  (Diego Devesa, 1 year ago)
8a43e940ab  ggml: new optimization interface (ggml/988)  (Johannes Gäßler, 1 year ago)
ae8de6d50a  ggml : build backends as libraries (#10256)  (Diego Devesa, 1 year ago)
9f40989351  ggml : move CPU backend to a separate file (#10144)  (Diego Devesa, 1 year ago)