Commit History

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Daniel Bevenius | 7f3a72a8ed | ggml : remove redundant n_copies check when setting input/output (#17612) | 1 month ago |
| Georgi Gerganov | 90c72a614a | ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler (#17617) | 1 month ago |
| Diego Devesa | e072b2052e | ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276) | 1 month ago |
| Diego Devesa | dd091e52f8 | sched : fix reserve ignoring user tensor assignments (#17232) | 2 months ago |
| Johannes Gäßler | e789095502 | llama: print memory breakdown on exit (#15860) | 3 months ago |
| Jeff Bolz | c0b45097c3 | rename optimize_graph to graph_optimize (#16082) | 4 months ago |
| Jeff Bolz | e68aa10d8f | vulkan: sort graph to allow more parallel execution (#15850) | 4 months ago |
| Johannes Gäßler | 5d804a4938 | ggml-backend: raise GGML_MAX_SPLIT_INPUTS (#15722) | 4 months ago |
| Diego Devesa | 9777032dcc | llama : separate compute buffer reserve from fattn check (#15696) | 4 months ago |
| Johannes Gäßler | e81b8e4b7f | llama: use FA + max. GPU layers by default (#15434) | 4 months ago |
| Diego Devesa | 54a241f505 | sched : fix possible use of wrong ids tensor when offloading moe prompt processing (#15488) | 5 months ago |
| Diego Devesa | 5682a3745f | sched : copy only the used experts when offloading prompt processing (#15346) | 5 months ago |
| Diego Devesa | 0d8831543c | ggml : fix fallback to CPU for unsupported ops (#15118) | 5 months ago |
| Diego Devesa | c12bbde372 | sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855) | 6 months ago |
| Georgi Gerganov | bf9087f59a | metal : fuse add, mul + add tests (#14596) | 6 months ago |
| Jeff Bolz | bd9c981d72 | vulkan: Add fusion support for RMS_NORM+MUL (#14366) | 6 months ago |
| Diego Devesa | b47ab7b8e9 | sched : avoid changing cur_copy when a graph is already allocated (#13922) | 7 months ago |
| Diego Devesa | 952f3953c1 | ggml : allow CUDA graphs when using pipeline parallelism (#13814) | 7 months ago |
| Johannes Gäßler | 10d2af0eaa | llama/ggml: add LLM training support (#10544) | 8 months ago |
| David Huang | 7f323a589f | Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386) | 8 months ago |
| Johannes Gäßler | 9070365020 | CUDA: fix logic for clearing padding with -ngl 0 (#13320) | 8 months ago |
| mgroeber9110 | 5bbe6a9fe9 | ggml : portability fixes for VS 2017 (#12150) | 10 months ago |
| William Tambellini | 70680c48e5 | ggml : upgrade init_tensor API to return a ggml_status (#11854) | 10 months ago |
| Diego Devesa | 017cc5f446 | ggml-backend : only offload from host buffers (fix) (#11124) | 1 year ago |
| Diego Devesa | a3d50bc022 | ggml-backend : only offload from host buffers (#11120) | 1 year ago |
| Daniel Bevenius | db68c93b57 | ggml : improve inputs log sched_print_assignments (ggml/1053) | 1 year ago |
| Diego Devesa | 7cc2d2c889 | ggml : move AMX to the CPU backend (#10570) | 1 year ago |
| slaren | 59b9172822 | ggml/sched : do not skip views in pre-assignments | 1 year ago |
| Johannes Gäßler | 02e4eaf22f | ggml-opt: fix data corruption (ggml/1022) | 1 year ago |
| Diego Devesa | be5caccef9 | llama : only use default buffer types for the KV cache (#10358) | 1 year ago |