cturan/llama.cpp

Author	SHA1 Message	Date
Georgi Gerganov	39173bcacb context : reserve new scheduler when graph topology changes (#18547)	2 weeks ago
Georgi Gerganov	cd5e3b5754 server : support unified cache across slots (#16736)	3 months ago
Georgi Gerganov	e58174cecb llama : bump max seq limit from 64 to 256 (#15916)	4 months ago
Georgi Gerganov	9ebebef62f llama : remove KV cache defragmentation logic (#15473)	5 months ago
Georgi Gerganov	225e7a1438 llama : add high-throughput mode (#14363)	6 months ago
Georgi Gerganov	c311ac664d cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)	7 months ago
Georgi Gerganov	b9912ac570 batch : auto-gen positions + verify multi-sequence input (#14177)	7 months ago
Georgi Gerganov	de2ef53a4b kv-cache : rework kv_cell (#13706)	8 months ago
David Huang	7f323a589f Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)	8 months ago
fairydreaming	8fcb563613 Load all MoE experts during warmup (#11571)	10 months ago
Georgi Gerganov	f66f582927 llama : refactor `src/llama.cpp` (#10902)	1 year ago