Commit history

Author            SHA1        Description                                                                              Commit date
Georgi Gerganov   e58174cecb  llama : bump max seq limit from 64 to 256 (#15916)                                       4 months ago
Georgi Gerganov   9ebebef62f  llama : remove KV cache defragmentation logic (#15473)                                   5 months ago
Georgi Gerganov   225e7a1438  llama : add high-throughput mode (#14363)                                                6 months ago
Georgi Gerganov   c311ac664d  cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)                  7 months ago
Georgi Gerganov   b9912ac570  batch : auto-gen positions + verify multi-sequence input (#14177)                        7 months ago
Georgi Gerganov   de2ef53a4b  kv-cache : rework kv_cell (#13706)                                                       8 months ago
David Huang       7f323a589f  Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)   8 months ago
fairydreaming     8fcb563613  Load all MoE experts during warmup (#11571)                                              10 months ago
Georgi Gerganov   f66f582927  llama : refactor `src/llama.cpp` (#10902)                                                1 year ago