Georgi Gerganov | 225e7a1438 | llama : add high-throughput mode (#14363) | 6 months ago
Georgi Gerganov | c311ac664d | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 7 months ago
Georgi Gerganov | b9912ac570 | batch : auto-gen positions + verify multi-sequence input (#14177) | 7 months ago
Georgi Gerganov | de2ef53a4b | kv-cache : rework kv_cell (#13706) | 8 months ago
David Huang | 7f323a589f | Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386) | 8 months ago
fairydreaming | 8fcb563613 | Load all MoE experts during warmup (#11571) | 10 months ago
Georgi Gerganov | f66f582927 | llama : refactor `src/llama.cpp` (#10902) | 1 year ago