Georgi Gerganov | c311ac664d | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 7 months ago
Georgi Gerganov | b9912ac570 | batch : auto-gen positions + verify multi-sequence input (#14177) | 7 months ago
Georgi Gerganov | de2ef53a4b | kv-cache : rework kv_cell (#13706) | 8 months ago
David Huang | 7f323a589f | Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386) | 8 months ago
fairydreaming | 8fcb563613 | Load all MoE experts during warmup (#11571) | 10 months ago
Georgi Gerganov | f66f582927 | llama : refactor `src/llama.cpp` (#10902) | 1 year ago