Georgi Gerganov
|
16bcc1259d
kv-cache : pad the cache size to 256 for performance (#17046)
|
2 months ago |
ddh0
|
f6dcda3900
server : context checkpointing for hybrid and recurrent models (#16382)
|
3 months ago |
Johannes Gäßler
|
e789095502
llama: print memory breakdown on exit (#15860)
|
4 months ago |
Georgi Gerganov
|
c610b6c11b
kv-cache : fix SWA checks + disable cacheless iSWA (#15811)
|
4 months ago |
Daniel Bevenius
|
fb15d649ed
llama : add support for EmbeddingGemma 300m (#15798)
|
4 months ago |
Georgi Gerganov
|
b730706a49
kv-cache : support layer reuse (#15504)
|
5 months ago |
Georgi Gerganov
|
715a6db02c
kv-cache : drop the "unified" prefix (#15467)
|
5 months ago |