Georgi Gerganov
|
85a7d8677b
memory : remove KV cache size padding (#16812)
|
2 months ago |
Georgi Gerganov
|
d00cbea63c
server : host-memory prompt caching (#16391)
|
3 months ago |
Johannes Gäßler
|
e81b8e4b7f
llama: use FA + max. GPU layers by default (#15434)
|
4 months ago |
Georgi Gerganov
|
d2fcd91cf9
server : disable context shift by default (#15416)
|
5 months ago |
Xuan-Son Nguyen
|
6aa892ec2a
server : do not return error out of context (with ctx shift disabled) (#13577)
|
8 months ago |
Diego Devesa
|
1d36b3670b
llama : move end-user examples to tools directory (#13249)
|
8 months ago |