Johannes Gäßler
|
28103f4832
Server: fix seed for multiple slots (#6835)
|
1 year ago |
Jan Boon
|
beea6e1b16
llama : save and restore kv cache for single seq id (#6341)
|
1 year ago |
Pierrick Hymbert
|
a016026a3a
server: continuous performance monitoring and PR comment (#6283)
|
1 year ago |
Pierrick Hymbert
|
f482bb2e49
common: llama_load_model_from_url split support (#6192)
|
1 year ago |
Olivier Chafik
|
5b7b0ac8df
json-schema-to-grammar improvements (+ added to server) (#5978)
|
1 year ago |
Georgi Gerganov
|
bc0baab2ea
server : allow to override -ngl in tests (#6170)
|
1 year ago |
Jared Van Bortel
|
bd60d82d0c
server tests : more pythonic process management; fix bare `except:` (#6146)
|
1 year ago |
Pierrick Hymbert
|
d01b3c4c32
common: llama_load_model_from_url using --model-url (#6098)
|
1 year ago |
Pierrick Hymbert
|
43241adf22
server: disable debug release type sanitizer, simplify trigger (#6047)
|
1 year ago |
slaren
|
f30ea47a87
llama : add pipeline parallelism support (#6017)
|
1 year ago |
Xuan Son Nguyen
|
caa106d4e0
Server: format error to json (#5961)
|
1 year ago |
Pierrick Hymbert
|
fa8a809a91
server: ci: windows build and tests (#5968)
|
1 year ago |
Xuan Son Nguyen
|
950ba1ab84
Server: reorganize some http logic (#5939)
|
1 year ago |
Pierrick Hymbert
|
fd72d2d2a5
server: tests: add truncated prompt tests, better kv cache size (#5933)
|
1 year ago |
Pierrick Hymbert
|
76e868821a
server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (#5937)
|
1 year ago |
Georgi Gerganov
|
2002bc96bf
server : refactor (#5882)
|
1 year ago |
Pierrick Hymbert
|
9731134296
server: tests: passkey challenge / self-extend with context shift demo (#5832)
|
1 year ago |
Jorge A
|
efc72253f7
server : add "/chat/completions" alias for "/v1/...` (#5722)
|
1 year ago |
Pierrick Hymbert
|
e3965cf35a
server: tests - slow inference causes timeout on the CI (#5715)
|
1 year ago |
Pierrick Hymbert
|
930b178026
server: logs - unified format and --log-format option (#5700)
|
1 year ago |
Pierrick Hymbert
|
d52d7819b8
server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708)
|
1 year ago |
Pierrick Hymbert
|
9e359a4f47
server: continue to update other slots on embedding concurrent request (#5699)
|
1 year ago |
Pierrick Hymbert
|
525213d2f5
server: init functional tests (#5566)
|
1 year ago |