Pierrick Hymbert
|
930b178026
server: logs - unified format and --log-format option (#5700)
|
1 year ago |
Pierrick Hymbert
|
d52d7819b8
server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708)
|
1 year ago |
Georgi Gerganov
|
ab336a9d5e
code : normalize enum names (#5697)
|
1 year ago |
Pierrick Hymbert
|
9e359a4f47
server: continue to update other slots on embedding concurrent request (#5699)
|
1 year ago |
Pierrick Hymbert
|
525213d2f5
server: init functional tests (#5566)
|
1 year ago |
AlpinDale
|
fd43d66f46
server : add KV cache quantization options (#5684)
|
1 year ago |
Xuan Son Nguyen
|
a46f50747b
server : fallback to chatml, add AlphaMonarch chat template (#5628)
|
1 year ago |
Jared Van Bortel
|
89febfed93
examples : do not assume BOS when shifting context (#5622)
|
1 year ago |
Pierrick Hymbert
|
1ecea255eb
server: health: fix race condition on slots data using tasks queue (#5634)
|
1 year ago |
CJ Pais
|
6560bed3f0
server : support llava 1.6 (#5553)
|
1 year ago |
Xuan Son Nguyen
|
9c405c9f9a
Server: use llama_chat_apply_template (#5593)
|
1 year ago |
Pierrick Hymbert
|
c0a8c6db37
server : health endpoint configurable failure on no slot (#5594)
|
1 year ago |
Robey Holderith
|
5ee99c32f5
common, server : surface min_keep as its own parameter (#5567)
|
1 year ago |
Pierrick Hymbert
|
c145f8a132
server : slots monitoring endpoint (#5550)
|
1 year ago |
Pierrick Hymbert
|
e75c6279d1
server : enhanced health endpoint (#5548)
|
1 year ago |
Pierrick Hymbert
|
36376abe05
server : --n-predict option document and cap to max value (#5549)
|
1 year ago |
Daniel Hiltgen
|
66c1968f7a
server : graceful server shutdown (#5244)
|
1 year ago |
Alexey Parfenov
|
6dcc02d244
server : add "samplers" param to control the samplers order (#5494)
|
1 year ago |
Rőczey Barnabás
|
5f5808ca7b
server : fix system prompt cli (#5516)
|
1 year ago |
bmwl
|
f486f6e1e5
ggml : add numa options (#5377)
|
1 year ago |
Elbios
|
0d4177126b
llava : fix memory management bug (#5491)
|
1 year ago |
John
|
aa23412989
llava : support v1.6 (#5267)
|
1 year ago |
Alexey Parfenov
|
684780141a
server : allow to specify tokens as strings in logit_bias (#5003)
|
1 year ago |
Xuan Son Nguyen
|
907e08c110
server : add llama2 chat template (#5425)
|
1 year ago |
Riley Stewart
|
7c777fcd5d
server : fix prompt caching for repeated prompts (#5420)
|
1 year ago |
Justin Parker
|
f3e2b4fa3f
server : update `/props` with "total_slots" value (#5373)
|
1 year ago |
Alexey Parfenov
|
213d1439fa
server : remove model.json endpoint (#5371)
|
1 year ago |
Justin Parker
|
8a79c591de
server : include total "num_slots" in props endpoint (#5349)
|
1 year ago |
Michael Coppola
|
31e7903221
server : add `dynatemp_range` and `dynatemp_exponent` (#5352)
|
1 year ago |
Niall Coates
|
4ffc7a17d4
server : various fixes for the prompt field in /completion (#5300)
|
1 year ago |