Pierrick Hymbert | c145f8a132 | server : slots monitoring endpoint (#5550) | 1 year ago
Pierrick Hymbert | e75c6279d1 | server : enhanced health endpoint (#5548) | 1 year ago
Pierrick Hymbert | 36376abe05 | server : --n-predict option document and cap to max value (#5549) | 1 year ago
Daniel Hiltgen | 66c1968f7a | server : graceful server shutdown (#5244) | 1 year ago
Alexey Parfenov | 6dcc02d244 | server : add "samplers" param to control the samplers order (#5494) | 1 year ago
Rőczey Barnabás | 5f5808ca7b | server : fix system prompt cli (#5516) | 1 year ago
bmwl | f486f6e1e5 | ggml : add numa options (#5377) | 1 year ago
Elbios | 0d4177126b | llava : fix memory management bug (#5491) | 1 year ago
John | aa23412989 | llava : support v1.6 (#5267) | 1 year ago
Alexey Parfenov | 684780141a | server : allow to specify tokens as strings in logit_bias (#5003) | 1 year ago
Xuan Son Nguyen | 907e08c110 | server : add llama2 chat template (#5425) | 1 year ago
Riley Stewart | 7c777fcd5d | server : fix prompt caching for repeated prompts (#5420) | 1 year ago
Justin Parker | f3e2b4fa3f | server : update `/props` with "total_slots" value (#5373) | 1 year ago
Alexey Parfenov | 213d1439fa | server : remove model.json endpoint (#5371) | 1 year ago
Justin Parker | 8a79c591de | server : include total "num_slots" in props endpoint (#5349) | 1 year ago
Michael Coppola | 31e7903221 | server : add `dynatemp_range` and `dynatemp_exponent` (#5352) | 1 year ago
Niall Coates | 4ffc7a17d4 | server : various fixes for the prompt field in /completion (#5300) | 1 year ago
Alexey Parfenov | a2d60c9158 | server : allow to get default generation settings for completion (#5307) | 1 year ago
Michael Klimenko | 52bb63c708 | refactor : switch to emplace_back to avoid extra object (#5291) | 1 year ago
Georgi Gerganov | 5cb04dbc16 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) | 1 year ago
Georgi Gerganov | e6f291d158 | server : fix context shift (#5195) | 1 year ago
Wu Jian Ping | c82d18e863 | server : embeddings compatibility for OpenAI (#5190) | 1 year ago
Abhilash Majumder | 0f648573dd | ggml : add unified SYCL backend for Intel GPUs (#2690) | 2 years ago
Michael Klimenko | 35a2ee9143 | Remove unused data and add fixes (#5154) | 2 years ago
Maximilian Winter | ec903c0341 | server : add self-extend support (#5104) | 2 years ago
Xuan Son Nguyen | 48c857aa10 | server : refactored the task processing logic (#5065) | 2 years ago
Xuan Son Nguyen | 821f0a271e | server : defer tasks when "slot unavailable" (#5018) | 2 years ago
Georgi Gerganov | 0ea069b87b | server : fix prompt caching with system prompt (#4914) | 2 years ago
Ziad Ben Hadj-Alouane | 356327feb3 | server : fix deadlock that occurs in multi-prompt scenarios (#4905) | 2 years ago
makomk | ee8243adaa | server : fix crash with multimodal models without BOS token (#4904) | 2 years ago