Benjamin Findley
|
e586ee4259
change default temperature of OAI compat API from 0 to 1 (#7226)
|
1 year ago |
Johannes Gäßler
|
c12452c7ae
JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143)
|
1 year ago |
Xuan Son Nguyen
|
1fd9c1741d
clean up json_value & server_log (#7142)
|
1 year ago |
Pedro Cuenca
|
b97bc3966e
llama : support Llama 3 HF conversion (#6745)
|
1 year ago |
Pierrick Hymbert
|
75cd4c7729
ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)
|
1 year ago |
JH23X
|
60cdf40cc3
server : handle exception on wrong type in request (#6452)
|
1 year ago |
Xuan Son Nguyen
|
ad3a0505e3
Server: clean up OAI params parsing function (#6284)
|
1 year ago |
Pierrick Hymbert
|
1b26aebe4d
server: flush stdout after logging in both text and json layout (#6253)
|
1 year ago |
Olivier Chafik
|
72114edf06
json-schema-to-grammar : fix order of props + non-str const/enum (#6232)
|
1 year ago |
Olivier Chafik
|
5b7b0ac8df
json-schema-to-grammar improvements (+ added to server) (#5978)
|
1 year ago |
Karthick
|
47cc7a7bf9
Server: Handle n_keep parameter in the request (#6174)
|
1 year ago |
Xuan Son Nguyen
|
99b71c068f
Server: Use multi-task for embeddings endpoint (#6001)
|
1 year ago |
Xuan Son Nguyen
|
caa106d4e0
Server: format error to json (#5961)
|
1 year ago |
Minsoo Cheong
|
332bdfd798
server : maintain chat completion id for streaming responses (#5988)
|
1 year ago |
Georgi Gerganov
|
2002bc96bf
server : refactor (#5882)
|
1 year ago |
Pierrick Hymbert
|
9731134296
server: tests: passkey challenge / self-extend with context shift demo (#5832)
|
1 year ago |
Xuan Son Nguyen
|
052051d8ae
Server: normalize naming (#5779)
|
1 year ago |
Pierrick Hymbert
|
930b178026
server: logs - unified format and --log-format option (#5700)
|
1 year ago |
Pierrick Hymbert
|
d52d7819b8
server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708)
|
1 year ago |
Pierrick Hymbert
|
1ecea255eb
server: health: fix race condition on slots data using tasks queue (#5634)
|
1 year ago |
Xuan Son Nguyen
|
9c405c9f9a
Server: use llama_chat_apply_template (#5593)
|
1 year ago |
Daniel Hiltgen
|
66c1968f7a
server : graceful server shutdown (#5244)
|
1 year ago |
Xuan Son Nguyen
|
907e08c110
server : add llama2 chat template (#5425)
|
1 year ago |
Georgi Gerganov
|
753eafed0e
sync : ggml
|
2 years ago |
Xuan Son Nguyen
|
48c857aa10
server : refactored the task processing logic (#5065)
|
2 years ago |