Author | Commit | Message | Date
Xuan-Son Nguyen | c42712b056 | server: support multiple generations from one prompt (OAI "n" option) (#17775) | 1 month ago
Chad Voegele | c4357dcc35 | Server: Change Invalid Schema from Server Error (500) to User Error (400) (#17572) | 1 month ago
Xuan-Son Nguyen | 5d6bd842ea | server: remove default "gpt-3.5-turbo" model name (#17668) | 1 month ago
Georgi Gerganov | cd5e3b5754 | server : support unified cache across slots (#16736) | 2 months ago
Radoslav Gerganov | 68ee98ae18 | server : return HTTP 400 if prompt exceeds context length (#16486) | 3 months ago
Georgi Gerganov | d00cbea63c | server : host-memory prompt caching (#16391) | 3 months ago
Radoslav Gerganov | 2b6b55a59f | server : include usage statistics only when user request them (#16052) | 4 months ago
Xuan-Son Nguyen | 61bdfd5298 | server : implement prompt processing progress report in stream mode (#15827) | 4 months ago
Xuan-Son Nguyen | a68d914426 | server: add exceed_context_size_error type (#15780) | 4 months ago
teo | 1bc664a26a | server: fix OpenAI API compatibility for usage statistics in chat streams (#15444) | 4 months ago
Lukas Straub | a9f77a8be3 | server : add openai-style logit_bias support (#14946) | 5 months ago
Sigbjørn Skjæret | ddef99522d | server : fix assistant prefilling when content is an array (#14360) | 6 months ago
Olivier Chafik | f5cd27b71d | `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) | 7 months ago
Dorin-Andrei Geman | 42158ae2e8 | server : fix first message identification (#13634) | 8 months ago
Diego Devesa | 1d36b3670b | llama : move end-user examples to tools directory (#13249) | 8 months ago