Radoslav Gerganov
|
68ee98ae18
server : return HTTP 400 if prompt exceeds context length (#16486)
|
3 months ago |
Daniel Bevenius
|
d0991da39d
server : add support for external server for tests (#16243)
|
3 months ago |
Xuan-Son Nguyen
|
3c3635d2f2
server : speed up tests (#15836)
|
4 months ago |
Georgi Gerganov
|
0d161f021a
server : enable /slots by default and make it secure (#15630)
|
4 months ago |
Johannes Gäßler
|
e81b8e4b7f
llama: use FA + max. GPU layers by default (#15434)
|
4 months ago |
Johannes Gäßler
|
fbef0fad7a
server: higher timeout for tests (#15621)
|
4 months ago |
teo
|
1bc664a26a
server: fix OpenAI API compatibility for usage statistics in chat streams (#15444)
|
4 months ago |
Georgi Gerganov
|
d2fcd91cf9
server : disable context shift by default (#15416)
|
5 months ago |
Olivier Chafik
|
c9bbc77931
`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
|
7 months ago |
Olivier Chafik
|
d74e94c1b3
`server`: fix format of streamed tool call deltas (diff name, fix id location) (#13800)
|
7 months ago |
Olivier Chafik
|
e121edc432
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
|
7 months ago |
Olivier Chafik
|
f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
|
7 months ago |
Xuan-Son Nguyen
|
33eff40240
server : vision support via libmtmd (#12898)
|
8 months ago |
Diego Devesa
|
1d36b3670b
llama : move end-user examples to tools directory (#13249)
|
8 months ago |