Georgi Gerganov | b52edd2558 | server : remove n_past (#16818) | 2 months ago
Pascal | 12bbc3fa50 | refactor: centralize CoT parsing in backend for streaming mode (#16394) | 3 months ago
Georgi Gerganov | df1b612e29 | server : add `/v1/health` endpoint (#16461) | 3 months ago
Oleksandr Kuvshynov | c5fef0fcea | server: update readme to mention n_past_max metric (#16436) | 3 months ago
Imad Saddik | 2811c65286 | Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] (#16297) | 3 months ago
Adrien Gallouët | 234e2ff8ed | server : remove old LLAMA_SERVER_SSL (#16290) | 3 months ago
Xuan-Son Nguyen | 61bdfd5298 | server : implement prompt processing progress report in stream mode (#15827) | 4 months ago
Georgi Gerganov | 0d161f021a | server : enable /slots by default and make it secure (#15630) | 4 months ago
Sergey Alirzaev | d82f6aa34a | server : removed obsolete doc (#15670) | 4 months ago
ExtReMLapin | 792b44f2ed | server : add documentation for `parallel_tool_calls` param (#15647) | 4 months ago
Georgi Gerganov | 9ebebef62f | llama : remove KV cache defragmentation logic (#15473) | 4 months ago
65a | 4afb0a746f | server : Support multimodal completion and embeddings prompts in JSON format (#15108) | 4 months ago
Xuan-Son Nguyen | 53d0a12658 | server : allow specifying reasoning_format in HTTP request (#15238) | 5 months ago
Lukas Straub | a9f77a8be3 | server : add openai-style logit_bias support (#14946) | 5 months ago
Daniel Bevenius | 41e78c567e | server : add support for `embd_normalize` parameter (#14964) | 5 months ago
IsaacDynamo | b4efd77f8a | server : add parse_special option to /tokenize endpoint (#14783) | 6 months ago
Johannes Gäßler | 5cae766541 | scripts: synthetic prompt mode for server-bench.py (#14695) | 6 months ago
matteo | caf5681fcb | server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196) | 6 months ago
Nigel Bosch | 1b809cee22 | server : move no API key doc to /health (#14352) | 6 months ago
aa956 | d67341dc18 | server : add server parameters for draft model cache type (#13782) | 7 months ago
Olivier Chafik | e121edc432 | `server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771) | 7 months ago
Isaac McFadyen | 6a2bc8bfb7 | server : added --no-prefill-assistant flag (#13608) | 8 months ago
Georgi Gerganov | 053174436f | server : passthrough the /models endpoint during loading (#13535) | 8 months ago
Xuan-Son Nguyen | 3b24d26c22 | server : update docs (#13432) | 8 months ago
Xuan-Son Nguyen | 33eff40240 | server : vision support via libmtmd (#12898) | 8 months ago
Diego Devesa | 1d36b3670b | llama : move end-user examples to tools directory (#13249) | 8 months ago