| Author | Commit | Message | Date |
|---|---|---|---|
| Xuan-Son Nguyen | 380b4c984e | common: support negated args (#17919) | 1 month ago |
| Xuan-Son Nguyen | 54a0fee4b7 | arg: add -mm and -mmu as short form of --mmproj and --mmproj-url (#17958) | 1 month ago |
| Pascal | f32ca51bfe | server: add presets (config) when using multiple models (#17859) | 1 month ago |
| Xuan-Son Nguyen | 37a4f63244 | server : add development documentation (#17760) | 1 month ago |
| Georgi Gerganov | 2bc96931d2 | server : make cache_reuse configurable per request (#17858) | 1 month ago |
| Xuan-Son Nguyen | c42712b056 | server: support multiple generations from one prompt (OAI "n" option) (#17775) | 1 month ago |
| Xuan-Son Nguyen | ec18edfcba | server: introduce API for serving / loading / unloading multiple models (#17470) | 1 month ago |
| Xuan-Son Nguyen | 7733409734 | common: improve verbosity level definitions (#17630) | 1 month ago |
| Fredrik Hultin | ddf9f94389 | server : add Anthropic Messages API support (#17570) | 1 month ago |
| Xuan-Son Nguyen | e509411cf1 | server: enable jinja by default, update docs (#17524) | 1 month ago |
| Aidan | eeee367de5 | server: fix correct time_ms calculation in prompt_progress (#17093) | 2 months ago |
| 손희준 | fd2f84f468 | docs: Clarify the endpoint that webui uses (#17001) | 2 months ago |
| Georgi Gerganov | b52edd2558 | server : remove n_past (#16818) | 2 months ago |
| Pascal | 12bbc3fa50 | refactor: centralize CoT parsing in backend for streaming mode (#16394) | 3 months ago |
| Georgi Gerganov | df1b612e29 | server : add `/v1/health` endpoint (#16461) | 3 months ago |
| Oleksandr Kuvshynov | c5fef0fcea | server: update readme to mention n_past_max metric (#16436) | 3 months ago |
| Imad Saddik | 2811c65286 | Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] (#16297) | 3 months ago |
| Adrien Gallouët | 234e2ff8ed | server : remove old LLAMA_SERVER_SSL (#16290) | 3 months ago |
| Xuan-Son Nguyen | 61bdfd5298 | server : implement prompt processing progress report in stream mode (#15827) | 4 months ago |
| Georgi Gerganov | 0d161f021a | server : enable /slots by default and make it secure (#15630) | 4 months ago |
| Sergey Alirzaev | d82f6aa34a | server : removed obsolete doc (#15670) | 4 months ago |
| ExtReMLapin | 792b44f2ed | server : add documentation for `parallel_tool_calls` param (#15647) | 4 months ago |
| Georgi Gerganov | 9ebebef62f | llama : remove KV cache defragmentation logic (#15473) | 4 months ago |
| 65a | 4afb0a746f | server : Support multimodal completion and embeddings prompts in JSON format (#15108) | 4 months ago |
| Xuan-Son Nguyen | 53d0a12658 | server : allow specifying reasoning_format in HTTP request (#15238) | 5 months ago |
| Lukas Straub | a9f77a8be3 | server : add openai-style logit_bias support (#14946) | 5 months ago |
| Daniel Bevenius | 41e78c567e | server : add support for `embd_normalize` parameter (#14964) | 5 months ago |
| IsaacDynamo | b4efd77f8a | server : add parse_special option to /tokenize endpoint (#14783) | 6 months ago |
| Johannes Gäßler | 5cae766541 | scripts: synthetic prompt mode for server-bench.py (#14695) | 6 months ago |
| matteo | caf5681fcb | server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196) | 6 months ago |