Xuan-Son Nguyen
|
9e39a1e6a9
server: support load model on startup, support preset-only options (#18206)
|
4 weeks ago |
Pascal
|
14931a826e
arg: fix order to use short form before long form (#18196)
|
4 weeks ago |
Xuan-Son Nguyen
|
98c1c7a7bf
presets: refactor, allow cascade presets from different sources, add global section (#18169)
|
4 weeks ago |
Pascal
|
6ce3d85796
server: (webui) add --webui-config (#18028)
|
1 month ago |
Xuan-Son Nguyen
|
7b1db3d3b7
arg: clarify auto kvu/np being set on server (#17997)
|
1 month ago |
2114L3
|
5f5f9b4637
server: Update README.md incorrect argument (#18073)
|
1 month ago |
Xuan-Son Nguyen
|
380b4c984e
common: support negated args (#17919)
|
1 month ago |
Xuan-Son Nguyen
|
54a0fee4b7
arg: add -mm and -mmu as short form of --mmproj and --mmproj-url (#17958)
|
1 month ago |
Pascal
|
f32ca51bfe
server: add presets (config) when using multiple models (#17859)
|
1 month ago |
Xuan-Son Nguyen
|
37a4f63244
server : add development documentation (#17760)
|
1 month ago |
Georgi Gerganov
|
2bc96931d2
server : make cache_reuse configurable per request (#17858)
|
1 month ago |
Xuan-Son Nguyen
|
c42712b056
server: support multiple generations from one prompt (OAI "n" option) (#17775)
|
1 month ago |
Xuan-Son Nguyen
|
ec18edfcba
server: introduce API for serving / loading / unloading multiple models (#17470)
|
1 month ago |
Xuan-Son Nguyen
|
7733409734
common: improve verbosity level definitions (#17630)
|
1 month ago |
Fredrik Hultin
|
ddf9f94389
server : add Anthropic Messages API support (#17570)
|
1 month ago |
Xuan-Son Nguyen
|
e509411cf1
server: enable jinja by default, update docs (#17524)
|
1 month ago |
Aidan
|
eeee367de5
server: fix correct time_ms calculation in prompt_progress (#17093)
|
2 months ago |
손희준
|
fd2f84f468
docs: Clarify the endpoint that webui uses (#17001)
|
2 months ago |
Georgi Gerganov
|
b52edd2558
server : remove n_past (#16818)
|
2 months ago |
Pascal
|
12bbc3fa50
refactor: centralize CoT parsing in backend for streaming mode (#16394)
|
3 months ago |
Georgi Gerganov
|
df1b612e29
server : add `/v1/health` endpoint (#16461)
|
3 months ago |
Oleksandr Kuvshynov
|
c5fef0fcea
server: update readme to mention n_past_max metric (#16436)
|
3 months ago |
Imad Saddik
|
2811c65286
Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] (#16297)
|
3 months ago |
Adrien Gallouët
|
234e2ff8ed
server : remove old LLAMA_SERVER_SSL (#16290)
|
3 months ago |
Xuan-Son Nguyen
|
61bdfd5298
server : implement prompt processing progress report in stream mode (#15827)
|
4 months ago |
Georgi Gerganov
|
0d161f021a
server : enable /slots by default and make it secure (#15630)
|
4 months ago |
Sergey Alirzaev
|
d82f6aa34a
server : removed obsolete doc (#15670)
|
4 months ago |
ExtReMLapin
|
792b44f2ed
server : add documentation for `parallel_tool_calls` param (#15647)
|
4 months ago |
Georgi Gerganov
|
9ebebef62f
llama : remove KV cache defragmentation logic (#15473)
|
4 months ago |
65a
|
4afb0a746f
server : Support multimodal completion and embeddings prompts in JSON format (#15108)
|
4 months ago |