Author | Commit | Message | Date
------ | ------ | ------- | ----
Daniel Bevenius | a18f481f99 | server : use common_token_to_piece instead of common_detokenize (#11740) | 11 months ago
Xuan-Son Nguyen | 0893e0114e | server : correct signal handler (#11795) | 11 months ago
Xuan-Son Nguyen | 55ac8c7791 | server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759) | 11 months ago
Georgi Gerganov | aaa5505307 | server : minor log updates (#11760) | 11 months ago
Xuan-Son Nguyen | 3962fc1a79 | server : add try..catch to places not covered by set_exception_handler (#11620) | 11 months ago
Olivier Chafik | bfcce4d693 | `tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585) | 11 months ago
Olivier Chafik | a83f528688 | `tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539) | 11 months ago
Olivier Chafik | 5783575c9d | Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) | 11 months ago
Daniel Bevenius | a2df2787b3 | server : update help metrics processing/deferred (#11512) | 11 months ago
Olivier Chafik | 8b576b6c55 | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 11 months ago
Daniel Bevenius | 4314e56c4f | server : use lambda instead of std::bind (#11507) | 11 months ago
Nigel Bosch | eb7cf15a80 | server : add /apply-template endpoint for additional use cases of Minja functionality (#11489) | 11 months ago
Daniel Bevenius | e51c47b401 | server : update auto gen files comments [no ci] (#11484) | 11 months ago
Xuan Son Nguyen | 49b0e3cec4 | server : fix cleaning up stream task (#11418) | 11 months ago
Xuan Son Nguyen | 5845661640 | server : add more clean up when cancel_tasks is called (#11340) | 11 months ago
Diego Devesa | 12c2bdf2de | server : fix draft context not being released (#11354) | 11 months ago
Jiří Podivín | 96f4053934 | Adding logprobs to /v1/completions (#11344) | 1 year ago
Olivier Chafik | 6171c9d258 | Add Jinja template support (#11016) | 1 year ago
Georgi Gerganov | 80d0d6b4b7 | common : add -hfd option for the draft model (#11318) | 1 year ago
Xuan Son Nguyen | f30f099228 | server : implement cancellable request (#11285) | 1 year ago
Georgi Gerganov | afa8a9ec9b | llama : add `llama_vocab`, functions -> methods, naming (#11110) | 1 year ago
Georgi Gerganov | e6e7c75d94 | server : fix extra BOS in infill endpoint (#11106) | 1 year ago
Georgi Gerganov | f66f582927 | llama : refactor `src/llama.cpp` (#10902) | 1 year ago
Xuan Son Nguyen | 0da5d86026 | server : allow using LoRA adapters per-request (#10994) | 1 year ago
Xuan Son Nguyen | 45095a61bf | server : clean up built-in template detection (#11026) | 1 year ago
Xuan Son Nguyen | 5896c65232 | server : add OAI compat for /v1/completions (#10974) | 1 year ago
Alexey Parfenov | 16cdce7b68 | server : fix token duplication when streaming with stop strings (#10997) | 1 year ago
Reza Kakhki | 9ba399dfa7 | server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) | 1 year ago
NeverLucky | 09fe2e7613 | server: allow filtering llama server response fields (#10940) | 1 year ago
Xuan Son Nguyen | 14b699ecde | server : fix missing model id in /model endpoint (#10957) | 1 year ago