손희준 | fbbf3ad190 | server: /v1/responses (partial) (#18486) | 1 week ago
Xuan-Son Nguyen | 6df686bee6 | server : refactor oai_parser_opt, move it to server_chat_params (#18937) | 1 week ago
Lennart Austenfeld | 18361c579c | server: fix memory reservations in populate_token_probs (#18787) | 1 week ago
Xuan-Son Nguyen | c15395f73c | common : implement new jinja template engine (#18462) | 1 week ago
Xuan-Son Nguyen | a04c2b06a3 | server: improve slots scheduling for n_cmpl (#18789) | 2 weeks ago
Georgi Gerganov | 39173bcacb | context : reserve new scheduler when graph topology changes (#18547) | 2 weeks ago
Xuan-Son Nguyen | 9ac2693a30 | server: fix n_cmpl not skipping processing prompt (#18663) | 2 weeks ago
Georgi Gerganov | 53eb9435da | server : fix timing of prompt/generation (#18713) | 2 weeks ago
Georgi Gerganov | f5f8812f7c | server : use different seeds for child completions (#18700) | 2 weeks ago
Tarek Dakhran | 73d284a250 | model : add LFM2-ColBert-350M (#18607) | 3 weeks ago
Daniel Bevenius | d3dce4e0a5 | sampling : add support for backend sampling (#17004) | 3 weeks ago
Georgi Gerganov | 2a85f720b8 | server : handle closed connection for tasks (#18459) | 1 month ago
o7si | 4893cc07bb | server : fix crash when seq_rm fails for hybrid/recurrent models (#18391) | 1 month ago
Xuan-Son Nguyen | 5ee4e43f26 | server: return_progress to also report 0% processing state (#18305) | 1 month ago
Xuan-Son Nguyen | 849d021104 | server: fix crash with model not having BOS/EOS (#18321) | 1 month ago
Xuan-Son Nguyen | 6ce863c803 | server: prevent data race from HTTP threads (#18263) | 1 month ago
Xuan-Son Nguyen | ddcb75dd8a | server: add auto-sleep after N seconds of idle (#18228) | 1 month ago
Oleksandr Kuvshynov | 408616adbd | server : [easy] fix per round speculative decode logging (#18211) | 1 month ago
Aman Gupta | cc0a04343e | server: friendlier error msg when ctx < input (#18174) | 1 month ago
Pascal | 6ce3d85796 | server: (webui) add --webui-config (#18028) | 1 month ago
Georgi Gerganov | 254098a279 | common : refactor common_sampler + grammar logic changes (#17937) | 1 month ago
Xuan-Son Nguyen | 6c2131773c | cli: new CLI experience (#17824) | 1 month ago
Xuan-Son Nguyen | 951520ddb0 | server: delegate result_state creation to server_task (#17835) | 1 month ago
Xuan-Son Nguyen | f896d2c34f | server: improve speed of speculative decoding (#17808) | 1 month ago
Georgi Gerganov | 2bc96931d2 | server : make cache_reuse configurable per request (#17858) | 1 month ago
Xuan-Son Nguyen | c42712b056 | server: support multiple generations from one prompt (OAI "n" option) (#17775) | 1 month ago
Xuan-Son Nguyen | c4c10bfb86 | server: move msg diffs tracking to HTTP thread (#17740) | 1 month ago
Xuan-Son Nguyen | 13628d8bdb | server: add --media-path for local media files (#17697) | 1 month ago
Xuan-Son Nguyen | 5d6bd842ea | server: remove default "gpt-3.5-turbo" model name (#17668) | 1 month ago
Xuan-Son Nguyen | ecf74a8417 | mtmd: add mtmd_context_params::warmup option (#17652) | 1 month ago