Sigbjørn Skjæret
|
6c019cb04e
server : only attempt to enable thinking if using jinja (#15967)
|
4 months ago |
Georgi Gerganov
|
f088b6a84f
server : adjust prompt similarity thold + add logs (#15913)
|
4 months ago |
Xuan-Son Nguyen
|
56920f5665
server : bring back timings_per_token (#15879)
|
4 months ago |
Xuan-Son Nguyen
|
61bdfd5298
server : implement prompt processing progress report in stream mode (#15827)
|
4 months ago |
Gabe Goodhart
|
fd621880f3
aLoRA Support (#15327)
|
4 months ago |
Gabe Goodhart
|
5fac79cbc7
Thinking model disabled assistant prefill (#15404)
|
4 months ago |
Xuan-Son Nguyen
|
a68d914426
server: add exceed_context_size_error type (#15780)
|
4 months ago |
Georgi Gerganov
|
e92d53b29e
sampling : optimize samplers by reusing bucket sort (#15665)
|
4 months ago |
Georgi Gerganov
|
0d161f021a
server : enable /slots by default and make it secure (#15630)
|
4 months ago |
Sigbjørn Skjæret
|
84ab83cc0b
model : jina-embeddings-v3 support (#13693)
|
4 months ago |
65a
|
4afb0a746f
server : Support multimodal completion and embeddings prompts in JSON format (#15108)
|
4 months ago |
teo
|
1bc664a26a
server: fix OpenAI API compatibility for usage statistics in chat streams (#15444)
|
4 months ago |
davidef
|
d1d8241600
server : fix incoming tasks not process in order (#15395)
|
5 months ago |
Oleksandr Kuvshynov
|
e5155e6986
server : export max observed n_past value (#15361)
|
5 months ago |
Diego Devesa
|
f75b830647
chat : include kwargs in template example (#15309)
|
5 months ago |
Georgi Gerganov
|
d32e03f449
server : add SWA checkpoints (#15293)
|
5 months ago |
Sigbjørn Skjæret
|
b3e16665e1
server : enable -td and -tbd parameters (#15172)
|
5 months ago |
Copilot
|
d8914fc47e
common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191)
|
5 months ago |
Xuan-Son Nguyen
|
53d0a12658
server : allow specifying reasoning_format in HTTP request (#15238)
|
5 months ago |
Johannes Gäßler
|
f906275537
server: enable token array inputs for OAI API (#15001)
|
5 months ago |
g2mt
|
94933c8c2e
server : implement universal assisted decoding (#12635)
|
5 months ago |
Lukas Straub
|
a9f77a8be3
server : add openai-style logit_bias support (#14946)
|
5 months ago |
Daniel Bevenius
|
41e78c567e
server : add support for `embd_normalize` parameter (#14964)
|
5 months ago |
Molly Sophia
|
adef81781a
server : allow setting `--reverse-prompt` arg (#14799)
|
5 months ago |
IsaacDynamo
|
b4efd77f8a
server : add parse_special option to /tokenize endpoint (#14783)
|
6 months ago |
Georgi Gerganov
|
6ffd4e9c44
server : pre-calculate EOG logit biases (#14721)
|
6 months ago |
Georgi Gerganov
|
538cc77f7f
server : fix handling of the ignore_eos flag (#14710)
|
6 months ago |
Douglas Hanley
|
0c1df14b5f
server : fix pooled embedding output (#14645)
|
6 months ago |
Alawode Oluwandabira
|
17a1f0d2d4
server: Add ability to mount server at prefix (#14544)
|
6 months ago |
matteo
|
caf5681fcb
server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196)
|
6 months ago |