Georgi Gerganov
|
17304cbcc1
server : fix img token logs (#16595)
|
3 months ago |
Georgi Gerganov
|
554fd578a5
server : fix mtmd checkpoints (#16591)
|
3 months ago |
Georgi Gerganov
|
bc07349a7f
server : dynamic token limit for prompt cache (#16560)
|
3 months ago |
Yann Follet
|
31d0ff1869
server / ranking : add sorting and management of top_n (#16403)
|
3 months ago |
Georgi Gerganov
|
e60f01d941
server : fix division by zero when reporting stats (#16501)
|
3 months ago |
Radoslav Gerganov
|
68ee98ae18
server : return HTTP 400 if prompt exceeds context length (#16486)
|
3 months ago |
Radoslav Gerganov
|
cdb6da468c
server : log requests to /v1/completions (#16495)
|
3 months ago |
Georgi Gerganov
|
d00cbea63c
server : host-memory prompt caching (#16391)
|
3 months ago |
issixx
|
d2ee056e1d
server : fix cancel pending task (#16467)
|
3 months ago |
Georgi Gerganov
|
7fdd16b432
server : improve context checkpoint logic (#16440)
|
3 months ago |
Georgi Gerganov
|
df1b612e29
server : add `/v1/health` endpoint (#16461)
|
3 months ago |
ddh0
|
f6dcda3900
server : context checkpointing for hybrid and recurrent models (#16382)
|
3 months ago |
Isaac McFadyen
|
e0539eb6ae
webui: switch to hash-based routing (alternative of #16079) (#16157)
|
3 months ago |
Douglas Hanley
|
b5bd037832
llama : add support for qwen3 reranker (#15824)
|
3 months ago |
Benni
|
459c0c2c1a
server: fix SSE and OpenAI compatibility for error messages when streaming (#16109)
|
3 months ago |
Radoslav Gerganov
|
2b6b55a59f
server : include usage statistics only when user request them (#16052)
|
4 months ago |
Aleksander Grygier
|
a7a98e0fff
SvelteKit-based WebUI (#14839)
|
4 months ago |
Sigbjørn Skjæret
|
6c019cb04e
server : only attempt to enable thinking if using jinja (#15967)
|
4 months ago |
Georgi Gerganov
|
f088b6a84f
server : adjust prompt similarity thold + add logs (#15913)
|
4 months ago |
Xuan-Son Nguyen
|
56920f5665
server : bring back timings_per_token (#15879)
|
4 months ago |
Xuan-Son Nguyen
|
61bdfd5298
server : implement prompt processing progress report in stream mode (#15827)
|
4 months ago |
Gabe Goodhart
|
fd621880f3
aLoRA Support (#15327)
|
4 months ago |
Gabe Goodhart
|
5fac79cbc7
Thinking model disabled assistant prefill (#15404)
|
4 months ago |
Xuan-Son Nguyen
|
a68d914426
server: add exceed_context_size_error type (#15780)
|
4 months ago |
Georgi Gerganov
|
e92d53b29e
sampling : optimize samplers by reusing bucket sort (#15665)
|
4 months ago |
Georgi Gerganov
|
0d161f021a
server : enable /slots by default and make it secure (#15630)
|
4 months ago |
Sigbjørn Skjæret
|
84ab83cc0b
model : jina-embeddings-v3 support (#13693)
|
4 months ago |
65a
|
4afb0a746f
server : Support multimodal completion and embeddings prompts in JSON format (#15108)
|
4 months ago |
teo
|
1bc664a26a
server: fix OpenAI API compatibility for usage statistics in chat streams (#15444)
|
4 months ago |
davidef
|
d1d8241600
server : fix incoming tasks not process in order (#15395)
|
5 months ago |