Georgi Gerganov
|
85a7d8677b
memory : remove KV cache size padding (#16812)
|
2 months ago |
Johannes Gäßler
|
0bf47a1dbb
server: add memory breakdown print (#16740)
|
2 months ago |
matteo
|
8cf6b42d46
server : send partial stop string when <EOG> is reached (#15007)
|
2 months ago |
Georgi Gerganov
|
17304cbcc1
server : fix img token logs (#16595)
|
3 months ago |
Georgi Gerganov
|
554fd578a5
server : fix mtmd checkpoints (#16591)
|
3 months ago |
Georgi Gerganov
|
bc07349a7f
server : dynamic token limit for prompt cache (#16560)
|
3 months ago |
Yann Follet
|
31d0ff1869
server / ranking : add sorting and management of top_n (#16403)
|
3 months ago |
Georgi Gerganov
|
e60f01d941
server : fix division by zero when reporting stats (#16501)
|
3 months ago |
Radoslav Gerganov
|
68ee98ae18
server : return HTTP 400 if prompt exceeds context length (#16486)
|
3 months ago |
Radoslav Gerganov
|
cdb6da468c
server : log requests to /v1/completions (#16495)
|
3 months ago |
Georgi Gerganov
|
d00cbea63c
server : host-memory prompt caching (#16391)
|
3 months ago |
issixx
|
d2ee056e1d
server : fix cancel pending task (#16467)
|
3 months ago |
Georgi Gerganov
|
7fdd16b432
server : improve context checkpoint logic (#16440)
|
3 months ago |
Georgi Gerganov
|
df1b612e29
server : add `/v1/health` endpoint (#16461)
|
3 months ago |
ddh0
|
f6dcda3900
server : context checkpointing for hybrid and recurrent models (#16382)
|
3 months ago |
Isaac McFadyen
|
e0539eb6ae
webui: switch to hash-based routing (alternative of #16079) (#16157)
|
3 months ago |
Douglas Hanley
|
b5bd037832
llama : add support for qwen3 reranker (#15824)
|
3 months ago |
Benni
|
459c0c2c1a
server: fix SSE and OpenAI compatibility for error messages when streaming (#16109)
|
3 months ago |
Radoslav Gerganov
|
2b6b55a59f
server : include usage statistics only when user request them (#16052)
|
4 months ago |
Aleksander Grygier
|
a7a98e0fff
SvelteKit-based WebUI (#14839)
|
4 months ago |
Sigbjørn Skjæret
|
6c019cb04e
server : only attempt to enable thinking if using jinja (#15967)
|
4 months ago |
Georgi Gerganov
|
f088b6a84f
server : adjust prompt similarity thold + add logs (#15913)
|
4 months ago |
Xuan-Son Nguyen
|
56920f5665
server : bring back timings_per_token (#15879)
|
4 months ago |
Xuan-Son Nguyen
|
61bdfd5298
server : implement prompt processing progress report in stream mode (#15827)
|
4 months ago |
Gabe Goodhart
|
fd621880f3
aLoRA Support (#15327)
|
4 months ago |
Gabe Goodhart
|
5fac79cbc7
Thinking model disabled assistant prefill (#15404)
|
4 months ago |
Xuan-Son Nguyen
|
a68d914426
server: add exceed_context_size_error type (#15780)
|
4 months ago |
Georgi Gerganov
|
e92d53b29e
sampling : optimize samplers by reusing bucket sort (#15665)
|
4 months ago |
Georgi Gerganov
|
0d161f021a
server : enable /slots by default and make it secure (#15630)
|
4 months ago |
Sigbjørn Skjæret
|
84ab83cc0b
model : jina-embeddings-v3 support (#13693)
|
4 months ago |