Georgi Gerganov
|
7b50d589a8
kv-cells : fix tracking of seq_pos (#14339)
|
6 months ago |
Georgi Gerganov
|
4c9fdfbe15
ubatch : new splitting logic (#14217)
|
7 months ago |
aa956
|
d67341dc18
server : add server parameters for draft model cache type (#13782)
|
7 months ago |
Georgi Gerganov
|
89fea80d29
server : fix incorrect usage of llama_get_embeddings() (#14225)
|
7 months ago |
Georgi Gerganov
|
d3e64b9f49
llama : rework embeddings logic (#14208)
|
7 months ago |
Eric Curtin
|
cd355eda7d
server : When listening on a unix domain socket don't print http:// and port (#14180)
|
7 months ago |
Georgi Gerganov
|
ffad043973
server : fix SWA condition for full context reprocess (#14163)
|
7 months ago |
Georgi Gerganov
|
7d516443dd
server : re-enable SWA speculative decoding (#14131)
|
7 months ago |
Taylor
|
2baf07727f
server : pass default --keep argument (#14120)
|
7 months ago |
Juk Armstrong
|
3a12db23b6
Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104)
|
7 months ago |
Georgi Gerganov
|
87d34b381d
server : fix LRU check (#14079)
|
7 months ago |
Georgi Gerganov
|
745aa5319b
llama : deprecate llama_kv_self_ API (#14030)
|
7 months ago |
Georgi Gerganov
|
3637576288
server : disable speculative decoding for SWA models (#13970)
|
7 months ago |
Olivier Chafik
|
c9bbc77931
`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
|
7 months ago |
Georgi Gerganov
|
3600cc2886
llama : use n_swa + n_ubatch cells for SWA cache (#13833)
|
7 months ago |
Georgi Gerganov
|
3f55f781f1
llama : auto-batch preparation (#13845)
|
7 months ago |
Georgi Gerganov
|
12d0188c0d
kv-cache : refactor + add llama_memory_state_i (#13746)
|
7 months ago |
Georgi Gerganov
|
53f925074d
sync : vendor (#13901)
|
7 months ago |
Xuan-Son Nguyen
|
10961339b2
mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866)
|
7 months ago |
Olivier Chafik
|
03f582ae8f
server: fix streaming crashes (#13786)
|
7 months ago |
Georgi Gerganov
|
79c137f776
examples : allow extracting embeddings from decoder contexts (#13797)
|
7 months ago |
Olivier Chafik
|
e121edc432
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
|
7 months ago |
Olivier Chafik
|
f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
|
7 months ago |
Xuan-Son Nguyen
|
9ecf3e66a3
server : support audio input (#13714)
|
7 months ago |
Georgi Gerganov
|
cc74d5be99
server : pad small embedding batches (#13692)
|
8 months ago |
Georgi Gerganov
|
5fbfe384d4
server : improve error reporting (#13680)
|
8 months ago |
Robin Davidsson
|
0d5c742161
server : Add the endpoints /api/tags and /api/chat (#13659)
|
8 months ago |
Dorin-Andrei Geman
|
42158ae2e8
server : fix first message identification (#13634)
|
8 months ago |
Georgi Gerganov
|
797f2ac062
kv-cache : simplify the interface (#13660)
|
8 months ago |
Georgi Gerganov
|
e298d2fbd0
kv-cache : add SWA support (#13194)
|
8 months ago |