Johannes Gäßler
|
b1f3a6e5db
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)
|
1 month ago |
Georgi Gerganov
|
609a2d0268
models : fix YaRN regression + consolidate logic (#18006)
|
1 month ago |
Georgi Gerganov
|
7bed317f53
models : fix the attn_factor for mistral3 graphs + improve consistency (#17945)
|
1 month ago |
Georgi Gerganov
|
4dff236a52
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
|
1 month ago |
JJJYmmm
|
d261223d24
model: add support for qwen3vl series (#16780)
|
2 months ago |
Xuan-Son Nguyen
|
e3af5563bd
llama: store mrope data in KV cell (#16825)
|
2 months ago |
Georgi Gerganov
|
85a7d8677b
memory : remove KV cache size padding (#16812)
|
2 months ago |
Johannes Gäßler
|
7a0e900e36
llama: consistent ctx <-> buf order for KV cache (#16746)
|
2 months ago |
Georgi Gerganov
|
d00cbea63c
server : host-memory prompt caching (#16391)
|
3 months ago |
Johannes Gäßler
|
e789095502
llama: print memory breakdown on exit (#15860)
|
3 months ago |
Georgi Gerganov
|
cf0e3ba150
model : avoid ggml_cont_3d for fused QKV weights (#15662)
|
4 months ago |
Georgi Gerganov
|
c610b6c11b
kv-cache : fix SWA checks + disable cacheless iSWA (#15811)
|
4 months ago |
Daniel Bevenius
|
fb15d649ed
llama : add support for EmbeddingGemma 300m (#15798)
|
4 months ago |
Georgi Gerganov
|
c8d0d14e77
kv-cache : fix find_slot to not search for continuous slot (#15638)
|
4 months ago |
Georgi Gerganov
|
8a4280ce43
kv-cache : remove LLAMA_SET_ROWS checks (#15505)
|
4 months ago |
Georgi Gerganov
|
1bded5a3b3
kv-cache : better estimate of n_kv for multi-sequence batches (#15610)
|
4 months ago |
Georgi Gerganov
|
b730706a49
kv-cache : support layer reuse (#15504)
|
5 months ago |
Georgi Gerganov
|
9ebebef62f
llama : remove KV cache defragmentation logic (#15473)
|
5 months ago |
Georgi Gerganov
|
715a6db02c
kv-cache : drop the "unified" prefix (#15467)
|
5 months ago |
Georgi Gerganov
|
7f37b6cf1e
memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
|
7 months ago |
Georgi Gerganov
|
0fc16b42e8
kv-cache : split implementation in separate sources (#13920)
|
7 months ago |
Georgi Gerganov
|
3600cc2886
llama : use n_swa + n_ubatch cells for SWA cache (#13833)
|
7 months ago |
Georgi Gerganov
|
3f55f781f1
llama : auto-batch preparation (#13845)
|
7 months ago |
Georgi Gerganov
|
12d0188c0d
kv-cache : refactor + add llama_memory_state_i (#13746)
|
7 months ago |
Xuan-Son Nguyen
|
763d06edb7
llama : fix KV shift for qwen2vl (#13870)
|
7 months ago |
Georgi Gerganov
|
81713121ee
kv-cells : track min/max used cells and per-sequence positions (#13808)
|
7 months ago |
Georgi Gerganov
|
de2ef53a4b
kv-cache : rework kv_cell (#13706)
|
8 months ago |
Georgi Gerganov
|
797f2ac062
kv-cache : simplify the interface (#13660)
|
8 months ago |
Georgi Gerganov
|
a4090d1174
llama : remove llama_kv_cache_view API + remove deprecated (#13653)
|
8 months ago |
Georgi Gerganov
|
e298d2fbd0
kv-cache : add SWA support (#13194)
|
8 months ago |