Johannes Gäßler | b1f3a6e5db | llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653) | 1 month ago
Aaron Teo | 877566d512 | llama: introduce support for model-embedded sampling parameters (#17120) | 1 month ago
Sigbjørn Skjæret | 9008027aa3 | hparams : add n_embd_inp() to support extended embed (#16928) | 2 months ago
Georgi Gerganov | 16bcc1259d | kv-cache : pad the cache size to 256 for performance (#17046) | 2 months ago
Georgi Gerganov | cd5e3b5754 | server : support unified cache across slots (#16736) | 2 months ago
Adrian Lundberg | 76af40aaaa | docs: remove llama_sampler_accept reference in sampling sample usage (#16920) | 2 months ago
JJJYmmm | d261223d24 | model: add support for qwen3vl series (#16780) | 2 months ago
Gadflyii | 3df2244df4 | llama : add --no-host to disable host buffers (#16310) | 3 months ago
ddh0 | f6dcda3900 | server : context checkpointing for hybrid and recurrent models (#16382) | 3 months ago
Johannes Gäßler | e789095502 | llama: print memory breakdown on exit (#15860) | 3 months ago
Gabe Goodhart | fd621880f3 | aLoRA Support (#15327) | 4 months ago
Georgi Gerganov | e92d53b29e | sampling : optimize samplers by reusing bucket sort (#15665) | 4 months ago
Johannes Gäßler | e81b8e4b7f | llama: use FA + max. GPU layers by default (#15434) | 4 months ago
Sigbjørn Skjæret | 84ab83cc0b | model : jina-embeddings-v3 support (#13693) | 4 months ago
Georgi Gerganov | 9ebebef62f | llama : remove KV cache defragmentation logic (#15473) | 4 months ago
Georgi Gerganov | cd36b5e5c7 | llama : remove deprecated llama_kv_self API (#15472) | 4 months ago
Georgi Gerganov | 715a6db02c | kv-cache : drop the "unified" prefix (#15467) | 4 months ago
Georgi Gerganov | d32e03f449 | server : add SWA checkpoints (#15293) | 5 months ago
Jonathan Graehl | 5cdb27e091 | finetune: SGD optimizer, more CLI args (#13873) | 5 months ago
Georgi Gerganov | fd1234cb46 | llama : add gpt-oss (#15091) | 5 months ago
Diego Devesa | d6818d06a6 | llama : allow other bufts when overriding to CPU, add --no-repack option (#14990) | 5 months ago
Aman Gupta | 8a4a856277 | Add LLaDA 8b Diffusion model (#14771) | 5 months ago
Georgi Gerganov | e4868d16d2 | context : perform output reorder lazily upon access after sync (#14853) | 5 months ago
Georgi Gerganov | 01612b7409 | llama : reuse compute graphs (#14482) | 6 months ago
Georgi Gerganov | 225e7a1438 | llama : add high-throughput mode (#14363) | 6 months ago
Aman Gupta | ab14019821 | Support diffusion models: Add Dream 7B (#14644) | 6 months ago
Min-Hua | 79e0b68c17 | llama: add LLAMA_API to deprecated llama_kv_self_seq_div (#14708) | 6 months ago
Shunta Saito | 68e37a61a7 | model : add PLaMo-2 support (#14560) | 6 months ago
Georgi Gerganov | 0d5375d54b | llama : move enum llama_vocab_pre_type to implementation (#14631) | 6 months ago
Xuan-Son Nguyen | 8f22dc0a53 | model : add hunyuan moe (#14425) | 6 months ago