matteo
|
caf5681fcb
server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196)
|
6 months ago |
Sigbjørn Skjæret
|
88fc854b4b
llama : improve sep token handling (#14272)
|
7 months ago |
aa956
|
d67341dc18
server : add server parameters for draft model cache type (#13782)
|
7 months ago |
Georgi Gerganov
|
d3e64b9f49
llama : rework embeddings logic (#14208)
|
7 months ago |
Olivier Chafik
|
c9bbc77931
`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
|
7 months ago |
Olivier Chafik
|
cdf94a1802
server: --offline mode (#13804)
|
7 months ago |
Olivier Chafik
|
e121edc432
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
|
7 months ago |
Olivier Chafik
|
f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
|
7 months ago |
Xuan-Son Nguyen
|
797990c4bc
mtmd : add ultravox audio input (#13623)
|
7 months ago |
Georgi Gerganov
|
a4090d1174
llama : remove llama_kv_cache_view API + remove deprecated (#13653)
|
8 months ago |
Georgi Gerganov
|
e298d2fbd0
kv-cache : add SWA support (#13194)
|
8 months ago |
psocolovsky
|
1dfbf2cf3a
common : add load_progress_callback (#13617)
|
8 months ago |
Isaac McFadyen
|
6a2bc8bfb7
server : added --no-prefill-assistant flag (#13608)
|
8 months ago |
Olivier Chafik
|
3198405e98
`common`: add partial regex support (#12808)
|
8 months ago |
Johannes Gäßler
|
10d2af0eaa
llama/ggml: add LLM training support (#10544)
|
8 months ago |
David Huang
|
7f323a589f
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)
|
8 months ago |
Bartowski
|
efb8b47eda
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)
|
8 months ago |
Georgi Gerganov
|
51fb96b1ff
context : remove logits_all flag (#13284)
|
8 months ago |
Georgi Gerganov
|
4773d7a02f
examples : remove infill (#13283)
|
8 months ago |
oobabooga
|
233461f812
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264)
|
8 months ago |
Xuan-Son Nguyen
|
9b61acf060
mtmd : rename llava directory to mtmd (#13311)
|
8 months ago |
Diego Devesa
|
1d36b3670b
llama : move end-user examples to tools directory (#13249)
|
8 months ago |
Xuan-Son Nguyen
|
7c727fbe39
arg : add --no-mmproj-offload (#13093)
|
8 months ago |
Xuan-Son Nguyen
|
80982e815e
arg : clean up handling --mmproj with -hf (#13082)
|
8 months ago |
tastelikefeet
|
b2034c2b55
contrib: support modelscope community (#12664)
|
9 months ago |
Diego Devesa
|
e0e912f49b
llama : add option to override model tensor buffers (#11397)
|
9 months ago |
Xuan-Son Nguyen
|
42eb248f46
common : remove json.hpp from common.cpp (#12697)
|
9 months ago |
Xuan-Son Nguyen
|
267c1399f1
common : refactor downloading system, handle mmproj with -hf option (#12694)
|
9 months ago |
marcoStocchi
|
6ef79a67ca
common : refactor '-o' option (#12278)
|
10 months ago |
Olivier Chafik
|
669912d9a5
`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
|
10 months ago |