Georgi Gerganov
|
190c4838bd
chat : reserve memory in compute_diffs and improve naming (#17729)
|
1 month ago |
Aldehir Rojas
|
0a8026e768
common : introduce composable PEG parser combinators for chat parsing (#17136)
|
1 month ago |
hksdpc255
|
1920345c3b
common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932)
|
1 month ago |
Yuri Khrustalev
|
c053e18a66
chat: Add LFM2 tool handling (#16763)
|
2 months ago |
Georgi Gerganov
|
d00cbea63c
server : host-memory prompt caching (#16391)
|
3 months ago |
Pascal
|
128d522c04
chat : support Magistral thinking (#16413)
|
3 months ago |
Piotr Wilkin (ilintar)
|
34fcc5a4ac
model : Apertus model implementation (#15852)
|
3 months ago |
Jesse
|
88021565f0
chat : Deepseek V3.1 reasoning and tool calling support (OpenAI Style) (#15533)
|
4 months ago |
Gabe Goodhart
|
5fac79cbc7
Thinking model disabled assistant prefill (#15404)
|
4 months ago |
Piotr Wilkin (ilintar)
|
b2426e469e
chat : nemotron thinking & toolcalling support (#15676)
|
4 months ago |
Piotr Wilkin (ilintar)
|
60e5eee31f
chat : Seed OSS thinking + tool call support (#15552)
|
4 months ago |
Diego Devesa
|
f75b830647
chat : include kwargs in template example (#15309)
|
5 months ago |
Xuan-Son Nguyen
|
53d0a12658
server : allow specifying reasoning_format in HTTP request (#15238)
|
5 months ago |
Sachin Desai
|
3db4da56a5
chat : support Granite model reasoning and tool call (#14864)
|
5 months ago |
Georgi Gerganov
|
fd1234cb46
llama : add gpt-oss (#15091)
|
5 months ago |
Sigbjørn Skjæret
|
f324a3b715
chat : only remove double bos/eos if added (#15086)
|
5 months ago |
matteo
|
caf5681fcb
server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196)
|
6 months ago |
Olivier Chafik
|
c9bbc77931
`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
|
7 months ago |
Olivier Chafik
|
03f582ae8f
server: fix streaming crashes (#13786)
|
7 months ago |
Olivier Chafik
|
e121edc432
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
|
7 months ago |
Olivier Chafik
|
f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
|
7 months ago |
Olivier Chafik
|
aa48e373f2
`server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802)
|
8 months ago |
Olivier Chafik
|
4e39a3c332
`server`: extract <think> tags from qwq outputs (#12297)
|
10 months ago |
Olivier Chafik
|
63e489c025
tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)
|
11 months ago |