shun095
|
f432d8d83e
chat: Fix streaming parser for granite models (#15682)
|
4 months ago |
Xuan-Son Nguyen
|
4b8560ab56
chat : fix build on arm64 (#16101)
|
4 months ago |
Jesse
|
88021565f0
chat : Deepseek V3.1 reasoning and tool calling support (OpenAI Style) (#15533)
|
4 months ago |
Gabe Goodhart
|
5fac79cbc7
Thinking model disabled assistant prefill (#15404)
|
4 months ago |
ExtReMLapin
|
4fd1242bef
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (#15639)
|
4 months ago |
Piotr Wilkin (ilintar)
|
b2426e469e
chat : nemotron thinking & toolcalling support (#15676)
|
4 months ago |
Piotr Wilkin (ilintar)
|
60e5eee31f
chat : Seed OSS thinking + tool call support (#15552)
|
4 months ago |
Aldehir Rojas
|
32732f2459
model : gpt-oss add response_format support (#15494)
|
4 months ago |
Daniel Bevenius
|
657b8a77bd
chat: handle gpt-oss return/end token inconsistency (#15421)
|
5 months ago |
Xuan-Son Nguyen
|
e9288e8869
chat : clarify the meaning of reasoning_format (#15408)
|
5 months ago |
Daniel Bevenius
|
5e6229a840
common : fix double bos, use common_chat_templates for add_bos and add_eos (#15326)
|
5 months ago |
Diego Devesa
|
f75b830647
chat : include kwargs in template example (#15309)
|
5 months ago |
Aldehir Rojas
|
b204a5a234
gpt-oss: implement harmony parsing (#15181)
|
5 months ago |
Xuan-Son Nguyen
|
fba5c0d680
chat : hotfix gpt-oss jinja raising an exception (#15243)
|
5 months ago |
Xuan-Son Nguyen
|
53d0a12658
server : allow specifying reasoning_format in HTTP request (#15238)
|
5 months ago |
Sachin Desai
|
3db4da56a5
chat : support Granite model reasoning and tool call (#14864)
|
5 months ago |
Georgi Gerganov
|
fd1234cb46
llama : add gpt-oss (#15091)
|
5 months ago |
Sigbjørn Skjæret
|
f324a3b715
chat : only remove double bos/eos if added (#15086)
|
5 months ago |
Jhen-Jie Hong
|
f738989dcb
chat : fix multiple tool_calls on hermes-2-pro (#14962)
|
5 months ago |
kallewoof
|
1a67fcc306
common : avoid logging partial messages (which can contain broken UTF-8 sequences) (#14937)
|
5 months ago |
matteo
|
caf5681fcb
server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196)
|
6 months ago |
Sigbjørn Skjæret
|
e434e69183
common : suggest --jinja when autodetection fails (#14222)
|
7 months ago |
Piotr
|
3cb203c89f
llama-chat : Do not throw when tool parsing fails (#14012)
|
7 months ago |
Olivier Chafik
|
c9bbc77931
`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
|
7 months ago |
Georgi Gerganov
|
53f925074d
sync : vendor (#13901)
|
7 months ago |
Olivier Chafik
|
03f582ae8f
server: fix streaming crashes (#13786)
|
7 months ago |
Olivier Chafik
|
d74e94c1b3
`server`: fix format of streamed tool call deltas (diff name, fix id location) (#13800)
|
7 months ago |
Olivier Chafik
|
f13847cfb5
server: fix regression on streamed non-chat completion w/ stops (#13785)
|
7 months ago |
Olivier Chafik
|
e121edc432
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
|
7 months ago |
Olivier Chafik
|
f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
|
7 months ago |