Daniel Bevenius
|
5e6229a840
common : fix double bos, use common_chat_templates for add_bos and add_eos (#15326)
|
5 months ago |
Diego Devesa
|
f75b830647
chat : include kwargs in template example (#15309)
|
5 months ago |
Aldehir Rojas
|
b204a5a234
gpt-oss: implement harmony parsing (#15181)
|
5 months ago |
Xuan-Son Nguyen
|
fba5c0d680
chat : hotfix gpt-oss jinja raising an exception (#15243)
|
5 months ago |
Xuan-Son Nguyen
|
53d0a12658
server : allow specifying reasoning_format in HTTP request (#15238)
|
5 months ago |
Sachin Desai
|
3db4da56a5
chat : support Granite model reasoning and tool call (#14864)
|
5 months ago |
Georgi Gerganov
|
fd1234cb46
llama : add gpt-oss (#15091)
|
5 months ago |
Sigbjørn Skjæret
|
f324a3b715
chat : only remove double bos/eos if added (#15086)
|
5 months ago |
Jhen-Jie Hong
|
f738989dcb
chat : fix multiple tool_calls on hermes-2-pro (#14962)
|
5 months ago |
kallewoof
|
1a67fcc306
common : avoid logging partial messages (which can contain broken UTF-8 sequences) (#14937)
|
5 months ago |
matteo
|
caf5681fcb
server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196)
|
6 months ago |
Sigbjørn Skjæret
|
e434e69183
common : suggest --jinja when autodetection fails (#14222)
|
7 months ago |
Piotr
|
3cb203c89f
llama-chat : Do not throw when tool parsing fails (#14012)
|
7 months ago |
Olivier Chafik
|
c9bbc77931
`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
|
7 months ago |
Georgi Gerganov
|
53f925074d
sync : vendor (#13901)
|
7 months ago |
Olivier Chafik
|
03f582ae8f
server: fix streaming crashes (#13786)
|
7 months ago |
Olivier Chafik
|
d74e94c1b3
`server`: fix format of streamed tool call deltas (diff name, fix id location) (#13800)
|
7 months ago |
Olivier Chafik
|
f13847cfb5
server: fix regression on streamed non-chat completion w/ stops (#13785)
|
7 months ago |
Olivier Chafik
|
e121edc432
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
|
7 months ago |
Olivier Chafik
|
f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
|
7 months ago |
Olivier Chafik
|
aa48e373f2
`server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802)
|
8 months ago |
Xuan-Son Nguyen
|
8c83449cb7
server : (webui) revamp the input area, plus many small UI improvements (#13365)
|
8 months ago |
Olivier Chafik
|
b6930ebc42
`tool-call`: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900)
|
9 months ago |
Olivier Chafik
|
4e39a3c332
`server`: extract <think> tags from qwq outputs (#12297)
|
10 months ago |
Olivier Chafik
|
87c2630546
allow missing content in message if tool_calls provided (#12293)
|
10 months ago |
Olivier Chafik
|
669912d9a5
`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
|
10 months ago |
Olivier Chafik
|
63e489c025
tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)
|
11 months ago |
Olivier Chafik
|
f355229692
server: fix type promotion typo causing crashes w/ --jinja w/o tools (#11880)
|
11 months ago |
Olivier Chafik
|
c7f460ab88
`server`: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless `--reasoning-format none` (#11607)
|
11 months ago |
Olivier Chafik
|
9f4cc8f8d3
`sync`: minja (#11641)
|
11 months ago |