Georgi Gerganov
|
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
|
10 months ago |
Olivier Chafik
|
669912d9a5
`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
|
10 months ago |
Xuan-Son Nguyen
|
7b69003af7
webui : add ?m=... and ?q=... params (#12148)
|
10 months ago |
Olivier Chafik
|
c7f460ab88
`server`: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless `--reasoning-format none` (#11607)
|
11 months ago |
Olivier Chafik
|
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
|
11 months ago |
Olivier Chafik
|
6171c9d258
Add Jinja template support (#11016)
|
1 year ago |
Georgi Gerganov
|
92bc493917
tests : increase timeout when sanitizers are enabled (#11300)
|
1 year ago |
Xuan Son Nguyen
|
f30f099228
server : implement cancellable request (#11285)
|
1 year ago |
Xuan Son Nguyen
|
0da5d86026
server : allow using LoRA adapters per-request (#10994)
|
1 year ago |
Xuan Son Nguyen
|
45095a61bf
server : clean up built-in template detection (#11026)
|
1 year ago |
Georgi Gerganov
|
152610eda9
server : output embeddings for all tokens when pooling = none (#10861)
|
1 year ago |
Yüg
|
a86ad841f1
server : add flag to disable the web-ui (#10762) (#10751)
|
1 year ago |
Xuan Son Nguyen
|
ce8784bdb1
server : fix format_infill (#10724)
|
1 year ago |
Xuan Son Nguyen
|
3573fa8e7b
server : (refactor) no more json in server_task input (#10691)
|
1 year ago |
Xuan Son Nguyen
|
b782e5c7d4
server : add more test cases (#10569)
|
1 year ago |
Xuan Son Nguyen
|
6c59567689
server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568)
|
1 year ago |
Xuan Son Nguyen
|
9f912511bc
common : fix duplicated file name with hf_repo and hf_file (#10550)
|
1 year ago |
Xuan Son Nguyen
|
45abe0f74e
server : replace behave with pytest (#10416)
|
1 year ago |