Xuan-Son Nguyen
|
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784)
|
7 months ago |
bandoti
|
72b090da2c
docs: remove link for llama-cli function calling (#13810)
|
7 months ago |
Christian Kastner
|
7fe03e7446
ggml-cpu: x86 feature detection is specific to x86 (#13811)
|
7 months ago |
Diego Devesa
|
952f3953c1
ggml : allow CUDA graphs when using pipeline parallelism (#13814)
|
7 months ago |
Georgi Gerganov
|
81713121ee
kv-cells : track min/max used cells and per-sequence positions (#13808)
|
7 months ago |
Georgi Gerganov
|
f9cd68398b
sampling : make sure samplers return at least 1 token (#13822)
|
7 months ago |
Georgi Gerganov
|
4f81b33e32
llama : validate seq id batch input (#13809)
|
7 months ago |
Olivier Chafik
|
cdf94a1802
server: --offline mode (#13804)
|
7 months ago |
Georgi Gerganov
|
a26c4cc11e
scripts : add option to compare commits in Debug (#13806)
|
7 months ago |
Georgi Gerganov
|
4265a87b59
cuda : avoid cuGetErrorString (#13791)
|
7 months ago |
Akarshan Biswas
|
6f180b915c
SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611)
|
7 months ago |
Olivier Chafik
|
03f582ae8f
server: fix streaming crashes (#13786)
|
7 months ago |
standby24x7
|
88c125f2ac
examples/training: Fix file name in README (#13803)
|
7 months ago |
Olivier Chafik
|
d74e94c1b3
`server`: fix format of streamed tool call deltas (diff name, fix id location) (#13800)
|
7 months ago |
Olivier Chafik
|
f13847cfb5
server: fix regression on streamed non-chat completion w/ stops (#13785)
|
7 months ago |
Georgi Gerganov
|
79c137f776
examples : allow extracting embeddings from decoder contexts (#13797)
|
7 months ago |
Georgi Gerganov
|
22229314fc
llama : clarify deprecation message (#13794)
|
7 months ago |
Romain Biessy
|
9012eb9b45
sycl: Add more debug prints (#13640)
|
7 months ago |
Jeff Bolz
|
fef693dc6b
vulkan: mark IM2COL as supporting non-contig (#13783)
|
7 months ago |
Bizhao Shi
|
2d38b6e400
CANN: Add the basic supports of Flash Attention kernel (#13627)
|
7 months ago |
Olivier Chafik
|
e121edc432
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
|
7 months ago |
Xuan-Son Nguyen
|
2f099b510f
webui : bump max upload file size to 500MB (#13779)
|
7 months ago |
Sigbjørn Skjæret
|
aa50ba462f
tests : improve UGM tokenizer test coverage (#13773)
|
7 months ago |
Georgi Gerganov
|
de2ef53a4b
kv-cache : rework kv_cell (#13706)
|
7 months ago |
Percy Piper
|
c508256db2
rpc : Fix build on OpenBSD (#13541)
|
7 months ago |
Xuan-Son Nguyen
|
40aaa8a403
mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760)
|
7 months ago |
ddpasa
|
a08c1d2845
docs : add Moondream2 pre-quantized link (#13745)
|
7 months ago |
Olivier Chafik
|
d785f9c1fd
server: fix/test add_generation_prompt (#13770)
|
7 months ago |
Piotr Jasiukajtis
|
4032ca4066
llama : add support for Qwen3 MoE tied word embeddings (#13768)
|
7 months ago |
Akarshan Biswas
|
515fdbf7ed
SYCL: revert "sycl: simplify bin_bcast_kernel (#13383)" (#13752)
|
7 months ago |