o7si
|
d0a6a31470
model : add support for JinaBertModel with non-gated ffn (#18475)
|
3 weeks ago |
o7si
|
2b2afade9f
convert : fix encoding of WPM vocab for BERT models (#18500)
|
3 weeks ago |
HelloKS
|
f4f5019254
model: add Solar Open model (#18511)
|
3 weeks ago |
Anri Lombard
|
d5574c919c
webui: fix code copy stripping XML/HTML tags (#18518)
|
3 weeks ago |
Aman Gupta
|
26831bded9
ggml-cuda: remove unnecessary prints on ggml_cuda_init (#18502)
|
3 weeks ago |
Jeff Bolz
|
be47fb9285
vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295)
|
3 weeks ago |
triplenom
|
9e10bd2eaf
llama: handle short reads in direct I/O path (#18504)
|
3 weeks ago |
Anri Lombard
|
4cd162a123
chat: make tool description and parameters optional per OpenAI spec (#18478)
|
3 weeks ago |
Georgi Gerganov
|
13814eb370
sync : ggml
|
4 weeks ago |
Georgi Gerganov
|
54f67b9b66
ggml : bump version to 0.9.5 (ggml/1410)
|
4 weeks ago |
Anri Lombard
|
33ded988ba
quantize: prevent input/output file collision (#18451)
|
4 weeks ago |
Sigbjørn Skjæret
|
0db8109849
convert : lint fix (#18507)
|
4 weeks ago |
Henry147147
|
9b8329de7a
mtmd : Adding support for Nvidia Music Flamingo Model (#18470)
|
4 weeks ago |
gatbontonpc
|
9a6369bb60
metal : add count_equal op (#18314)
|
4 weeks ago |
Johannes Gäßler
|
ecc343de63
CUDA: fix KQ max calculation (#18487)
|
4 weeks ago |
Georgi Gerganov
|
01ade96e71
metal : remove BF16 x F16 kernels (#18456)
|
4 weeks ago |
Aman Gupta
|
7bcaf815c2
sycl: add newline at the end of CMakeLists.txt (#18503)
|
4 weeks ago |
Rahul Sathe
|
c8a3798041
Work around broken IntelSYCLConfig.cmake in Intel oneAPI 2025.x (#18345)
|
4 weeks ago |
Sigbjørn Skjæret
|
4849661d98
docker : add CUDA 13.1 image build (#18441)
|
4 weeks ago |
Bart Louwers
|
6e0c8cbc40
docs : document that JSON Schema is not available to model when using response_format (#18492)
|
4 weeks ago |
Aldehir Rojas
|
0f89d2ecf1
common : default content to an empty string (#18485)
|
4 weeks ago |
Daniel Bevenius
|
ac1d0eb7bf
llama : fix typo in comment in llama-kv-cache.h [no ci] (#18489)
|
4 weeks ago |
Xuan-Son Nguyen
|
cd78e57c3a
lora: count lora nodes in graph_max_nodes (#18469)
|
4 weeks ago |
Jay Zenith
|
c32fa21db8
sampling: reuse token data buffer in llama_sampler_sample (#18365)
|
4 weeks ago |
Jeff Bolz
|
f14f4e421b
server: fix files built redundantly (#18474)
|
4 weeks ago |
Charles Xu
|
2d6c00a9b8
kleidiai: add and integrate SVE 256-bit vector-length kernel (#18458)
|
4 weeks ago |
Aman Gupta
|
d77d7c5c06
CUDA: add log line when mxfp4 acceleration is used (#18483)
|
4 weeks ago |
Daniel Bevenius
|
a864fb1c14
model-conversion : use CONVERTED_MODEL for compare-embeddings (#18461)
|
4 weeks ago |
Xuan-Son Nguyen
|
51a48720b8
webui: fix prompt progress ETA calculation (#18468)
|
4 weeks ago |
Pascal
|
c9a3b40d65
Webui/prompt processing progress (#18300)
|
4 weeks ago |