Pascal
|
8328fd4bae
No markdown in cot (#16483)
|
3 months ago |
Daniel Bevenius
|
56b4795842
model-conversion : add support for SentenceTransformers (#16387)
|
3 months ago |
sudhiarm
|
2c0d875ae6
ci: add ARM64 Kleidiai build and test support (#16462)
|
3 months ago |
Chenguang Li
|
aa4711d369
CANN: Improve ACL graph matching (#16166)
|
3 months ago |
Charles Xu
|
d80d6d2400
kleidiai: kernel interface refactoring (#16460)
|
3 months ago |
Neo Zhang Jianyu
|
b260213755
[SYCL] refactor soft_max, add soft_max_back (#16472)
|
3 months ago |
Saba Fallah
|
e08db42595
model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (#16367)
|
3 months ago |
Pascal
|
12bbc3fa50
refactor: centralize CoT parsing in backend for streaming mode (#16394)
|
3 months ago |
ai-fonsi
|
9d0882840e
Disable CUDA host buffers on integrated GPUs (#16308)
|
3 months ago |
issixx
|
d2ee056e1d
server : fix cancel pending task (#16467)
|
3 months ago |
Georgi Gerganov
|
b2c08c9ec4
metal : mark FA blocks (#16372)
|
3 months ago |
Georgi Gerganov
|
7fdd16b432
server : improve context checkpoint logic (#16440)
|
3 months ago |
Reese Levine
|
74b8fc17f9
ggml webgpu: profiling, CI updates, reworking of command submission (#16452)
|
3 months ago |
Tarek Dakhran
|
aeaf8a36f0
llama : support LiquidAI LFM2-MoE hybrid model (#16464)
|
3 months ago |
Georgi Gerganov
|
df1b612e29
server : add `/v1/health` endpoint (#16461)
|
3 months ago |
Sascha Rogmann
|
4e0388aa8a
webui : added download action (#13552) (#16282)
|
3 months ago |
Georgi Gerganov
|
ef4c5b87ea
presets : fix pooling param for embedding models (#16455)
|
3 months ago |
Radoslav Gerganov
|
c61ae20d05
rpc : update documentation (#16441)
|
3 months ago |
Georgi Gerganov
|
0123ff38f5
memory : use sequential equal splits for recurrent modules (#16442)
|
3 months ago |
Georgi Gerganov
|
0a319bb75e
metal : add support for non-padded FA KV (#16148)
|
3 months ago |
Georgi Gerganov
|
1d6092fc72
tests : add -INF blocks to the KQ mask in the FA tests (#16380)
|
3 months ago |
Georgi Gerganov
|
8ae32dc9ec
metal : various optimizations + refactoring (#16446)
|
3 months ago |
Gadflyii
|
3df2244df4
llama : add --no-host to disable host buffers (#16310)
|
3 months ago |
Gabe Goodhart
|
c08002a198
chat : Granite Docling stopping (#16438)
|
3 months ago |
Sigbjørn Skjæret
|
3a002afafa
ci : refactor sdk caching to minimize storage (#16414)
|
3 months ago |
Georgi Gerganov
|
a23b9bdbd3
ggml : fix unaligned access in AMX code (#16315)
|
3 months ago |
Daniel Bevenius
|
04e632a4aa
ci : remove missing reranker model files (#16444)
|
3 months ago |
Daniel Bevenius
|
a80ff183ab
ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (#16443)
|
3 months ago |
Yuannan
|
1d49ca3759
nix : removed metal for nix (#16118)
|
3 months ago |
Oleksandr Kuvshynov
|
c5fef0fcea
server: update readme to mention n_past_max metric (#16436)
|
3 months ago |