Tarek Dakhran
|
aeaf8a36f0
llama : support LiquidAI LFM2-MoE hybrid model (#16464)
|
3 months ago |
Georgi Gerganov
|
df1b612e29
server : add `/v1/health` endpoint (#16461)
|
3 months ago |
Sascha Rogmann
|
4e0388aa8a
webui : added download action (#13552) (#16282)
|
3 months ago |
Georgi Gerganov
|
ef4c5b87ea
presets : fix pooling param for embedding models (#16455)
|
3 months ago |
Radoslav Gerganov
|
c61ae20d05
rpc : update documentation (#16441)
|
3 months ago |
Georgi Gerganov
|
0123ff38f5
memory : use sequential equal splits for recurrent modules (#16442)
|
3 months ago |
Georgi Gerganov
|
0a319bb75e
metal : add support for non-padded FA KV (#16148)
|
3 months ago |
Georgi Gerganov
|
1d6092fc72
tests : add -INF blocks to the KQ mask in the FA tests (#16380)
|
3 months ago |
Georgi Gerganov
|
8ae32dc9ec
metal : various optimizations + refactoring (#16446)
|
3 months ago |
Gadflyii
|
3df2244df4
llama : add --no-host to disable host buffers (#16310)
|
3 months ago |
Gabe Goodhart
|
c08002a198
chat : Granite Docling stopping (#16438)
|
3 months ago |
Sigbjørn Skjæret
|
3a002afafa
ci : refactor sdk caching to minimize storage (#16414)
|
3 months ago |
Georgi Gerganov
|
a23b9bdbd3
ggml : fix unaligned access in AMX code (#16315)
|
3 months ago |
Daniel Bevenius
|
04e632a4aa
ci : remove missing reranker model files (#16444)
|
3 months ago |
Daniel Bevenius
|
a80ff183ab
ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (#16443)
|
3 months ago |
Yuannan
|
1d49ca3759
nix : removed metal for nix (#16118)
|
3 months ago |
Oleksandr Kuvshynov
|
c5fef0fcea
server: update readme to mention n_past_max metric (#16436)
|
3 months ago |
Gabe Goodhart
|
ca71fb9b36
model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)
|
3 months ago |
Reese Levine
|
35266573b9
ggml webgpu: actually add softmax, fix rms_norm offset (#16400)
|
3 months ago |
Eve
|
86df2c9ae4
vulkan: use a more appropriate amount of threads when generating shaders (#16418)
|
3 months ago |
Radoslav Gerganov
|
f39283960b
rpc : check src buffer when copying tensor (#16421)
|
3 months ago |
Radoslav Gerganov
|
898acba681
rpc : add support for multiple devices (#16276)
|
3 months ago |
Acly
|
e29acf74fe
vulkan : incremental shader builds (#16341)
|
3 months ago |
Pascal
|
128d522c04
chat : support Magistral thinking (#16413)
|
3 months ago |
ddh0
|
f6dcda3900
server : context checkpointing for hybrid and recurrent models (#16382)
|
3 months ago |
Georgi Gerganov
|
606a73f531
metal : fix loop bound in ggml_mem_ranges (#16412)
|
3 months ago |
Sigbjørn Skjæret
|
946f71ed9a
llama : fix shapes for bert/mpt q/k norm (#16409)
|
3 months ago |
Acly
|
638d330246
ggml : fix graph reallocation with multiple chunks (#16396)
|
3 months ago |
Aleksander Grygier
|
84c8e305e8
Fix missing messages on sibling navigation (#16408)
|
3 months ago |
Jeff Bolz
|
2aaf0a2a20
vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (#16354)
|
3 months ago |