Xuan-Son Nguyen | 86af848153 | server: (docs) remove mention about extra_args (#18262) | 1 month ago
Johannes Gäßler | 147a521636 | tool/ex/tests: consistently free ctx, then model (#18168) | 1 month ago
Jeff Bolz | e1f15b454f | vulkan: Implement set_tensor_async and the event interfaces (#18047) | 1 month ago
Johannes Gäßler | 0e1ccf15c7 | llama: fix RPC for -fit on (#18233) | 1 month ago
Xuan-Son Nguyen | 5e25ddebff | move copilot instructions to AGENTS.md (#18259) | 1 month ago
Jeff Bolz | fd05c51cec | vulkan: fix im2col overflowing maxworkgroupcount (#18180) | 1 month ago
Jeff Bolz | b365c3ff01 | vulkan/cuda: fix topk_moe with exp_probs_b (#18071) | 1 month ago
Jeff Bolz | cb64222b0c | vulkan: support GGML_UNARY_OP_XIELU (#18062) | 1 month ago
Jeff Bolz | 6eb7081860 | vulkan: in graph_optimize, try to group ADD operations (#18060) | 1 month ago
lovedheart | 4117ae5557 | Vulkan: some improvement on mul_mat_iq2_xs (#18031) | 1 month ago
Daniel Bevenius | 65e96a2464 | docs : fix links in parsing.md (#18245) | 1 month ago
Aldehir Rojas | 9496bbb808 | common : reorganize includes to prioritize vendored deps (#18222) | 1 month ago
Xuan-Son Nguyen | ddcb75dd8a | server: add auto-sleep after N seconds of idle (#18228) | 1 month ago
Jeff Bolz | 52ab19df63 | tests: Avoid floating point precision false positives in SUM (#17471) | 1 month ago
Jeff Bolz | 5182dd64cd | test-backend-ops: improve msvc build time (#18209) | 1 month ago
Aadeshveer Singh | 10b4f82d44 | Added comments explaining thread block size selection logic based on row count and column size, derived from historical commit context (#18212) | 1 month ago
Oleksandr Kuvshynov | 408616adbd | server : [easy] fix per round speculative decode logging (#18211) | 1 month ago
Xuan-Son Nguyen | 9e39a1e6a9 | server: support load model on startup, support preset-only options (#18206) | 1 month ago
Sigbjørn Skjæret | 74e05131e9 | ci : remove non-windows zip artifacts (#18201) | 1 month ago
Sigbjørn Skjæret | f74747d886 | ci : only save ccache on master (#18207) | 1 month ago
Alfred | ce734a8a2f | ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977) | 1 month ago
Pascal | 14931a826e | arg: fix order to use short form before long form (#18196) | 1 month ago
Julius Tischbein | f99ef53d2a | llama : Changing off_t to size_t for Windows (#18204) | 1 month ago
Aman Gupta | cc0a04343e | server: friendlier error msg when ctx < input (#18174) | 1 month ago
Xuan-Son Nguyen | 98c1c7a7bf | presets: refactor, allow cascade presets from different sources, add global section (#18169) | 1 month ago
Aleksander Grygier | acb73d8340 | webui: Add editing attachments in user messages (#18147) | 1 month ago
Daniel Bevenius | 0a271d82b4 | model-conversion : add verbose flag in run-org-model.py (#18194) | 1 month ago
Naco Siren | 52fc7fee8a | android: fix missing screenshots for Android.md (#18156) | 1 month ago
Jeff Bolz | cdbada8d10 | vulkan: Add perf logger mode with concurrency (#17944) | 1 month ago
Xuan-Son Nguyen | 8ea958d4d9 | model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106) | 1 month ago