Daniel Bevenius | 8e3ead6e4d | model-conversion : add device option to run-org-model.py (#18318) | 1 month ago
Chris Rohlf | 12ee1763a6 | rpc : add check for rpc buffer type (#18242) | 1 month ago
nullname | ed75977717 | ggml-hexagon: create generalized functions for cpu side op (#17500) | 1 month ago
Daniel Bevenius | 847c35f7d5 | model-conversion : add trust_remote_code for embedding scripts (#18288) | 1 month ago
Neo Zhang | a6a552e4ec | [SYCL] replace llama-cli by llama-completion to rm the impact to test script (#18290) | 1 month ago
Alessandro98-git | 96e33a814e | model : fix div-by-zero for Nemotron V2 (#18309) | 1 month ago
Ryan Mangeno | dfc959b886 | model : Granite Embedding support (#15641) | 1 month ago
compilade | 8f48807380 | gguf-py : do not align the data start offset (#18291) | 1 month ago
Shouyu | bf6bc3c155 | ggml-hexagon: gelu optimization (#18151) | 1 month ago
Xuan-Son Nguyen | 179fd82a72 | gen-docs: automatically update markdown file (#18294) | 1 month ago
Taimur Ahmad | d34d5ca1e9 | llamafile: add rvv support for sgemm kernels (#18199) | 1 month ago
lhez | eb492bf43f | opencl: unpack q4_0 for adreno in get_tensor (#18278) | 1 month ago
Jeff Bolz | e3b35ddf1c | vulkan: Extend rope fusions to allow mrope (#18264) | 1 month ago
Xuan-Son Nguyen | 6ce863c803 | server: prevent data race from HTTP threads (#18263) | 1 month ago
Xuan-Son Nguyen | 3997c78e33 | server: fix data race in to_json_anthropic (#18283) | 1 month ago
Mattt | ee74642982 | release: update release workflow to store XCFramework as Zip file (#18284) | 1 month ago
Aaron Teo | a28310488c | convert: rework ftype heuristics (#18214) | 1 month ago
Xuan-Son Nguyen | 86af848153 | server: (docs) remove mention about extra_args (#18262) | 1 month ago
Johannes Gäßler | 147a521636 | tool/ex/tests: consistently free ctx, then model (#18168) | 1 month ago
Jeff Bolz | e1f15b454f | vulkan: Implement set_tensor_async and the event interfaces (#18047) | 1 month ago
Johannes Gäßler | 0e1ccf15c7 | llama: fix RPC for -fit on (#18233) | 1 month ago
Xuan-Son Nguyen | 5e25ddebff | move copilot instructions to AGENTS.md (#18259) | 1 month ago
Jeff Bolz | fd05c51cec | vulkan: fix im2col overflowing maxworkgroupcount (#18180) | 1 month ago
Jeff Bolz | b365c3ff01 | vulkan/cuda: fix topk_moe with exp_probs_b (#18071) | 1 month ago
Jeff Bolz | cb64222b0c | vulkan: support GGML_UNARY_OP_XIELU (#18062) | 1 month ago
Jeff Bolz | 6eb7081860 | vulkan: in graph_optimize, try to group ADD operations (#18060) | 1 month ago
lovedheart | 4117ae5557 | Vulkan: some improvement on mul_mat_iq2_xs (#18031) | 1 month ago
Daniel Bevenius | 65e96a2464 | docs : fix links in parsing.md (#18245) | 1 month ago
Aldehir Rojas | 9496bbb808 | common : reorganize includes to prioritize vendored deps (#18222) | 1 month ago
Xuan-Son Nguyen | ddcb75dd8a | server: add auto-sleep after N seconds of idle (#18228) | 1 month ago