Giuseppe Scrivano
|
f90b4a8efe
vulkan: delete dead code (#16732)
|
3 months ago |
Jeff Bolz
|
8423d01931
vulkan: Optimize SSM_SCAN (#16645)
|
3 months ago |
compilade
|
5cca2542ac
convert : avoid dequantizing mxfp4 for GPT-OSS (#16756)
|
3 months ago |
leejet
|
55945d2ef5
ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742)
|
3 months ago |
Aman Gupta
|
0bcb40b48c
CUDA: use CUB for arbitary size argsort (#16754)
|
3 months ago |
Florian Badie
|
69e9ff0103
webui: support q URL parameter (#16728)
|
3 months ago |
Daniel Bevenius
|
5a91109a5d
model-conversion : add trust_remote_code for orig model run [no ci] (#16751)
|
3 months ago |
compilade
|
f8f071fadd
convert : handle pre-quantized models (#14810)
|
3 months ago |
Johannes Gäßler
|
0bf47a1dbb
server: add memory breakdown print (#16740)
|
3 months ago |
Julien Denize
|
dd62dcfab9
convert : Make mistral-common dependency optional (#16738)
|
3 months ago |
Xuan-Son Nguyen
|
d0660f237a
mtmd-cli : allow using --jinja (#16718)
|
3 months ago |
Prajwal B Mehendarkar
|
fe6a9882ac
Manually link -lbsd to resolve flock symbol on AIX (#16610)
|
3 months ago |
Aman Gupta
|
061f0eff02
ggml-cuda: use passed ops instead of hardcoded ops (#16712)
|
3 months ago |
matteo
|
8cf6b42d46
server : send partial stop string when <EOG> is reached (#15007)
|
3 months ago |
Matthew Michel
|
9de9672adb
sycl: use async memory allocation to fix crashes during graph recording (#16644)
|
3 months ago |
Max Krasnyansky
|
63d2fc46e1
Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)
|
3 months ago |
Diego Devesa
|
a2e0088d92
Revert "ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_v…" (#16723)
|
3 months ago |
Pascal
|
9b9201f65a
webui: introduce OpenAI-compatible model selector in JSON payload (#16562)
|
3 months ago |
sirus20x6
|
19a5a3edfd
ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_vec_set_f32 for faster fills (#16522)
|
3 months ago |
Acly
|
d8eaa26e4d
tests : fix test-thread-safety when compiling with multiple backends (#16699)
|
3 months ago |
Aman Gupta
|
9285325ce0
CUDA: fix bug in topk-moe softmax (#16711)
|
3 months ago |
Aman Gupta
|
03792ad936
CUDA: topk-moe: add optional parameter for gpt-oss (#16649)
|
3 months ago |
Johannes Gäßler
|
51d1a8c997
CUDA: better error for FA kernel with 0 occupancy (#16643)
|
3 months ago |
Aman Gupta
|
4926419c4d
ggml: add ggml_can_fuse_subgraph (#16662)
|
3 months ago |
lhez
|
6ea37f5739
opencl: fix warnings and clean up profiling (#16688)
|
3 months ago |
Jeff Bolz
|
fb349848f3
vulkan: Handle FA with all -inf mask values (#16447)
|
3 months ago |
YehuditE
|
6de8ed7519
sycl : add PAD_REFLECT_D1 operator support (#16145)
|
3 months ago |
Sigbjørn Skjæret
|
84bf3c6778
model : add BailingMoeV2 support (#16063)
|
3 months ago |
Aleksander Grygier
|
c9c1972e2c
Handle legacy 'context' attachments (#16687)
|
3 months ago |
Diego Devesa
|
b617cfd289
ggml-alloc : fix leak when reusing a tensor with a larger size (#16679)
|
3 months ago |