amirai21
|
8d8862829c
docs : add Jamba to Text-only models list (#16778)
|
2 months ago |
Aman Gupta
|
f77c13b91f
CUDA: General GEMV fusion (#16715)
|
2 months ago |
Gilad S.
|
3cfa9c3f12
vulkan: deduplicate Microsoft Direct3D12 devices (#16689)
|
2 months ago |
Galunid
|
5d195f17bc
convert : handle mmproj filename/path properly (#16760)
|
2 months ago |
Shunta Saito
|
226f295f4d
model : set res->t_embd in PLaMo2 models (#16766)
|
2 months ago |
Giuseppe Scrivano
|
f90b4a8efe
vulkan: delete dead code (#16732)
|
2 months ago |
Jeff Bolz
|
8423d01931
vulkan: Optimize SSM_SCAN (#16645)
|
2 months ago |
compilade
|
5cca2542ac
convert : avoid dequantizing mxfp4 for GPT-OSS (#16756)
|
2 months ago |
leejet
|
55945d2ef5
ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742)
|
2 months ago |
Aman Gupta
|
0bcb40b48c
CUDA: use CUB for arbitary size argsort (#16754)
|
2 months ago |
Florian Badie
|
69e9ff0103
webui: support q URL parameter (#16728)
|
2 months ago |
Daniel Bevenius
|
5a91109a5d
model-conversion : add trust_remote_code for orig model run [no ci] (#16751)
|
2 months ago |
compilade
|
f8f071fadd
convert : handle pre-quantized models (#14810)
|
2 months ago |
Johannes Gäßler
|
0bf47a1dbb
server: add memory breakdown print (#16740)
|
2 months ago |
Julien Denize
|
dd62dcfab9
convert : Make mistral-common dependency optional (#16738)
|
2 months ago |
Xuan-Son Nguyen
|
d0660f237a
mtmd-cli : allow using --jinja (#16718)
|
2 months ago |
Prajwal B Mehendarkar
|
fe6a9882ac
Manually link -lbsd to resolve flock symbol on AIX (#16610)
|
2 months ago |
Aman Gupta
|
061f0eff02
ggml-cuda: use passed ops instead of hardcoded ops (#16712)
|
2 months ago |
matteo
|
8cf6b42d46
server : send partial stop string when <EOG> is reached (#15007)
|
2 months ago |
Matthew Michel
|
9de9672adb
sycl: use async memory allocation to fix crashes during graph recording (#16644)
|
2 months ago |
Max Krasnyansky
|
63d2fc46e1
Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)
|
2 months ago |
Diego Devesa
|
a2e0088d92
Revert "ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_v…" (#16723)
|
2 months ago |
Pascal
|
9b9201f65a
webui: introduce OpenAI-compatible model selector in JSON payload (#16562)
|
2 months ago |
sirus20x6
|
19a5a3edfd
ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_vec_set_f32 for faster fills (#16522)
|
2 months ago |
Acly
|
d8eaa26e4d
tests : fix test-thread-safety when compiling with multiple backends (#16699)
|
2 months ago |
Aman Gupta
|
9285325ce0
CUDA: fix bug in topk-moe softmax (#16711)
|
2 months ago |
Aman Gupta
|
03792ad936
CUDA: topk-moe: add optional parameter for gpt-oss (#16649)
|
3 months ago |
Johannes Gäßler
|
51d1a8c997
CUDA: better error for FA kernel with 0 occupancy (#16643)
|
3 months ago |
Aman Gupta
|
4926419c4d
ggml: add ggml_can_fuse_subgraph (#16662)
|
3 months ago |
lhez
|
6ea37f5739
opencl: fix warnings and clean up profiling (#16688)
|
3 months ago |