cturan | 54ed0123a6 | Add minimax model support | 2 months ago
Johannes Gäßler | 945501f5ea | llama: fix leaked buffers for mmap + split files (#16765) | 2 months ago
Aman Gupta | 75cbdd3fce | test-backend-ops: print failed tests at the end (#16785) | 2 months ago
tamarPal | 2b9bd9bf4e | sycl: add ROLL operation support (#16665) | 2 months ago
shani-f | 59fc1ec8e8 | sycl: add REPEAT_BACK operation support (#16734) | 2 months ago
Aman Gupta | 75d33b9302 | CUDA: support for weight clamp in top-k norm (#16702) | 2 months ago
Acly | 3470a5c891 | ggml-alloc : make gallocr prefer chunks that allow memory reuse (#16788) | 2 months ago
Sigbjørn Skjæret | bd562fe4f7 | cuda : use fast copy when src and dst are of different type and contiguous (#16789) | 2 months ago
leejet | bbac6a26b2 | ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch (#16744) | 2 months ago
Sigbjørn Skjæret | 73a48c9790 | convert : enable expert group selection for all models with it (#16691) | 2 months ago
Sigbjørn Skjæret | f696428ce8 | graph : add clamping to ffn_moe_weights_sum to avoid div-by-zero (#16655) | 2 months ago
Sigbjørn Skjæret | 7cce4f8158 | model : set res->t_embd in SmallThinker models (#16782) | 2 months ago
amirai21 | 8d8862829c | docs : add Jamba to Text-only models list (#16778) | 2 months ago
Aman Gupta | f77c13b91f | CUDA: General GEMV fusion (#16715) | 2 months ago
Gilad S. | 3cfa9c3f12 | vulkan: deduplicate Microsoft Direct3D12 devices (#16689) | 2 months ago
Galunid | 5d195f17bc | convert : handle mmproj filename/path properly (#16760) | 2 months ago
Shunta Saito | 226f295f4d | model : set res->t_embd in PLaMo2 models (#16766) | 2 months ago
Giuseppe Scrivano | f90b4a8efe | vulkan: delete dead code (#16732) | 2 months ago
Jeff Bolz | 8423d01931 | vulkan: Optimize SSM_SCAN (#16645) | 2 months ago
compilade | 5cca2542ac | convert : avoid dequantizing mxfp4 for GPT-OSS (#16756) | 2 months ago
leejet | 55945d2ef5 | ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742) | 2 months ago
Aman Gupta | 0bcb40b48c | CUDA: use CUB for arbitrary size argsort (#16754) | 2 months ago
Florian Badie | 69e9ff0103 | webui: support q URL parameter (#16728) | 2 months ago
Daniel Bevenius | 5a91109a5d | model-conversion : add trust_remote_code for orig model run [no ci] (#16751) | 2 months ago
compilade | f8f071fadd | convert : handle pre-quantized models (#14810) | 2 months ago
Johannes Gäßler | 0bf47a1dbb | server: add memory breakdown print (#16740) | 2 months ago
Julien Denize | dd62dcfab9 | convert : Make mistral-common dependency optional (#16738) | 2 months ago
Xuan-Son Nguyen | d0660f237a | mtmd-cli : allow using --jinja (#16718) | 2 months ago
Prajwal B Mehendarkar | fe6a9882ac | Manually link -lbsd to resolve flock symbol on AIX (#16610) | 2 months ago
Aman Gupta | 061f0eff02 | ggml-cuda: use passed ops instead of hardcoded ops (#16712) | 2 months ago