Jared Van Bortel | a70183eb00 | llama-model : fix the reported size class for nomic-embed-text-v2-moe (#13223) | 8 months ago
Georgi Gerganov | 8d33d740c3 | sync : ggml | 8 months ago
Diego Devesa | 4254bb4951 | ggml : fix ggml_gallocr_ptr type (ggml/1205) | 8 months ago
Georgi Gerganov | 9998540149 | cuda : fix unused variable compile warning (whisper/0) | 9 months ago
Johannes Gäßler | e1e8e0991f | CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199) | 8 months ago
Xuan-Son Nguyen | 6f67cf1f48 | arg : -hf do not fail if url mismatch (#13219) | 8 months ago
ddh0 | 16a457facd | fix typo: `n_ctx_pre_seq` -> `n_ctx_per_seq` (#13221) | 8 months ago
Xuan-Son Nguyen | 3e168bede4 | convert : improve model arch handling (#13122) | 8 months ago
Tatsuya Tanaka | ceda28ef8e | llava : remove duplicate include (#13207) | 8 months ago
Olivier Chafik | 3b127c7385 | common : add -jf / --json-schema-file flag (#12011) | 8 months ago
Jeff Bolz | e5007a5edf | vulkan: use uint array index to avoid glslang bug (#13193) | 8 months ago
shalinib-ibm | 416313773b | ggml : fix ppc64le build (#13176) | 8 months ago
Xuan-Son Nguyen | 07c2e2f76c | convert : correct typo image_mean --> image_std (#13208) | 8 months ago
Aaron Teo | 44cd8d91ff | feat(ggml-cpu): enable z17 compile (#13182) | 8 months ago
Xuan-Son Nguyen | 5933e6fdc9 | arg : allow using -hf offline (#13202) | 8 months ago
Xuan-Son Nguyen | da84c04d8f | docker : do not build tests (#13204) | 8 months ago
xiaofei | a0f7016d17 | rpc : fix cache directory initialization (#13188) | 8 months ago
Johannes Gäßler | 19e899ce21 | scripts: n_depth for compare-llama-bench [no ci] (#13201) | 8 months ago
matteo | e2e1ddb93a | server : Prefilling assistant message in openai compatible API (#13174) | 8 months ago
Georgi Gerganov | d9d398f84f | sampling : when top-k <= 0 -> noop (#13173) | 8 months ago
Alberto Cabrera Pérez | 5a63980117 | llama-bench: fixed size of fields to correctly map to values (#13183) | 8 months ago
Johannes Gäßler | cdf76586b2 | CUDA: fix non-cont. inputs for batched mat mul (#13155) | 8 months ago
Sigbjørn Skjæret | 7d3af70b08 | llama : llm_type order by size (#13177) | 8 months ago
Xuan-Son Nguyen | 00e3e5a194 | mtmd : add qwen2vl and qwen2.5vl (#13141) | 8 months ago
Sigbjørn Skjæret | e98b3692be | llama : set qwen3 model type sizes (#13175) | 8 months ago
Xuan-Son Nguyen | b6ce7430b7 | llama-graph : fix text position for mrope (#13159) | 8 months ago
AT | 5f5e39e1ba | model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466) | 8 months ago
Xuan-Son Nguyen | eaea325324 | clip : fix model size display (#13153) | 8 months ago
Ville Vesilehto | 43ddab6eee | fix(rpc): Improve input validation and error handling (#13069) | 8 months ago
Vishal Agarwal | 1831f538f7 | llama-bench: add `-d` depth arg (#13096) | 8 months ago