Yibo Cai | 5ab5d5fb25 | arm64: optimize q6_k_q8_k kernel with i8mm (#13519) | 8 months ago
Olivier Chafik | 3198405e98 | `common`: add partial regex support (#12808) | 8 months ago
Sigbjørn Skjæret | f5170c1d7a | editorconfig : fix trailing whitespace from #13542 (#13546) | 8 months ago
Gilad S. | 017f10b5fa | fix: crash when calling `llama_state_get_size` on a context without a KV cache (#13542) | 8 months ago
Johannes Gäßler | 4696d56749 | CUDA: fix crash on large batch size for quant. MoE (#13537) | 8 months ago
Diego Devesa | b7d2672082 | llama : fix quantize with dl backends (#13539) | 8 months ago
Johannes Gäßler | 6da34fa276 | CUDA: faster Deepseek FA, add Turing support (#13435) | 8 months ago
Gabe Goodhart | 5e7d95e22e | fix: Move build_inp_pos to the top of the graph section for build_granite (#13538) | 8 months ago
Georgi Gerganov | 053174436f | server : passthrough the /models endpoint during loading (#13535) | 8 months ago
Xuan-Son Nguyen | 360a9c98e1 | server : fix cache_tokens bug with no cache_prompt (#13533) | 8 months ago
bandoti | 09d13d94fb | cmake: simplify vulkan shader test logic (#13263) | 8 months ago
Jeff Bolz | 24e86cae72 | vulkan: KHR_coopmat flash attention (#13506) | 8 months ago
Xuan-Son Nguyen | bb1681fbd5 | webui : use fflate for more deterministic gzip compress (#13525) | 8 months ago
Luca Stefani | d486dd3e8e | webui: Allow pasting file from clipboard (#13526) | 8 months ago
ddpasa | 21ca987fba | docs: Update link to ggml-org in multimodal.md (#13513) | 8 months ago
Sigbjørn Skjæret | be1d4a13db | scripts : fix compare-llama-bench.py show parameter (#13514) | 8 months ago
Jeff Bolz | ab3971f2a0 | vulkan: workaround FA compile failures on macos (#13517) | 8 months ago
Ed Addario | e5c834f718 | quantize : improve tensor-type pattern matching (#13033) | 8 months ago
Xuan-Son Nguyen | 71bdbdb587 | clip : clip.h become private API (⚠️ breaking change) (#13510) | 8 months ago
Georgi Gerganov | f0995d28ce | metal : use FA-vec kernel up to batch size 20 (#13496) | 8 months ago
Georgi Gerganov | c252e0c409 | metal : optimize multi-sequence FA vec kernel (#13493) | 8 months ago
Dan Johansson | 4f711afed5 | ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509) | 8 months ago
Georgi Gerganov | b89d605a91 | batched-bench : fix pp batch contents (#13492) | 8 months ago
Xuan-Son Nguyen | b4726345ac | mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change) (#13460) | 8 months ago
Sigbjørn Skjæret | bf79371120 | scripts : support arbitrary input file formats in compare-llama-bench.py (#13455) | 8 months ago
Gabe Goodhart | d590cd4c24 | model : Granite MoE shared (#13269) | 8 months ago
Georgi Gerganov | 1e2809bc4b | sync : ggml | 8 months ago
Diego Devesa | cf0a43bb64 | llama-bench : add defrag-thold, check for invalid ranges (#13487) | 8 months ago
lhez | f0d46ef157 | opencl: remove unnecessary assert for `add` (#13257) | 8 months ago
Xuan-Son Nguyen | de4c07f937 | clip : cap max image size 1024 for qwen vl model (#13478) | 8 months ago