Łukasz Ślusarczyk
|
9c404ed54c
sycl: use oneDNN for matrices multiplication (#12972)
|
8 months ago |
Diego Devesa
|
6c8b91500e
llama-bench : fix -ot with dl backends (#13563)
|
8 months ago |
Xuan-Son Nguyen
|
3cc1f1f1d2
webui : handle PDF input (as text or image) + convert pasted long content to file (#13562)
|
8 months ago |
Piotr Wilkin (ilintar)
|
c753d7bed0
server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540)
|
8 months ago |
Georgi Gerganov
|
b2838049cc
bench : handle decode errors (#13548)
|
8 months ago |
Olivier Chafik
|
aa48e373f2
`server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802)
|
8 months ago |
Georgi Gerganov
|
e3a9421b78
kv-cache : fix out-of-bounds view during reserve graph (#13547)
|
8 months ago |
Yibo Cai
|
5ab5d5fb25
arm64: optimize q6_k_q8_k kernel with i8mm (#13519)
|
8 months ago |
Olivier Chafik
|
3198405e98
`common`: add partial regex support (#12808)
|
8 months ago |
Sigbjørn Skjæret
|
f5170c1d7a
editorconfig : fix trailing whitespace from #13542 (#13546)
|
8 months ago |
Gilad S.
|
017f10b5fa
fix: crash when calling `llama_state_get_size` on a context without a KV cache (#13542)
|
8 months ago |
Johannes Gäßler
|
4696d56749
CUDA: fix crash on large batch size for quant. MoE (#13537)
|
8 months ago |
Diego Devesa
|
b7d2672082
llama : fix quantize with dl backends (#13539)
|
8 months ago |
Johannes Gäßler
|
6da34fa276
CUDA: faster Deepseek FA, add Turing support (#13435)
|
8 months ago |
Gabe Goodhart
|
5e7d95e22e
fix: Move build_inp_pos to the top of the graph section for build_granite (#13538)
|
8 months ago |
Georgi Gerganov
|
053174436f
server : passthrough the /models endpoint during loading (#13535)
|
8 months ago |
Xuan-Son Nguyen
|
360a9c98e1
server : fix cache_tokens bug with no cache_prompt (#13533)
|
8 months ago |
bandoti
|
09d13d94fb
cmake: simplify vulkan shader test logic (#13263)
|
8 months ago |
Jeff Bolz
|
24e86cae72
vulkan: KHR_coopmat flash attention (#13506)
|
8 months ago |
Xuan-Son Nguyen
|
bb1681fbd5
webui : use fflate for more deterministic gzip compress (#13525)
|
8 months ago |
Luca Stefani
|
d486dd3e8e
webui: Allow pasting file from clipboard (#13526)
|
8 months ago |
ddpasa
|
21ca987fba
docs: Update link to ggml-org in multimodal.md (#13513)
|
8 months ago |
Sigbjørn Skjæret
|
be1d4a13db
scripts : fix compare-llama-bench.py show parameter (#13514)
|
8 months ago |
Jeff Bolz
|
ab3971f2a0
vulkan: workaround FA compile failures on macos (#13517)
|
8 months ago |
Ed Addario
|
e5c834f718
quantize : improve tensor-type pattern matching (#13033)
|
8 months ago |
Xuan-Son Nguyen
|
71bdbdb587
clip : clip.h become private API (⚠️ breaking change) (#13510)
|
8 months ago |
Georgi Gerganov
|
f0995d28ce
metal : use FA-vec kernel up to batch size 20 (#13496)
|
8 months ago |
Georgi Gerganov
|
c252e0c409
metal : optimize multi-sequence FA vec kernel (#13493)
|
8 months ago |
Dan Johansson
|
4f711afed5
ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509)
|
8 months ago |
Georgi Gerganov
|
b89d605a91
batched-bench : fix pp batch contents (#13492)
|
8 months ago |