Diego Devesa
|
415e40a357
releases : use arm version of curl for arm releases (#13592)
|
8 months ago |
Georgi Gerganov
|
654a67794f
metal : add FA-vec kernel for head size 64 (#13583)
|
8 months ago |
Diego Devesa
|
5364ae4ba5
llama : print hint when loading a model when no backends are loaded (#13589)
|
8 months ago |
Sigbjørn Skjæret
|
7c07ac244d
ci : add ppc64el to build-linux-cross (#13575)
|
8 months ago |
Łukasz Ślusarczyk
|
0a338ed013
sycl : fixed compilation warnings (#13582)
|
8 months ago |
Olivier Chafik
|
bc098c3cf0
minja: sync (qwen3) (#13573)
|
8 months ago |
Diego Devesa
|
c6a2c9e741
gguf : use ggml log system (#13571)
|
8 months ago |
Daniel Tang
|
07ad2b6db3
gguf-py : fix disconnect-before-connect in editor-gui (#13569)
|
8 months ago |
Xuan-Son Nguyen
|
c531edfa34
convert : fix conversion for llama 4 (#13567)
|
8 months ago |
Atharva Dubey
|
02cdd2d8b0
sycl: simplify bin_bcast_kernel (#13383)
|
8 months ago |
Svetlozar Georgiev
|
64bb51cf90
sycl: reordered Q4_K MMVQ (#13109)
|
8 months ago |
Łukasz Ślusarczyk
|
9c404ed54c
sycl: use oneDNN for matrices multiplication (#12972)
|
8 months ago |
Diego Devesa
|
6c8b91500e
llama-bench : fix -ot with dl backends (#13563)
|
8 months ago |
Xuan-Son Nguyen
|
3cc1f1f1d2
webui : handle PDF input (as text or image) + convert pasted long content to file (#13562)
|
8 months ago |
Piotr Wilkin (ilintar)
|
c753d7bed0
server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540)
|
8 months ago |
Georgi Gerganov
|
b2838049cc
bench : handle decode errors (#13548)
|
8 months ago |
Olivier Chafik
|
aa48e373f2
`server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802)
|
8 months ago |
Georgi Gerganov
|
e3a9421b78
kv-cache : fix out-of-bounds view during reserve graph (#13547)
|
8 months ago |
Yibo Cai
|
5ab5d5fb25
arm64: optimize q6_k_q8_k kernel with i8mm (#13519)
|
8 months ago |
Olivier Chafik
|
3198405e98
`common`: add partial regex support (#12808)
|
8 months ago |
Sigbjørn Skjæret
|
f5170c1d7a
editorconfig : fix trailing whitespace from #13542 (#13546)
|
8 months ago |
Gilad S.
|
017f10b5fa
fix: crash when calling `llama_state_get_size` on a context without a KV cache (#13542)
|
8 months ago |
Johannes Gäßler
|
4696d56749
CUDA: fix crash on large batch size for quant. MoE (#13537)
|
8 months ago |
Diego Devesa
|
b7d2672082
llama : fix quantize with dl backends (#13539)
|
8 months ago |
Johannes Gäßler
|
6da34fa276
CUDA: faster Deepseek FA, add Turing support (#13435)
|
8 months ago |
Gabe Goodhart
|
5e7d95e22e
fix: Move build_inp_pos to the top of the graph section for build_granite (#13538)
|
8 months ago |
Georgi Gerganov
|
053174436f
server : passthrough the /models endpoint during loading (#13535)
|
8 months ago |
Xuan-Son Nguyen
|
360a9c98e1
server : fix cache_tokens bug with no cache_prompt (#13533)
|
8 months ago |
bandoti
|
09d13d94fb
cmake: simplify vulkan shader test logic (#13263)
|
8 months ago |
Jeff Bolz
|
24e86cae72
vulkan: KHR_coopmat flash attention (#13506)
|
8 months ago |