Dan Johansson
|
a71a4075cd
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053)
|
8 months ago |
Johannes Gäßler
|
95e18884fc
CUDA: fix misaligned synchronization in FA (#13469)
|
8 months ago |
Xuan-Son Nguyen
|
df8491922f
ggml : add mrope kernel for metal (#13457)
|
8 months ago |
Atharva Dubey
|
14492144c2
enable dpcpp nightly builds with libraries (#13406)
|
8 months ago |
City
|
c104023994
mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459)
|
8 months ago |
Anthony Umfer
|
9a390c4829
tools : fix uninitialized llama_batch in server (#13436)
|
8 months ago |
Sigbjørn Skjæret
|
09232370fc
scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451)
|
8 months ago |
Johannes Gäßler
|
7474e00b34
CUDA: fix crash with partial offloading of MoE (#13439)
|
8 months ago |
David Huang
|
7f323a589f
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)
|
8 months ago |
City
|
3eac209319
mtmd : support InternVL 3 38B and 78B mmproj (#13443)
|
8 months ago |
Xuan-Son Nguyen
|
a634d75d1b
mtmd : move helpers to dedicated file (#13442)
|
8 months ago |
Thomas Germer
|
62d4250e52
docs : Fix typo in InternVL3 model name (#13440)
|
8 months ago |
Johannes Gäßler
|
0208355f42
CUDA: fix race conditions FlashAttention kernels (#13438)
|
8 months ago |
Sigbjørn Skjæret
|
d2a4ef05c6
vocab : add ByteDance-Seed/Seed-Coder (#13423)
|
8 months ago |
Xuan-Son Nguyen
|
15e6125a39
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434)
|
8 months ago |
Xuan-Son Nguyen
|
3b24d26c22
server : update docs (#13432)
|
8 months ago |
Sigbjørn Skjæret
|
43dfd741a5
llguidance : set tokenizer slices to default (#13424)
|
8 months ago |
Thammachart Chinvarapon
|
b064a51a4e
ci: free_disk_space flag enabled for intel variant (#13426)
|
8 months ago |
Xuan-Son Nguyen
|
053367d149
mtmd : support InternVL 2.5 and 3 (#13422)
|
8 months ago |
Johannes Gäßler
|
d8919424f1
CUDA: fix FlashAttention on Turing (#13415)
|
8 months ago |
Xuan-Son Nguyen
|
7fef11766c
arg : add env var to control mmproj (#13416)
|
8 months ago |
Jeff Bolz
|
dc1d2adfc0
vulkan: scalar flash attention implementation (#13324)
|
8 months ago |
Helton Reis
|
7c28a74e07
chore(llguidance): use tagged version that does not break the build (#13413)
|
8 months ago |
Xuan-Son Nguyen
|
33eff40240
server : vision support via libmtmd (#12898)
|
8 months ago |
Alberto Cabrera Pérez
|
17512a94d6
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)
|
8 months ago |
Georgi Gerganov
|
611aa914ef
metal : optimize MoE for large batches (#13388)
|
8 months ago |
Johannes Gäßler
|
0cf6725e9f
CUDA: FA support for Deepseek (Ampere or newer) (#13306)
|
8 months ago |
Diego Devesa
|
27ebfcacba
llama : do not crash if there is no CPU backend (#13395)
|
8 months ago |
Johannes Gäßler
|
5c86c9ed3e
CUDA: fix crash on large batch size for MoE models (#13384)
|
8 months ago |
Bartowski
|
efb8b47eda
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)
|
8 months ago |