lhez
|
f0d46ef157
opencl: remove unnecessary assert for `add` (#13257)
|
8 months ago |
Xuan-Son Nguyen
|
de4c07f937
clip : cap max image size 1024 for qwen vl model (#13478)
|
8 months ago |
Johannes Gäßler
|
10d2af0eaa
llama/ggml: add LLM training support (#10544)
|
8 months ago |
Georgi Gerganov
|
064cc596ac
context : fix state io for memory-less contexts (#13470)
|
8 months ago |
Anudit Nagar
|
91159ee9df
server : allow content to be null in oaicompat_completion_params_parse (#13477)
|
8 months ago |
Diego Devesa
|
22cdab343b
llama-bench : accept ranges for integer parameters (#13410)
|
8 months ago |
Dan Johansson
|
a71a4075cd
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053)
|
8 months ago |
Johannes Gäßler
|
95e18884fc
CUDA: fix misaligned synchronization in FA (#13469)
|
8 months ago |
Xuan-Son Nguyen
|
df8491922f
ggml : add mrope kernel for metal (#13457)
|
8 months ago |
Atharva Dubey
|
14492144c2
enable dpcpp nightly builds with libraries (#13406)
|
8 months ago |
City
|
c104023994
mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459)
|
8 months ago |
Anthony Umfer
|
9a390c4829
tools : fix uninitialized llama_batch in server (#13436)
|
8 months ago |
Sigbjørn Skjæret
|
09232370fc
scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451)
|
8 months ago |
Johannes Gäßler
|
7474e00b34
CUDA: fix crash with partial offloading of MoE (#13439)
|
8 months ago |
David Huang
|
7f323a589f
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)
|
8 months ago |
City
|
3eac209319
mtmd : support InternVL 3 38B and 78B mmproj (#13443)
|
8 months ago |
Xuan-Son Nguyen
|
a634d75d1b
mtmd : move helpers to dedicated file (#13442)
|
8 months ago |
Thomas Germer
|
62d4250e52
docs : Fix typo in InternVL3 model name (#13440)
|
8 months ago |
Johannes Gäßler
|
0208355f42
CUDA: fix race conditions FlashAttention kernels (#13438)
|
8 months ago |
Sigbjørn Skjæret
|
d2a4ef05c6
vocab : add ByteDance-Seed/Seed-Coder (#13423)
|
8 months ago |
Xuan-Son Nguyen
|
15e6125a39
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434)
|
8 months ago |
Xuan-Son Nguyen
|
3b24d26c22
server : update docs (#13432)
|
8 months ago |
Sigbjørn Skjæret
|
43dfd741a5
llguidance : set tokenizer slices to default (#13424)
|
8 months ago |
Thammachart Chinvarapon
|
b064a51a4e
ci: free_disk_space flag enabled for intel variant (#13426)
|
8 months ago |
Xuan-Son Nguyen
|
053367d149
mtmd : support InternVL 2.5 and 3 (#13422)
|
8 months ago |
Johannes Gäßler
|
d8919424f1
CUDA: fix FlashAttention on Turing (#13415)
|
8 months ago |
Xuan-Son Nguyen
|
7fef11766c
arg : add env var to control mmproj (#13416)
|
8 months ago |
Jeff Bolz
|
dc1d2adfc0
vulkan: scalar flash attention implementation (#13324)
|
8 months ago |
Helton Reis
|
7c28a74e07
chore(llguidance): use tagged version that does not break the build (#13413)
|
8 months ago |
Xuan-Son Nguyen
|
33eff40240
server : vision support via libmtmd (#12898)
|
8 months ago |