Sigbjørn Skjæret | bf79371120 | scripts : support arbitrary input file formats in compare-llama-bench.py (#13455) | 8 months ago
Gabe Goodhart | d590cd4c24 | model : Granite MoE shared (#13269) | 8 months ago
Georgi Gerganov | 1e2809bc4b | sync : ggml | 8 months ago
Diego Devesa | cf0a43bb64 | llama-bench : add defrag-thold, check for invalid ranges (#13487) | 8 months ago
lhez | f0d46ef157 | opencl: remove unnecessary assert for `add` (#13257) | 8 months ago
Xuan-Son Nguyen | de4c07f937 | clip : cap max image size 1024 for qwen vl model (#13478) | 8 months ago
Johannes Gäßler | 10d2af0eaa | llama/ggml: add LLM training support (#10544) | 8 months ago
Georgi Gerganov | 064cc596ac | context : fix state io for memory-less contexts (#13470) | 8 months ago
Anudit Nagar | 91159ee9df | server : allow content to be null in oaicompat_completion_params_parse (#13477) | 8 months ago
Diego Devesa | 22cdab343b | llama-bench : accept ranges for integer parameters (#13410) | 8 months ago
Dan Johansson | a71a4075cd | ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053) | 8 months ago
Johannes Gäßler | 95e18884fc | CUDA: fix misaligned synchronization in FA (#13469) | 8 months ago
Xuan-Son Nguyen | df8491922f | ggml : add mrope kernel for metal (#13457) | 8 months ago
Atharva Dubey | 14492144c2 | enable dpcpp nightly builds with libraries (#13406) | 8 months ago
City | c104023994 | mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459) | 8 months ago
Anthony Umfer | 9a390c4829 | tools : fix uninitialized llama_batch in server (#13436) | 8 months ago
Sigbjørn Skjæret | 09232370fc | scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451) | 8 months ago
Johannes Gäßler | 7474e00b34 | CUDA: fix crash with partial offloading of MoE (#13439) | 8 months ago
David Huang | 7f323a589f | Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386) | 8 months ago
City | 3eac209319 | mtmd : support InternVL 3 38B and 78B mmproj (#13443) | 8 months ago
Xuan-Son Nguyen | a634d75d1b | mtmd : move helpers to dedicated file (#13442) | 8 months ago
Thomas Germer | 62d4250e52 | docs : Fix typo in InternVL3 model name (#13440) | 8 months ago
Johannes Gäßler | 0208355f42 | CUDA: fix race conditions FlashAttention kernels (#13438) | 8 months ago
Sigbjørn Skjæret | d2a4ef05c6 | vocab : add ByteDance-Seed/Seed-Coder (#13423) | 8 months ago
Xuan-Son Nguyen | 15e6125a39 | mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434) | 8 months ago
Xuan-Son Nguyen | 3b24d26c22 | server : update docs (#13432) | 8 months ago
Sigbjørn Skjæret | 43dfd741a5 | llguidance : set tokenizer slices to default (#13424) | 8 months ago
Thammachart Chinvarapon | b064a51a4e | ci: free_disk_space flag enabled for intel variant (#13426) | 8 months ago
Xuan-Son Nguyen | 053367d149 | mtmd : support InternVL 2.5 and 3 (#13422) | 8 months ago
Johannes Gäßler | d8919424f1 | CUDA: fix FlashAttention on Turing (#13415) | 8 months ago