Georgi Gerganov | b8595b16e6 | mtmd : fix embedding size for image input (#17123) | 2 months ago
Ruben Ortlam | 392e09a608 | vulkan: fix memory allocations (#17122) | 2 months ago
compilade | 802cef44bf | convert : parse safetensors directly (#15667) | 2 months ago
compilade | 1c07c0c68c | convert : handle compressed-tensors quant method (#17069) | 2 months ago
Georgi Gerganov | cb1adf8851 | server : handle failures to restore host cache (#17078) | 2 months ago
Georgi Gerganov | ef1d826997 | benches : add folder with benchmarks (#16931) | 2 months ago
Eric Curtin | 86fde91e62 | Switch to using Ubuntu 25.10 vulkan/mesa (#16497) | 2 months ago
Ruben Ortlam | 7f3e9d339c | vulkan: iGPU memory reporting fix (#17110) | 2 months ago
Ruben Ortlam | 8a3519b708 | vulkan: fix mmq out of bounds reads (#17108) | 2 months ago
Jeff Bolz | 80a6cf6347 | vulkan: fuse mul_mat_id + mul (#17095) | 2 months ago
Georgi Gerganov | 0750a59903 | metal : retain src and dst buffers during async ops (#17101) | 2 months ago
Xuan-Son Nguyen | aa3b7a90b4 | arg: add --cache-list argument to list cached models (#17073) | 2 months ago
chansikpark | 333f2595a3 | webui: fix keyboard shortcuts for new chat & edit chat title (#17007) | 2 months ago
Jeff Bolz | 53d7d21e61 | vulkan: Use spec constants for conv2d s/d/p and kernel W/H (#16978) | 2 months ago
Aidan | eeee367de5 | server: fix correct time_ms calculation in prompt_progress (#17093) | 2 months ago
Aman Gupta | 64fe17fbb8 | Revert "CUDA: add expert reduce kernel (#16857)" (#17100) | 2 months ago
Aman Gupta | c1b187688d | CUDA: skip fusion for repeating adds in bias (#17080) | 2 months ago
SavicStefan | b8a5cfd11a | vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (#16636) | 2 months ago
Aleksei Nikiforov | 08416ebe7f | ggml: disable vxe for cross-compilation by default (#16966) | 2 months ago
Jeff Bolz | b4e335d8dc | vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) | 2 months ago
Jeff Bolz | d6fe40fa00 | vulkan: Fix test-thread-safety crashes (#17024) | 2 months ago
Johannes Gäßler | e14e842e87 | CUDA: fix MMQ stream-k fixup ne1 indices (#17089) | 2 months ago
Reese Levine | 647b960bd8 | ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) | 2 months ago
bssrdf | 299f5d782c | CUDA: properly handle nb00=nb02 case for cpy (#17081) | 2 months ago
Acly | ac76d36201 | vulkan : refactor buffer handling in vk_op_f32 (#16840) | 2 months ago
Johannes Gäßler | 6515610506 | CUDA: fix should_use_mmvf for ne11 == 1 (#17085) | 2 months ago
Georgi Gerganov | 7956bb4d7f | bench : cache the llama_context state at computed depth (#16944) | 2 months ago
Sigbjørn Skjæret | 9008027aa3 | hparams : add n_embd_inp() to support extended embed (#16928) | 2 months ago
Georgi Gerganov | 16bcc1259d | kv-cache : pad the cache size to 256 for performance (#17046) | 2 months ago
Adrien Gallouët | 9eb9a1331d | Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)" (#17084) | 2 months ago