Ruben Ortlam | 635ef78ec5 | vulkan: work around Intel fp16 bug in mmq (#18814) | 2 weeks ago
Perry Naseck | 7d587e5544 | ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705) | 2 weeks ago
Daniel Benjaminsson | d34aa07193 | mmap: add Haiku support by skipping RLIMIT_MEMLOCK check (#18819) | 2 weeks ago
Adrien Gallouët | f709c7a33f | ci, tests : use cmake to download models and remove libcurl dependency (#18791) | 2 weeks ago
ddh0 | 6e36299b47 | llama : print_info alignment fix (#18708) | 2 weeks ago
Junwon Hwang | 60591f01d4 | model : add EXAONE MoE (#18543) | 2 weeks ago
Georgi Gerganov | e4832e3ae4 | vocab : fix attribute overrides for harmony (#18806) | 2 weeks ago
Ruben Ortlam | 960e5e3b46 | llama-mmap: fix direct-io loading fallback EOF exception (#18801) | 2 weeks ago
Daniel Bevenius | 20ca2e12c4 | model-conversion : remove -c 0 from model card template [no ci] (#18807) | 2 weeks ago
yulo | ea4a321f2a | HIP: add fattn-mma-f16 for RDNA4 (#18481) | 2 weeks ago
Johannes Gäßler | c1e79e610f | doc: ban AI-generated PR descriptions [no ci] (#18765) | 2 weeks ago
Xuan-Son Nguyen | e047f9ee9d | mtmd: fix use_non_causal being reported incorrectly (#18793) | 2 weeks ago
Georgi Gerganov | 0a57271ab6 | CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (#18800) | 2 weeks ago
Gabe Goodhart | 076b0faf7d | graph : clean up t5 input builders (#18795) | 2 weeks ago
Ruben Ortlam | db79dc06b1 | llama-bench: add direct_io parameter (#18778) | 2 weeks ago
Adrien Gallouët | 537d4240d4 | ci : remove libcurl in releases (#18775) | 2 weeks ago
Radoslav Gerganov | bcf7546160 | server : add arg for disabling prompt caching (#18776) | 2 weeks ago
Adrien Gallouët | 36c5913c45 | ci : use openssl for openEuler-latest-cmake-cann (#18779) | 2 weeks ago
Adrien Gallouët | 8e649571cd | vendor : update cpp-httplib to 0.30.1 (#18771) | 2 weeks ago
Daniel Bevenius | 4150da9a95 | examples : add --kv-unified to batched example (#18774) | 2 weeks ago
Jeff Bolz | 8e2da778da | vulkan: change memory_logger to be controlled by an env var (#18769) | 2 weeks ago
Xuan-Son Nguyen | ce3bf9b1a4 | server: update docs for sleeping [no ci] (#18777) | 2 weeks ago
Jeff Bolz | 2bbe4c2cf8 | vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (#18678) | 2 weeks ago
Ruben Ortlam | 1051ecd289 | vulkan: Disable large coopmat matmul configuration on proprietary AMD driver (#18763) | 2 weeks ago
Xuan-Son Nguyen | 0c3b7a9efe | model: fix qwen3next broken due to #18683 (#18762) | 2 weeks ago
Ruben Ortlam | 0e76501e1d | Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support (#18749) | 2 weeks ago
Xuan-Son Nguyen | 4b060bf240 | security: make it clear about subtopics in server (#18754) | 2 weeks ago
Daniel Bevenius | 9789e28459 | debug : include LLAMA_POOLING_TYPE_UNSPECIFIED in pooling check (#18692) | 2 weeks ago
Georgi Gerganov | 84ae04f163 | tests : refactor test-backend-sampler (#18753) | 2 weeks ago
Xuan-Son Nguyen | 506bb6e010 | model: try to improve Qwen3 Next (#18683) | 2 weeks ago