Giuseppe Scrivano
|
e58d585604
model : add Granite Hybrid nano types (#16896)
|
2 months ago |
Johannes Gäßler
|
31c511a968
CUDA: Volta tensor core support for MMF (#16843)
|
2 months ago |
Georgi Gerganov
|
6d39015a74
sync : ggml
|
2 months ago |
Aman Gupta
|
4146d6a1a6
CUDA: add expert reduce kernel (#16857)
|
2 months ago |
Georgi Gerganov
|
8da3c0e200
batch : fix consistency checks for the input positions (#16890)
|
2 months ago |
Georgi Gerganov
|
c22473b580
server : don't print user inputs to console (#16871)
|
2 months ago |
Daniel Bevenius
|
0f715b4e75
server : fix typos in server.cpp comments [no ci] (#16883)
|
2 months ago |
Jeff Bolz
|
d2d931f173
vulkan: disable spirv-opt for rope shaders (#16872)
|
2 months ago |
Masato Nakasaka
|
2976b0374d
vulkan: Fix crash when FP16 mul_mat accumulation is not supported (#16796)
|
2 months ago |
Ruben Ortlam
|
d2a2673dd1
vulkan: fix shmem overrun in mmq id shader (#16873)
|
2 months ago |
l3utterfly
|
13002a0896
ggml-hexagon: respect input size when getting/setting tensor data (#16836)
|
2 months ago |
Sigbjørn Skjæret
|
6eb208d17e
ci : enable free-disk-space on cuda docker build (#16877)
|
2 months ago |
lhez
|
9984cbb61d
opencl: fix boundary handling for mul_mm (#16875)
|
2 months ago |
RodriMora
|
ce18efeaf1
convert : update transformers requirements (#16866)
|
2 months ago |
chansikpark
|
16724b5b68
server : bump request URI max length to 32768 (#16862)
|
2 months ago |
Georgi Gerganov
|
b52edd2558
server : remove n_past (#16818)
|
2 months ago |
Max Krasnyansky
|
517b7170e1
cpu: introduce chunking for repack matmuls and enable matmul-id chunking on ARM64 (#16833)
|
2 months ago |
Shagun Bera
|
835e918d84
common: fix typo in cli help text (#16864)
|
2 months ago |
JJJYmmm
|
d261223d24
model: add support for qwen3vl series (#16780)
|
2 months ago |
Max Krasnyansky
|
dcca0d3ab8
cpu: introduce chunking for flash attention (#16829)
|
2 months ago |
Tianyue-Zhao
|
bacddc049a
model: Add support for CogVLM model (#15002)
|
2 months ago |
Sigbjørn Skjæret
|
229bf68628
cuda : fix argsort with 64k+ rows (#16849)
|
2 months ago |
Jan Boon
|
d7395115ba
llama : use std::abs instead of abs (#16853)
|
2 months ago |
Jeff Bolz
|
052df28b0e
vulkan: Handle argsort with a large number of rows (#16851)
|
2 months ago |
Oliver Simons
|
8b11deea46
Hide latency of bias and gate-loading (#16847)
|
2 months ago |
Jeff Bolz
|
b9ce940177
vulkan: Fuse rope+set_rows (#16769)
|
2 months ago |
Xuan-Son Nguyen
|
3464bdac37
llama: fix ASAN error with M-RoPE (#16848)
|
2 months ago |
Xuan-Son Nguyen
|
e3af5563bd
llama: store mrope data in KV cell (#16825)
|
2 months ago |
Jeff Bolz
|
10fcc41290
vulkan: Update topk_moe fusion to handle gpt's late softmax (#16656)
|
2 months ago |
Ruben Ortlam
|
bcf5bda6f5
Vulkan MMQ Integer Dot Refactor and K-Quant support (#16536)
|
2 months ago |