RodriMora
|
ce18efeaf1
convert : update transformers requirements (#16866)
|
2 months ago |
chansikpark
|
16724b5b68
server : bump request URI max length to 32768 (#16862)
|
2 months ago |
Georgi Gerganov
|
b52edd2558
server : remove n_past (#16818)
|
2 months ago |
Max Krasnyansky
|
517b7170e1
cpu: introduce chunking for repack matmuls and enable matmul-id chunking on ARM64 (#16833)
|
2 months ago |
Shagun Bera
|
835e918d84
common: fix typo in cli help text (#16864)
|
2 months ago |
JJJYmmm
|
d261223d24
model: add support for qwen3vl series (#16780)
|
2 months ago |
Max Krasnyansky
|
dcca0d3ab8
cpu: introduce chunking for flash attention (#16829)
|
2 months ago |
Tianyue-Zhao
|
bacddc049a
model: Add support for CogVLM model (#15002)
|
2 months ago |
Sigbjørn Skjæret
|
229bf68628
cuda : fix argsort with 64k+ rows (#16849)
|
2 months ago |
Jan Boon
|
d7395115ba
llama : use std::abs instead of abs (#16853)
|
2 months ago |
Jeff Bolz
|
052df28b0e
vulkan: Handle argsort with a large number of rows (#16851)
|
2 months ago |
Oliver Simons
|
8b11deea46
Hide latency of bias and gate-loading (#16847)
|
2 months ago |
Jeff Bolz
|
b9ce940177
vulkan: Fuse rope+set_rows (#16769)
|
2 months ago |
Xuan-Son Nguyen
|
3464bdac37
llama: fix ASAN error with M-RoPE (#16848)
|
2 months ago |
Xuan-Son Nguyen
|
e3af5563bd
llama: store mrope data in KV cell (#16825)
|
2 months ago |
Jeff Bolz
|
10fcc41290
vulkan: Update topk_moe fusion to handle gpt's late softmax (#16656)
|
2 months ago |
Ruben Ortlam
|
bcf5bda6f5
Vulkan MMQ Integer Dot Refactor and K-Quant support (#16536)
|
2 months ago |
Max Krasnyansky
|
3eb2be1ca5
Hexagon Op queue & dispatch optimizations (#16820)
|
2 months ago |
Aman Gupta
|
e41bcce8f0
CUDA: use fastdiv in set-rows (#16834)
|
2 months ago |
Sigbjørn Skjæret
|
144a4ce824
vendor : sync minja (#16500)
|
2 months ago |
Jeff Bolz
|
f549b0007d
vulkan: Call ggml_vk_buffer_write_2d from ggml_vk_buffer_copy (#16793)
|
2 months ago |
Aman Gupta
|
9a3ea685b9
CUDA: Fix bug in topk-moe for gpt-oss (#16821)
|
2 months ago |
YaelLogic
|
338074c383
sycl: add RMS_NORM_BACK operation support (#16808)
|
2 months ago |
YaelGitAccount
|
851553ea6b
cuda: add SET operation support (#16804)
|
2 months ago |
Georgi Gerganov
|
85a7d8677b
memory : remove KV cache size padding (#16812)
|
2 months ago |
Georgi Gerganov
|
a8ca18b4b8
llama-bench : clarify benchmarked parts of the computation (#16823)
|
2 months ago |
l3utterfly
|
8284efc35c
initialise buffer.device in ggml_hexagon_session (#16816)
|
2 months ago |
Sam Malayek
|
1c1409e131
embedding: add raw option for --embd-output-format (#16541)
|
2 months ago |
Johannes Gäßler
|
7a0e900e36
llama: consistent ctx <-> buf order for KV cache (#16746)
|
2 months ago |
Aldehir Rojas
|
280d97be96
grammar : support array references in json schema (#16792)
|
2 months ago |