Georgi Gerganov
|
b44890df2e
model : disable SWA for Phi models (#13676)
|
8 months ago |
R0CKSTAR
|
33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647)
|
8 months ago |
Eve
|
fb1cab201c
vulkan: fix warnings (#13626)
|
8 months ago |
l3utterfly
|
b7a17463ec
mtmd-helper : bug fix to token batching in mtmd (#13650)
|
8 months ago |
Georgi Gerganov
|
be0239693c
model : fix llama4 graph (#13663)
|
8 months ago |
Georgi Gerganov
|
a4090d1174
llama : remove llama_kv_cache_view API + remove deprecated (#13653)
|
8 months ago |
Johannes Gäßler
|
b69f1647f9
CUDA: skip fully masked-out KV in FA vec kernel (#13584)
|
8 months ago |
Sigbjørn Skjæret
|
759e37b0d8
tests : avoid github urls due to throttling (#13654)
|
8 months ago |
Svetlozar Georgiev
|
4245e622e0
sycl: disable reorder for sycl mulmat (#13536)
|
8 months ago |
0cc4m
|
c9c64dee57
Set GLM4 blk.*.attn_output.weight, kqv_out-* matmul to GGML_PREC_F32 to fix infinity values in output (#13639)
|
8 months ago |
Georgi Gerganov
|
c00a2634be
metal : fix typo in FA kernel comments (#13651)
|
8 months ago |
Georgi Gerganov
|
e298d2fbd0
kv-cache : add SWA support (#13194)
|
8 months ago |
Xinpeng Dou
|
f0adb80bf7
CANN: Update CANN model support (#13162)
|
8 months ago |
Nicolò Scipione
|
f7c9429c85
sycl : Overcoming workaround for mmap() allocation on Windows (#13482)
|
8 months ago |
psocolovsky
|
1dfbf2cf3a
common : add load_progress_callback (#13617)
|
8 months ago |
0cc4m
|
8960efd0a6
Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (#13607)
|
8 months ago |
Alberto Cabrera Pérez
|
725f23f1f3
sycl : backend documentation review (#13544)
|
8 months ago |
Xuan-Son Nguyen
|
92ecdcc06a
mtmd : add vision support for llama 4 (#13282)
|
8 months ago |
Alberto Cabrera Pérez
|
f71f40a284
ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532)
|
8 months ago |
Georgi Gerganov
|
d30cb5a7fa
sync : ggml
|
8 months ago |
Johannes Gäßler
|
6c35981a64
mnist: fix segmentation fault (ggml/1227)
|
8 months ago |
Diego Devesa
|
8b5e19aea6
ggml : fix apple OS check in ggml_print_backtrace (ggml/1229)
|
8 months ago |
Daniel Tang
|
60aea028b5
ggml : Fix missing backtrace on Linux (ggml/1228)
|
8 months ago |
Nick
|
9c55e5c5c2
fix: check model pointer validity before use (#13631)
|
8 months ago |
Chenguang Li
|
33d7aed4a8
CANN: Support MOE Model MUL_MAT_ID (#13042)
|
8 months ago |
Isaac McFadyen
|
6a2bc8bfb7
server : added --no-prefill-assistant flag (#13608)
|
8 months ago |
Gilad S.
|
e3a7cf6c5b
cmake: use the current build config for vulkan-shaders-gen (#13595)
|
8 months ago |
Georgi Gerganov
|
518329b2d4
parallel : add option for non-shared and larger prompts (#13598)
|
8 months ago |
Jeff Bolz
|
2f5a4e1e09
vulkan: move common FA code to flash_attn_base.comp (#13556)
|
8 months ago |
Jeff Bolz
|
4f41ee11d6
vulkan: use scalar FA rather than coopmat2 when N==1 (#13554)
|
8 months ago |