fidoriel
|
8ce3ff1d91
mtmd : fix mtmd ios build (#15579)
|
4 months ago |
Eve
|
44b1efa41a
tests: add performance test for mul mat id (#15543)
|
4 months ago |
shalinib-ibm
|
a6a58d6478
llamafile: PowerPC Sgemm Optimization (#15558)
|
4 months ago |
Georgi Gerganov
|
0373486dbc
graph : fix assert in memory-less build_attn (#15590)
|
4 months ago |
Daniel Bevenius
|
62cef26ac5
model-conversion : add qat-q4 quantization targets (#15588)
|
4 months ago |
Johannes Gäßler
|
8f5afa94c4
CUDA: return -1 for nonexistent compiled arch (#15587)
|
4 months ago |
Georgi Gerganov
|
b3964c1e89
metal : optimize FA vec for large sequences and BS <= 8 (#15566)
|
4 months ago |
Xuan-Son Nguyen
|
79a546220c
mtmd : support Kimi VL model (#15458)
|
4 months ago |
Georgi Gerganov
|
85cc1ae998
context : print graph stats for memory-less contexts (#15586)
|
4 months ago |
Georgi Gerganov
|
1d8d83deaa
metal : improve `MUL_MAT_ID` (#15541)
|
4 months ago |
tc-mb
|
c4e9239064
model : support MiniCPM-V 4.5 (#15575)
|
4 months ago |
Sigbjørn Skjæret
|
39842a7f73
gguf-py : remove erroneous FFN_GATE entry (#15583)
|
4 months ago |
Sigbjørn Skjæret
|
0fd90db585
metal : remove contiguous assertion for src0 in IM2COL (#15577)
|
4 months ago |
Yoshi_likes_e4
|
4c37636b3e
Add a warning for special devices (#15563)
|
4 months ago |
Jeff Bolz
|
34bdbbd7c2
vulkan: Remove splitting for mul_mat_id (#15568)
|
4 months ago |
Qeeweew
|
74f52f77f2
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451)
|
4 months ago |
lhez
|
f7207b0415
opencl: fix support ops condition for `rms_norm` (#15560)
|
4 months ago |
Ruben Ortlam
|
4d917cd4f6
vulkan: fix min subgroup 16 condition for mmid subgroup optimization (#15565)
|
4 months ago |
Jeff Bolz
|
886b97a5d6
tests: Generate unique input values for count_equal (#15487)
|
4 months ago |
Ihar Hrachyshka
|
111f8d06f0
metal: fix regression when no metal devices are present (#15531)
|
4 months ago |
Johannes Gäßler
|
5eff6ec9b1
CUDA: MoE helper in device code, better tile sizes (#15525)
|
4 months ago |
Daniel Bevenius
|
dfd9b5f6c7
model-conversion : set pooling type to none in logits.cpp (#15564)
|
4 months ago |
Daniel Bevenius
|
5a6bc6b1a6
model-conversion : add model card template for embeddings [no ci] (#15557)
|
4 months ago |
Georgi Gerganov
|
6b64f74b55
batched-bench : fix unified KV cache handling + pp timing (#15562)
|
4 months ago |
Weizhao Ouyang
|
0d5a470223
convert : update Ernie 4.5 dense architecture name (#15555)
|
4 months ago |
Georgi Gerganov
|
b0ba31f525
metal : add FA kernels for HS=40 (#15559)
|
4 months ago |
RunningLeon
|
7da9fed0d6
convert : support interns1-mini (#15412)
|
4 months ago |
Chenguang Li
|
c247d06f38
CANN: ROPE cache sin/cos repeat (#15501)
|
4 months ago |
Ruben Ortlam
|
043fb27d38
vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices (#15524)
|
5 months ago |
Georgi Gerganov
|
b730706a49
kv-cache : support layer reuse (#15504)
|
5 months ago |