Johannes Gäßler | fbef0fad7a | server: higher timeout for tests (#15621) | 4 months ago
Georgi Gerganov | da54f9f1a2 | presets : add qwen3-30B-a3b FIM (#15616) | 4 months ago
uvos | 47373271f9 | HIP: Enable support for ggml_backend_cuda_register_host_buffer (#15615) | 4 months ago
Georgi Gerganov | 1bded5a3b3 | kv-cache : better estimate of n_kv for multi-sequence batches (#15610) | 4 months ago
Chenguang Li | 1e7489745a | CANN: refactor mask handling and improve performance in FA (#15561) | 4 months ago
xctan | 1cf123a343 | ggml-cpu : add basic RVV support for vector f32 ops (#15057) | 4 months ago
Daniel Bevenius | fcca2182a1 | common : add -m to bash completion for --model [no ci] (#15591) | 4 months ago
rmatif | 86076f92de | OpenCL: add fused group_norm/norm, mul, add (#15314) | 4 months ago
Diego Devesa | bcbddcd54f | tests : fix test-opt with GGML_BACKEND_DL (#15599) | 4 months ago
Akarshan Biswas | 8b69686136 | SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (#15592) | 4 months ago
fidoriel | 8ce3ff1d91 | mtmd : fix mtmd ios build (#15579) | 4 months ago
Eve | 44b1efa41a | tests: add performance test for mul mat id (#15543) | 4 months ago
shalinib-ibm | a6a58d6478 | llamafile: PowerPC Sgemm Optimization (#15558) | 4 months ago
Georgi Gerganov | 0373486dbc | graph : fix assert in memory-less build_attn (#15590) | 4 months ago
Daniel Bevenius | 62cef26ac5 | model-conversion : add qat-q4 quantization targets (#15588) | 4 months ago
Johannes Gäßler | 8f5afa94c4 | CUDA: return -1 for nonexistent compiled arch (#15587) | 4 months ago
Georgi Gerganov | b3964c1e89 | metal : optimize FA vec for large sequences and BS <= 8 (#15566) | 4 months ago
Xuan-Son Nguyen | 79a546220c | mtmd : support Kimi VL model (#15458) | 4 months ago
Georgi Gerganov | 85cc1ae998 | context : print graph stats for memory-less contexts (#15586) | 4 months ago
Georgi Gerganov | 1d8d83deaa | metal : improve `MUL_MAT_ID` (#15541) | 4 months ago
tc-mb | c4e9239064 | model : support MiniCPM-V 4.5 (#15575) | 4 months ago
Sigbjørn Skjæret | 39842a7f73 | gguf-py : remove erroneous FFN_GATE entry (#15583) | 4 months ago
Sigbjørn Skjæret | 0fd90db585 | metal : remove contiguous assertion for src0 in IM2COL (#15577) | 4 months ago
Yoshi_likes_e4 | 4c37636b3e | Add a warning for special devices (#15563) | 4 months ago
Jeff Bolz | 34bdbbd7c2 | vulkan: Remove splitting for mul_mat_id (#15568) | 4 months ago
Qeeweew | 74f52f77f2 | CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451) | 4 months ago
lhez | f7207b0415 | opencl: fix support ops condition for `rms_norm` (#15560) | 4 months ago
Ruben Ortlam | 4d917cd4f6 | vulkan: fix min subgroup 16 condition for mmid subgroup optimization (#15565) | 5 months ago
Jeff Bolz | 886b97a5d6 | tests: Generate unique input values for count_equal (#15487) | 5 months ago
Ihar Hrachyshka | 111f8d06f0 | metal: fix regression when no metal devices are present (#15531) | 5 months ago