Reese Levine | 15bff84bf5 | ggml webgpu: initial flashattention implementation (#18610) | 3 weeks ago
Jeff Bolz | 2524c26164 | vulkan: fix push constant size for quantize_q8_1 (#18687) | 3 weeks ago
Jeff Bolz | cb14b06995 | vulkan: optimize ssm_scan (#18630) | 3 weeks ago
Adrien Gallouët | 55abc39355 | vendor : update cpp-httplib to 0.30.0 (#18660) | 3 weeks ago
Georgi Gerganov | f2f6c88067 | scripts : support chaining commands in pr2wt.sh (#18671) | 3 weeks ago
도로로도로또 | 945bf10627 | metal : add MoE kernel specialization for ne20=5 (#18667) | 3 weeks ago
Johannes Gäßler | 64848deb18 | llama-fit-params: free memory target per device (#18679) | 3 weeks ago
Doctor Shotgun | 9a5724dee2 | ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535) | 3 weeks ago
Daniel Bevenius | 9c142e3a2a | model-conversion : add warn about transformers mismatch (#18691) | 3 weeks ago
Daniel Bevenius | df7fb92170 | model-conversion : remove -st targets for converted model (#18689) | 3 weeks ago
Julius Tischbein | 2038101bd9 | llama : add `use_direct_io` flag for model loading (#18166) | 3 weeks ago
shaofeiqi | 568371a726 | opencl: add FILL op support (#18682) | 3 weeks ago
Sigbjørn Skjæret | 5b8844ae53 | scripts : fix repos cloned with .git extension (#18669) | 3 weeks ago
Sigbjørn Skjæret | 7e16fef085 | convert : more variants of rope_theta config entries (#18668) | 3 weeks ago
Oliver Walsh | f5245b5e4e | cuda : fix build on cuda 12.8 (#18672) | 3 weeks ago
R | ae9f8df778 | fix(docker): add missing libglvnd libraries to Vulkan image (#18664) | 3 weeks ago
Adrien Gallouët | 56d2fed2b3 | tools : remove llama-run (#18661) | 3 weeks ago
Georgi Gerganov | 56426673cb | scripts : add pr2wt.sh (#18644) | 3 weeks ago
Daniel Bevenius | bb77764c2d | convert : clarify sentence-transformers-dense-modules help [no ci] (#18662) | 3 weeks ago
Sigbjørn Skjæret | 9dfa8ee950 | ci : run cann build unconditionally [no ci] (#18659) | 3 weeks ago
Jeff Bolz | ca4a8370bc | vulkan: reject ops when a tensor is too large to allocate (#18646) | 3 weeks ago
virajwad | 03023296cf | vulkan: Warptile tuning for Intel Xe2/Xe3 (#18178) | 3 weeks ago
Eve | 8c77a04cc7 | vulkan: more mul mat optimizations (#18533) | 3 weeks ago
Daniel Bevenius | ffba4f29e6 | examples : add debug utility/example (#18464) | 3 weeks ago
hipudding | 3333951d86 | CANN: Fix rename for get_env (#18652) | 3 weeks ago
Raul Torres | 193ee38a1b | CANN: Rename `get_env` to `get_env_as_lowercase` (#18624) | 3 weeks ago
Max Krasnyansky | 95ea9e0861 | Hexagon add support for f16/f32 flash attention, scale, set-rows and improve f16/32 matmul (#18611) | 3 weeks ago
Tarek Dakhran | ccbc84a537 | mtmd: mtmd_audio_streaming_istft (#18645) | 3 weeks ago
Johannes Gäßler | 68b4d516c3 | llama-params-fit: fix last devices with low VRAM (#18494) | 3 weeks ago
Aadeshveer Singh | 24af22fc36 | ggml : optimize cuda ssm_scan using warp-level reduction (#18505) | 3 weeks ago