Ruben Ortlam
|
043fb27d38
vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices (#15524)
|
4 months ago |
Georgi Gerganov
|
b730706a49
kv-cache : support layer reuse (#15504)
|
4 months ago |
Jeff Bolz
|
c9a24fb932
vulkan: Support FA with any multiple of 8 head sizes (#15537)
|
4 months ago |
Ruben Ortlam
|
a9c6ffcbfa
vulkan: enable Conv2D for Apple after MoltenVK fixed the bug (#15526)
|
4 months ago |
Jeff Bolz
|
e78cf0d4b1
vulkan: workaround MoltenVK compile failure in multi_add (#15506)
|
4 months ago |
Johannes Gäßler
|
710dfc465a
CUDA: fix half2 -> half conversion for HIP (#15529)
|
4 months ago |
Jeff Bolz
|
611f419cff
vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (#15281)
|
4 months ago |
Piotr Wilkin (ilintar)
|
b1afcab804
model : add support for Seed-OSS (#15490)
|
4 months ago |
Johannes Gäßler
|
9ef536907d
scripts: fix compare-llama-bench.py (#15521)
|
4 months ago |
LaffeyNyaa
|
21dc4ddaf2
chat : fix debug build assertion in trim function (#15520)
|
4 months ago |
Jeff Bolz
|
289bf4113e
vulkan: Rewrite synchronization to allow some overlap between nodes (#15489)
|
4 months ago |
R0CKSTAR
|
b55f06e1aa
vulkan.Dockerfile: install vulkan SDK using tarball (#15282)
|
4 months ago |
Acly
|
0a9b43e507
vulkan : support ggml_mean (#15393)
|
4 months ago |
Jeff Bolz
|
330c3d2d21
vulkan: optimize mul_mat_id loading row ids into shared memory (#15427)
|
4 months ago |
Johannes Gäßler
|
e92734d51b
test-opt: allow slight inprecision (#15503)
|
4 months ago |
Reese Levine
|
45363632cb
ggml WebGPU: add support for quantization types (#15440)
|
4 months ago |
Aldehir Rojas
|
32732f2459
model : gpt-oss add response_format support (#15494)
|
4 months ago |
rmatif
|
92f7f0a53c
ggml: add `conv3d` op (#15182)
|
4 months ago |
Yavor Ivanov
|
b1ab91821f
cuda : add Pad Reflect 1D support (#14659)
|
4 months ago |
Georgi Gerganov
|
9ebebef62f
llama : remove KV cache defragmentation logic (#15473)
|
4 months ago |
Aaron Teo
|
ad5c975c2d
ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486)
|
4 months ago |
65a
|
4afb0a746f
server : Support multimodal completion and embeddings prompts in JSON format (#15108)
|
4 months ago |
Tarek Dakhran
|
e288693669
readme : model : mtdm : lfm2 improvements (#15476)
|
4 months ago |
Chenguang Li
|
a0f98dd604
CANN: Optimize RMS_NORM using cache (#15419)
|
4 months ago |
Diego Devesa
|
54a241f505
sched : fix possible use of wrong ids tensor when offloading moe prompt processing (#15488)
|
4 months ago |
Georgi Gerganov
|
cd36b5e5c7
llama : remove deprecated llama_kv_self API (#15472)
|
5 months ago |
Georgi Gerganov
|
3f196be84b
graph : remove build_attn_with_sinks overload (#15469)
|
5 months ago |
Acly
|
97ae5961a4
vulkan : support conv_2d_dw with f16 weights (#15392)
|
5 months ago |
Dong Won Kim
|
20c2dac8c6
vulkan: add exp operation (#15456)
|
5 months ago |
Jeff Bolz
|
96452a3fa4
vulkan: Reuse conversion results in prealloc_y (#15410)
|
5 months ago |