cturan/llama.cpp

Author	SHA1 Message	Date
Ihar Hrachyshka	111f8d06f0 metal: fix regression when no metal devices are present (#15531)	5 months ago
Johannes Gäßler	5eff6ec9b1 CUDA: MoE helper in device code, better tile sizes (#15525)	5 months ago
Daniel Bevenius	dfd9b5f6c7 model-conversion : set pooling type to none in logits.cpp (#15564)	5 months ago
Daniel Bevenius	5a6bc6b1a6 model-conversion : add model card template for embeddings [no ci] (#15557)	5 months ago
Georgi Gerganov	6b64f74b55 batched-bench : fix unified KV cache handling + pp timing (#15562)	5 months ago
Weizhao Ouyang	0d5a470223 convert : update Ernie 4.5 dense architecture name (#15555)	5 months ago
Georgi Gerganov	b0ba31f525 metal : add FA kernels for HS=40 (#15559)	5 months ago
RunningLeon	7da9fed0d6 convert : support interns1-mini (#15412)	5 months ago
Chenguang Li	c247d06f38 CANN: ROPE cache sin/cos repeat (#15501)	5 months ago
Ruben Ortlam	043fb27d38 vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices (#15524)	5 months ago
Georgi Gerganov	b730706a49 kv-cache : support layer reuse (#15504)	5 months ago
Jeff Bolz	c9a24fb932 vulkan: Support FA with any multiple of 8 head sizes (#15537)	5 months ago
Ruben Ortlam	a9c6ffcbfa vulkan: enable Conv2D for Apple after MoltenVK fixed the bug (#15526)	5 months ago
Jeff Bolz	e78cf0d4b1 vulkan: workaround MoltenVK compile failure in multi_add (#15506)	5 months ago
Johannes Gäßler	710dfc465a CUDA: fix half2 -> half conversion for HIP (#15529)	5 months ago
Jeff Bolz	611f419cff vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (#15281)	5 months ago
Piotr Wilkin (ilintar)	b1afcab804 model : add support for Seed-OSS (#15490)	5 months ago
Johannes Gäßler	9ef536907d scripts: fix compare-llama-bench.py (#15521)	5 months ago
LaffeyNyaa	21dc4ddaf2 chat : fix debug build assertion in trim function (#15520)	5 months ago
Jeff Bolz	289bf4113e vulkan: Rewrite synchronization to allow some overlap between nodes (#15489)	5 months ago
R0CKSTAR	b55f06e1aa vulkan.Dockerfile: install vulkan SDK using tarball (#15282)	5 months ago
Acly	0a9b43e507 vulkan : support ggml_mean (#15393)	5 months ago
Jeff Bolz	330c3d2d21 vulkan: optimize mul_mat_id loading row ids into shared memory (#15427)	5 months ago
Johannes Gäßler	e92734d51b test-opt: allow slight inprecision (#15503)	5 months ago
Reese Levine	45363632cb ggml WebGPU: add support for quantization types (#15440)	5 months ago
Aldehir Rojas	32732f2459 model : gpt-oss add response_format support (#15494)	5 months ago
rmatif	92f7f0a53c ggml: add `conv3d` op (#15182)	5 months ago
Yavor Ivanov	b1ab91821f cuda : add Pad Reflect 1D support (#14659)	5 months ago
Georgi Gerganov	9ebebef62f llama : remove KV cache defragmentation logic (#15473)	5 months ago
Aaron Teo	ad5c975c2d ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486)	5 months ago

Commit History