cturan/llama.cpp

Author	SHA1 Message	Date
Johannes Gäßler	8f5afa94c4 CUDA: return -1 for nonexistent compiled arch (#15587)	5 months ago
Georgi Gerganov	b3964c1e89 metal : optimize FA vec for large sequences and BS <= 8 (#15566)	5 months ago
Xuan-Son Nguyen	79a546220c mtmd : support Kimi VL model (#15458)	5 months ago
Georgi Gerganov	85cc1ae998 context : print graph stats for memory-less contexts (#15586)	5 months ago
Georgi Gerganov	1d8d83deaa metal : improve `MUL_MAT_ID` (#15541)	5 months ago
tc-mb	c4e9239064 model : support MiniCPM-V 4.5 (#15575)	5 months ago
Sigbjørn Skjæret	39842a7f73 gguf-py : remove erroneous FFN_GATE entry (#15583)	5 months ago
Sigbjørn Skjæret	0fd90db585 metal : remove contiguous assertion for src0 in IM2COL (#15577)	5 months ago
Yoshi_likes_e4	4c37636b3e Add a warning for special devices (#15563)	5 months ago
Jeff Bolz	34bdbbd7c2 vulkan: Remove splitting for mul_mat_id (#15568)	5 months ago
Qeeweew	74f52f77f2 CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451)	5 months ago
lhez	f7207b0415 opencl: fix support ops condition for `rms_norm` (#15560)	5 months ago
Ruben Ortlam	4d917cd4f6 vulkan: fix min subgroup 16 condition for mmid subgroup optimization (#15565)	5 months ago
Jeff Bolz	886b97a5d6 tests: Generate unique input values for count_equal (#15487)	5 months ago
Ihar Hrachyshka	111f8d06f0 metal: fix regression when no metal devices are present (#15531)	5 months ago
Johannes Gäßler	5eff6ec9b1 CUDA: MoE helper in device code, better tile sizes (#15525)	5 months ago
Daniel Bevenius	dfd9b5f6c7 model-conversion : set pooling type to none in logits.cpp (#15564)	5 months ago
Daniel Bevenius	5a6bc6b1a6 model-conversion : add model card template for embeddings [no ci] (#15557)	5 months ago
Georgi Gerganov	6b64f74b55 batched-bench : fix unified KV cache handling + pp timing (#15562)	5 months ago
Weizhao Ouyang	0d5a470223 convert : update Ernie 4.5 dense architecture name (#15555)	5 months ago
Georgi Gerganov	b0ba31f525 metal : add FA kernels for HS=40 (#15559)	5 months ago
RunningLeon	7da9fed0d6 convert : support interns1-mini (#15412)	5 months ago
Chenguang Li	c247d06f38 CANN: ROPE cache sin/cos repeat (#15501)	5 months ago
Ruben Ortlam	043fb27d38 vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices (#15524)	5 months ago
Georgi Gerganov	b730706a49 kv-cache : support layer reuse (#15504)	5 months ago
Jeff Bolz	c9a24fb932 vulkan: Support FA with any multiple of 8 head sizes (#15537)	5 months ago
Ruben Ortlam	a9c6ffcbfa vulkan: enable Conv2D for Apple after MoltenVK fixed the bug (#15526)	5 months ago
Jeff Bolz	e78cf0d4b1 vulkan: workaround MoltenVK compile failure in multi_add (#15506)	5 months ago
Johannes Gäßler	710dfc465a CUDA: fix half2 -> half conversion for HIP (#15529)	5 months ago
Jeff Bolz	611f419cff vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (#15281)	5 months ago
