cturan/llama.cpp

Author	SHA1 Message	Date
Oliver Simons	00681dfc16 CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#15872)	4 months ago
Daniel Bevenius	e7b6d83b52 tests : filter out no-ops from coverage report (#15900)	4 months ago
Jeff Bolz	4f63cd705c vulkan: Fix OOB accesses in soft_max_back (#15861)	4 months ago
Aman Gupta	a972faebed CUDA: Add mul_mat_id support for the mmf kernel (#15767)	4 months ago
Georgi Gerganov	f28d4f4ac9 metal : refactor + optimize (#15857)	4 months ago
Xuan-Son Nguyen	9fcb29f22f ggml: allow casting between f32 and i32 (#15783)	4 months ago
Jeff Bolz	d413dca003 tests: large sizes for get_rows (#15687)	4 months ago
Jeff Bolz	3976dfbe00 vulkan: support im2col_3d (#15795)	4 months ago
Jeff Bolz	c97b5e5854 vulkan: Support pad_ext (#15794)	4 months ago
Daniel Bevenius	3a550b5ca4 tests : add --list-ops and --show-coverage options (#15745)	4 months ago
leejet	0a1b3982cd ggml: add ops for WAN video model (cuda && cpu) (#15669)	4 months ago
rmatif	86076f92de OpenCL: add fused group_norm/norm, mul, add (#15314)	4 months ago
Eve	44b1efa41a tests: add performance test for mul mat id (#15543)	4 months ago
Georgi Gerganov	1d8d83deaa metal : improve `MUL_MAT_ID` (#15541)	4 months ago
Jeff Bolz	34bdbbd7c2 vulkan: Remove splitting for mul_mat_id (#15568)	4 months ago
Jeff Bolz	886b97a5d6 tests: Generate unique input values for count_equal (#15487)	4 months ago
Jeff Bolz	c9a24fb932 vulkan: Support FA with any multiple of 8 head sizes (#15537)	4 months ago
Jeff Bolz	611f419cff vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (#15281)	4 months ago
Acly	0a9b43e507 vulkan : support ggml_mean (#15393)	4 months ago
rmatif	92f7f0a53c ggml: add `conv3d` op (#15182)	4 months ago
Jeff Bolz	96452a3fa4 vulkan: Reuse conversion results in prealloc_y (#15410)	4 months ago
Jeff Bolz	de5627910d vulkan: Optimize argsort (#15354)	5 months ago
Jeff Bolz	1fe00296f5 vulkan: fuse adds (#15252)	5 months ago
Jeff Bolz	2e2b22ba66 vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (#15334)	5 months ago
Georgi Gerganov	5edf1592fd vulkan : fix out-of-bounds access in argmax kernel (#15342)	5 months ago
Jonathan Graehl	5cdb27e091 finetune: SGD optimizer, more CLI args (#13873)	5 months ago
Oliver Simons	6028bf7435 CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (#15132)	5 months ago
Georgi Gerganov	fd1234cb46 llama : add gpt-oss (#15091)	5 months ago
Jeff Bolz	ec0b18802c vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015)	5 months ago
Sigbjørn Skjæret	138b288b59 cuda : add softcap fusion (#14907)	5 months ago
