cturan/llama.cpp

mirror of https://github.com/cturan/llama.cpp

Author	SHA1	Message	Date
Johannes Gäßler	bdcb8f4222	CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)	1 year ago
Johannes Gäßler	1f0dabda8d	CUDA: use tensor cores for MMQ (#7676)	1 year ago
Johannes Gäßler	42b53d192f	CUDA: revise q8_1 data layout for mul_mat_q (#7824)	1 year ago
Johannes Gäßler	7d1a378b8f	CUDA: refactor mmq, dmmv, mmvq (#7716)	1 year ago
slaren	ae1f211ce2	cuda : refactor into multiple files (#6269)	1 year ago