Commit History

Author           SHA1        Message                                              Date
Johannes Gäßler  42b53d192f  CUDA: revise q8_1 data layout for mul_mat_q (#7824)  1 year ago
Johannes Gäßler  7d1a378b8f  CUDA: refactor mmq, dmmv, mmvq (#7716)               1 year ago
Johannes Gäßler  9b596417af  CUDA: quantized KV support for FA vec (#7527)        1 year ago
Georgi Gerganov  e84b71c2c6  ggml : drop support for QK_K=64 (#7473)              1 year ago
Johannes Gäßler  fcf6538ba6  CUDA: fix unused warning in mmq.cu (#7442)           1 year ago
Johannes Gäßler  d8ee902227  CUDA: deduplicate mmq code (#7397)                   1 year ago
agray3           bc4bba364f  Introduction of CUDA Graphs to LLama.cpp (#6766)     1 year ago
slaren           ae1f211ce2  cuda : refactor into multiple files (#6269)          1 year ago