Commit History

Author           SHA1        Message                                                       Date
Georgi Gerganov  e84b71c2c6  ggml : drop support for QK_K=64 (#7473)                       1 year ago
agray3           bc4bba364f  Introduction of CUDA Graphs to LLama.cpp (#6766)              1 year ago
DAN™             e00b4a8f81  Fix more int overflow during quant (PPL/CUDA). (#6563)        1 year ago
slaren           0d56246f4b  ggml : group all experts in a single ggml_mul_mat_id (#6505)  1 year ago
Carolinabanana   5dc9dd7152  llama : add Command R Plus support (#6491)                    1 year ago
Kawrakow         55c1b2a3bb  IQ1_M: 1.75 bpw quantization (#6302)                          1 year ago
slaren           ae1f211ce2  cuda : refactor into multiple files (#6269)                   1 year ago