Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| slaren | 6744dbe924 | ggml : use ggml_row_size where possible (#4472) | 2 years ago |
| Georgi Gerganov | 4d98d9a656 | sync : ggml (SD ops, tests, kernels) (#4444) | 2 years ago |
| slaren | 799a1cb13b | llama : add Mixtral support (#4406) | 2 years ago |
| Georgi Gerganov | fe680e3d10 | sync : ggml (new ops, tests, backend, etc.) (#4359) | 2 years ago |
| Georgi Gerganov | bcc0eb4591 | llama : per-layer KV cache + quantum K cache (#4309) | 2 years ago |
| Georgi Gerganov | ef47ec18da | ggml : add ggml_soft_max_ext (#4256) | 2 years ago |
| slaren | 8a052c131e | ggml-cuda : support stablelm rope (#4156) | 2 years ago |
| Haohui Mai | 55978ce09b | Fix incorrect format strings and uninitialized variables. (#4133) | 2 years ago |
| Kerfuffle | 2923f17f6f | Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124) | 2 years ago |
| Andrew Godfrey | b83e149ec6 | cuda : get_row_rounding F32 (#4095) | 2 years ago |
| Georgi Gerganov | 4f447a4833 | llama : fix data units (#4101) | 2 years ago |
| slaren | 1cf2850d52 | ggml-cuda : increase max graph size (#4084) | 2 years ago |
| Georgi Gerganov | 3d68f364f1 | ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060) | 2 years ago |
| Georgi Gerganov | 4760e7cc0b | sync : ggml (backend v2) (#3912) | 2 years ago |
| Kerfuffle | bb50a792ec | Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041) | 2 years ago |
| Meng Zhang | 46876d2a2c | cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946) | 2 years ago |
| slaren | 2833a6f63c | ggml-cuda : fix f16 mul mat (#3961) | 2 years ago |
| Jared Van Bortel | 132d25b8a6 | cuda : fix disabling device with --tensor-split 1,0 (#3951) | 2 years ago |
| slaren | 48ade94538 | cuda : revert CUDA pool stuff (#3944) | 2 years ago |
| slaren | abb77e7319 | ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) | 2 years ago |
| Kerfuffle | 629f917cd6 | cuda : add ROCM aliases for CUDA pool stuff (#3918) | 2 years ago |
| Georgi Gerganov | c7743fe1c1 | cuda : fix const ptrs warning causing ROCm build issues (#3913) | 2 years ago |
| Oleksii Maryshchenko | d6069051de | cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903) | 2 years ago |
| Georgi Gerganov | 4d719a6d4e | cuda : check if this fixes Pascal card regression (#3882) | 2 years ago |
| cebtenzzre | 2fffa0d61f | cuda : fix RoPE after #2268 (#3897) | 2 years ago |
| slaren | d02e98cde0 | ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891) | 2 years ago |
| cebtenzzre | 898aeca90a | llama : implement YaRN RoPE scaling (#2268) | 2 years ago |
| Andrew Godfrey | 73bdcb395e | finetune : add -ngl parameter (#3762) | 2 years ago |
| Georgi Gerganov | 2f9ec7e271 | cuda : improve text-generation and batched decoding performance (#3776) | 2 years ago |
| Georgi Gerganov | 6961c4bd0b | batched-bench : print params at start | 2 years ago |