Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Jared Van Bortel | 32c8486e1f | wpm : portable unicode tolower (#6305) | 1 year ago |
| compilade | 557410b8f0 | llama : greatly reduce output buffer memory usage (#6122) | 1 year ago |
| Kawrakow | 55c1b2a3bb | IQ1_M: 1.75 bpw quantization (#6302) | 1 year ago |
| Kawrakow | d25b1c31b0 | quantize : be able to override metadata by key (#6321) | 1 year ago |
| slaren | 280345968d | cuda : rename build flag to LLAMA_CUDA (#6299) | 1 year ago |
| Meng, Hengyu | ddf6568510 | [SYCL] offload op (#6217) | 1 year ago |
| Jared Van Bortel | 94d1b3b411 | use _wfopen instead of fopen on Windows (#6248) | 1 year ago |
| Pierrick Hymbert | f482bb2e49 | common: llama_load_model_from_url split support (#6192) | 1 year ago |
| Julius Arkenberg | 476b0251b2 | llama : add grok-1 support (#6204) | 1 year ago |
| Kawrakow | 1d0331c12a | quantize: options for output and token embedding tensors qtype (#6239) | 1 year ago |
| Pierrick Hymbert | dba1af6129 | llama_model_loader: support multiple split/shard GGUFs (#6187) | 1 year ago |
| Nexesenex | e80f06d2a1 | llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) | 1 year ago |
| Georgi Gerganov | 95d576b48e | metal : pad n_ctx by 32 (#6177) | 1 year ago |
| Jared Van Bortel | d199ca79f2 | mpt : implement backwards compatiblity with duped output tensor (#6139) | 1 year ago |
| slaren | 2bf8d0f7c4 | backend : offload large batches to GPU (#6083) | 1 year ago |
| slaren | d84c48505f | llama : fix Baichuan2 13B (#6092) | 1 year ago |
| Theia Vogel | 877b4d0c62 | llama : add support for control vectors (#5970) | 1 year ago |
| Andrew Canis | 12247f4c69 | llama : add Command-R support (#6033) | 1 year ago |
| Neo Zhang Jianyu | 46acb36767 | fix set main gpu error (#6073) | 1 year ago |
| Xuan Son Nguyen | aab606a11f | llama : add Orion chat template (#6066) | 1 year ago |
| Georgi Gerganov | 4755afd1cb | llama : fix integer overflow during quantization (#6063) | 1 year ago |
| Michael Podvitskiy | 69ff61397d | llama : support models without vocabulary (#5798) | 1 year ago |
| Georgi Gerganov | a44bc969e4 | llama : fix typo | 1 year ago |
| Michael Podvitskiy | 2c4fb69246 | llama : optimize defrag moves + fix fragmentation calculation (#6037) | 1 year ago |
| slaren | f30ea47a87 | llama : add pipeline parallelism support (#6017) | 1 year ago |
| gliptic | 5cdb371731 | grammar : fix unnecessarily retained pointer to rules (#6003) | 1 year ago |
| Georgi Gerganov | 05b06210c9 | llama : more consistent names of count variables (#5994) | 1 year ago |
| Georgi Gerganov | 83796e62bc | llama : refactor unicode stuff (#5992) | 1 year ago |
| Michael Podvitskiy | 3202361c5b | ggml, ci : Windows ARM runner and build fixes (#5979) | 1 year ago |
| Georgi Gerganov | ee35600b90 | llama : fix F16/F32 downcast + improve names (#5980) | 1 year ago |