Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| David Friehs | df845cc982 | llama : minimize size used for state save/load (#4820) | 2 years ago |
| Georgi Gerganov | 15ebe59210 | convert : update phi-2 to latest HF repo (#4903) | 2 years ago |
| slaren | e7e4df031b | llama : ggml-backend integration (#4766) | 2 years ago |
| Georgi Gerganov | 584d674be6 | llama : remove redundant assert for StableLM (#4901) | 2 years ago |
| Georgi Gerganov | 3cabe80630 | llama : fix typo "imp_embd" -> "inp_embd" | 2 years ago |
| Georgi Gerganov | f445c0e68c | llama : fix llm_build_k_shift to use correct n_rot (#4889) | 2 years ago |
| Kawrakow | 469e75d0a3 | llama : restore intended k-quants mixes for MoE models (#4872) | 2 years ago |
| Kawrakow | 49662cbed3 | ggml : SOTA 2-bit quants (add IQ2_XS) (#4856) | 2 years ago |
| pudepiedj | 43f76bf1c3 | main : print total token count and tokens consumed so far (#4874) | 2 years ago |
| Brian | 57d016ba2d | llama : add additional suffixes for model params (#4834) | 2 years ago |
| Austin | 329ff61569 | llama : recognize 1B phi models (#4847) | 2 years ago |
| Kawrakow | dd5ae06405 | SOTA 2-bit quants (#4773) | 2 years ago |
| Georgi Gerganov | b0034d93ce | examples : add passkey test (#3856) | 2 years ago |
| Georgi Gerganov | 9dede37d81 | llama : remove unused vars (#4796) | 2 years ago |
| Georgi Gerganov | 3c36213df8 | llama : remove redundant GQA check (#4796) | 2 years ago |
| Georgi Gerganov | d117d4dc5d | llama : print tensor meta for debugging | 2 years ago |
| Georgi Gerganov | 540938f890 | llama : llama_model_desc print number of experts | 2 years ago |
| Marcus Dunn | 0040d42eeb | llama : replace all API facing `int`'s with `int32_t` (#4577) | 2 years ago |
| postmasters | 83e633c27e | llama : differentiate the KV dims in the attention (#4657) | 2 years ago |
| automaticcat | 24a447e20a | ggml : add ggml_cpu_has_avx_vnni() (#4589) | 2 years ago |
| manikbhandari | ea5497df5d | gpt2 : Add gpt2 architecture integration (#4555) | 2 years ago |
| Nam D. Tran | f6793491b5 | llama : add AWQ for llama, llama2, mpt, and mistral models (#4593) | 2 years ago |
| slaren | dc68f0054c | cuda : fix vmm pool with multi GPU (#4620) | 2 years ago |
| Shintarou Okada | 753be377b6 | llama : add PLaMo model (#3557) | 2 years ago |
| slaren | 5bf3953d7e | cuda : improve cuda pool efficiency using virtual memory (#4606) | 2 years ago |
| slaren | 708e179e85 | fallback to CPU buffer if host buffer alloc fails (#4610) | 2 years ago |
| slaren | 48b7ff193e | llama : fix platforms without mmap (#4578) | 2 years ago |
| crasm | c7e9701f86 | llama : add ability to cancel model loading (#4462) | 2 years ago |
| Georgi Gerganov | afefa319f1 | ggml : change ggml_scale to take a float instead of tensor (#4573) | 2 years ago |
| slaren | d232aca5a7 | llama : initial ggml-backend integration (#4520) | 2 years ago |