Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| slaren | e85bb1a8e7 | llama : add functions to get the model's metadata (#4013) | 2 years ago |
| Georgi Gerganov | 4f447a4833 | llama : fix data units (#4101) | 2 years ago |
| Kerfuffle | 91f6499393 | Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) | 2 years ago |
| Jared Van Bortel | a6fc554e26 | llama : restore prefix space in llama tokenizer (#4081) | 2 years ago |
| Galunid | 36eed0c42c | stablelm : StableLM support (#3586) | 2 years ago |
| Georgi Gerganov | 4760e7cc0b | sync : ggml (backend v2) (#3912) | 2 years ago |
| Kerfuffle | bb50a792ec | Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041) | 2 years ago |
| Galunid | df9d1293de | Unbreak persimmon after #3837 (#4010) | 2 years ago |
| Meng Zhang | 46876d2a2c | cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946) | 2 years ago |
| Meng Zhang | 3d48f42efc | llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) | 2 years ago |
| cebtenzzre | 3fdbe6b66b | llama : change yarn_ext_factor placeholder to -1 (#3922) | 2 years ago |
| Georgi Gerganov | 1efae9b7dc | llm : prevent from 1-D tensors being GPU split (#3697) | 2 years ago |
| cebtenzzre | 0eb332a10f | llama : fix llama_context_default_params after #2268 (#3893) | 2 years ago |
| cebtenzzre | 898aeca90a | llama : implement YaRN RoPE scaling (#2268) | 2 years ago |
| Georgi Gerganov | c43c2da8af | llm : fix llm_build_kqv taking unused tensor (benign, #3837) | 2 years ago |
| Georgi Gerganov | 523e49b111 | llm : fix falcon norm after refactoring (#3837) | 2 years ago |
| Georgi Gerganov | 50337961a6 | llm : add llm_build_context (#3881) | 2 years ago |
| Andrew Godfrey | 73bdcb395e | finetune : add -ngl parameter (#3762) | 2 years ago |
| Georgi Gerganov | 71e3718abd | llama : refactor graph build code (#3837) | 2 years ago |
| kalomaze | 238657db23 | samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841) | 2 years ago |
| Georgi Gerganov | 207b51900e | ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) | 2 years ago |
| Kerfuffle | 6e08281e58 | Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) | 2 years ago |
| Georgi Gerganov | 71a09da301 | llama : fix kv shift bug (#3835) | 2 years ago |
| Georgi Gerganov | d69d777c02 | ggml : quantization refactoring (#3833) | 2 years ago |
| Kerfuffle | bd6d9e2059 | llama : allow quantizing k-quants to fall back when tensor size incompatible (#3747) | 2 years ago |
| Georgi Gerganov | fdee152e4e | starcoder : add GPU offloading (#3827) | 2 years ago |
| cebtenzzre | 6d459cbfbe | llama : correctly report GGUFv3 format (#3818) | 2 years ago |
| Georgi Gerganov | 2f9ec7e271 | cuda : improve text-generation and batched decoding performance (#3776) | 2 years ago |
| Marcus Dunn | 5be6c803fa | llama : remove token functions with `context` args in favor of `model` (#3720) | 2 years ago |
| goerch | 9e70cc0322 | Add test for MPT tokenization (#3728) | 2 years ago |