Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Marcus Dunn | 5be6c803fa | llama : remove token functions with `context` args in favor of `model` (#3720) | 2 years ago |
| goerch | 9e70cc0322 | Add test for MPT tokenization (#3728) | 2 years ago |
| Kerfuffle | a5e7dbd614 | llama : validate special token ids are in range when loading GGUF model (#3635) | 2 years ago |
| Georgi Gerganov | d1031cf49c | sampling : refactor init to use llama_sampling_params (#3696) | 2 years ago |
| Herman Semenov | f439e506e8 | ggml : fix rope + llama minor optimizations (#3560) | 2 years ago |
| Georgi Gerganov | 0e89203b51 | speculative : add tree-based sampling example (#3624) | 2 years ago |
| slaren | cb33f43a2a | fix embeddings when using CUDA (#3657) | 2 years ago |
| Georgi Gerganov | e1675d133c | llama : avoid fprintf in favor of LLAMA_LOG (#3538) | 2 years ago |
| staviq | 1a159553f9 | tokenizer : special token handling (#3538) | 2 years ago |
| cebtenzzre | 11bff29045 | MPT : support GQA for replit-code-v1.5 (#3627) | 2 years ago |
| Daniel Bevenius | 2a4bcbacea | llama : remove n_threads from llama_decode_internal (#3614) | 2 years ago |
| goerch | 233fc1c69f | Minor improvements in GPT2 tokenizer (#3567) | 2 years ago |
| Xingchen Song(宋星辰) | 02d2875def | llm : add bloom models (#3553) | 2 years ago |
| Jan Ploski | f5f9121de1 | llm : add MPT support (#3417) | 2 years ago |
| Georgi Gerganov | fcca0a7004 | refact : fix convert script + zero out KV cache to avoid nans (#3523) | 2 years ago |
| Georgi Gerganov | db3abcc114 | sync : ggml (ggml-backend) (#3548) | 2 years ago |
| Kerfuffle | 63d3b06a43 | llama : fix missing break in Persimmon arch case statements (#3535) | 2 years ago |
| cebtenzzre | f1782c68de | quantize : fail fast on write errors (#3521) | 2 years ago |
| Phillip Kravtsov | 0e797c2fc5 | llm : support Adept Persimmon 8B (#3410) | 2 years ago |
| goerch | 3a716b4dae | Fix for #3454 (#3455) | 2 years ago |
| Kerfuffle | 9ca79d5cbb | kv cache slot search improvements (#3493) | 2 years ago |
| pudepiedj | a8777ad84e | parallel : add option to load external prompt file (#3416) | 2 years ago |
| l3utterfly | 16820a5a0d | llama : correct hparams comparison (#3446) | 2 years ago |
| ds5t5 | f8c90cdbaa | llm : add Refact model (#3329) | 2 years ago |
| Georgi Gerganov | ac2219fef3 | llama : fix session saving/loading (#3400) | 2 years ago |
| Alex Klinkhamer | 48be797ffb | llama : expose model's rope_freq_scale in the API (#3418) | 2 years ago |
| goerch | ff5a3f0c09 | Work on the BPE tokenizer (#3252) | 2 years ago |
| Adrian | a847676984 | metal : set log callback before initializing (#3427) | 2 years ago |
| vvhg1 | c97f01c362 | infill : add new example + extend server API (#3296) | 2 years ago |
| Cebtenzzre | 2777a84be4 | llama : quantize up to 31% faster on Linux and Windows with mmap (#3206) | 2 years ago |