Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| SeungWon Jeong | fb215c3832 | server : normalize embeddings (#5956) | 1 year ago |
| Minsoo Cheong | 6d341ab6c5 | speculative : implement stochastic speculative sampling (#5625) | 1 year ago |
| DAN™ | 82f3e668ad | common : use LLAMA_DEFAULT_SEED (#5855) | 1 year ago |
| Douglas Hanley | 475df1d6cf | llama : allow for user specified embedding pooling type (#5849) | 1 year ago |
| Pierrick Hymbert | 3ab8b3a92e | llama : cleanup unused mmq flags (#5772) | 1 year ago |
| Georgi Gerganov | 9d533a77d0 | llama : fix defrag bugs + add parameter (#5735) | 1 year ago |
| Georgi Gerganov | ab336a9d5e | code : normalize enum names (#5697) | 1 year ago |
| Alexey Parfenov | 6dcc02d244 | server : add "samplers" param to control the samplers order (#5494) | 1 year ago |
| bmwl | f486f6e1e5 | ggml : add numa options (#5377) | 1 year ago |
| Alexey Parfenov | a803333a4e | common : use enums for sampler types (#5418) | 1 year ago |
| Jared Van Bortel | 1ec3332ade | YaRN : store rope scaling type as int32_t in memory (#5285) | 1 year ago |
| Georgi Gerganov | 5cb04dbc16 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) | 2 years ago |
| Kawrakow | 6f9939d119 | KL-divergence (#5076) | 2 years ago |
| Kawrakow | 7dcbe39d36 | Add ability to evauate multiple choice tasks (#5047) | 2 years ago |
| Kawrakow | 682986a08e | Add Winogrande evaluation (#5015) | 2 years ago |
| stduhpf | e0324285a5 | speculative : threading options (#4959) | 2 years ago |
| Yann Follet | 722d33f34e | main : add parameter --no-display-prompt (#4541) | 2 years ago |
| slaren | e7e4df031b | llama : ggml-backend integration (#4766) | 2 years ago |
| Georgi Gerganov | 7edefbd79c | main : better name for variable n_print (#4874) | 2 years ago |
| Georgi Gerganov | 3ca63b4538 | main : disable token count by default (#4874) | 2 years ago |
| pudepiedj | 43f76bf1c3 | main : print total token count and tokens consumed so far (#4874) | 2 years ago |
| Georgi Gerganov | 52531fdff8 | main : add self-extend support (#4815) | 2 years ago |
| LeonEricsson | 7082d24cec | lookup : add prompt lookup decoding example (#4484) | 2 years ago |
| Georgi Gerganov | bcc0eb4591 | llama : per-layer KV cache + quantum K cache (#4309) | 2 years ago |
| Kerfuffle | 5aa365d88f | llama : allow overriding GGUF metadata when loading model (#4092) | 2 years ago |
| MaggotHATE | 52c8bc3cf3 | sampling : custom samplers order (#4285) | 2 years ago |
| Georgi Gerganov | 6b0a7420d0 | llama : KV cache view API + better KV cache management (#4170) | 2 years ago |
| Seb C | 881800d1f0 | main : Add ChatML functionality to main example (#4046) | 2 years ago |
| Kerfuffle | 91f6499393 | Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) | 2 years ago |
| Georgi Gerganov | 8f961abdc4 | speculative : change default p_accept to 0.5 + CLI args (#3919) | 2 years ago |