Commit History

Author SHA1 Message Date
  SeungWon Jeong fb215c3832 server : normalize embeddings (#5956) 1 year ago
  Minsoo Cheong 6d341ab6c5 speculative : implement stochastic speculative sampling (#5625) 1 year ago
  DAN™ 82f3e668ad common : use LLAMA_DEFAULT_SEED (#5855) 1 year ago
  Douglas Hanley 475df1d6cf llama : allow for user specified embedding pooling type (#5849) 1 year ago
  Pierrick Hymbert 3ab8b3a92e llama : cleanup unused mmq flags (#5772) 1 year ago
  Georgi Gerganov 9d533a77d0 llama : fix defrag bugs + add parameter (#5735) 1 year ago
  Georgi Gerganov ab336a9d5e code : normalize enum names (#5697) 1 year ago
  Alexey Parfenov 6dcc02d244 server : add "samplers" param to control the samplers order (#5494) 1 year ago
  bmwl f486f6e1e5 ggml : add numa options (#5377) 1 year ago
  Alexey Parfenov a803333a4e common : use enums for sampler types (#5418) 1 year ago
  Jared Van Bortel 1ec3332ade YaRN : store rope scaling type as int32_t in memory (#5285) 1 year ago
  Georgi Gerganov 5cb04dbc16 llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) 2 years ago
  Kawrakow 6f9939d119 KL-divergence (#5076) 2 years ago
  Kawrakow 7dcbe39d36 Add ability to evauate multiple choice tasks (#5047) 2 years ago
  Kawrakow 682986a08e Add Winogrande evaluation (#5015) 2 years ago
  stduhpf e0324285a5 speculative : threading options (#4959) 2 years ago
  Yann Follet 722d33f34e main : add parameter --no-display-prompt (#4541) 2 years ago
  slaren e7e4df031b llama : ggml-backend integration (#4766) 2 years ago
  Georgi Gerganov 7edefbd79c main : better name for variable n_print (#4874) 2 years ago
  Georgi Gerganov 3ca63b4538 main : disable token count by default (#4874) 2 years ago
  pudepiedj 43f76bf1c3 main : print total token count and tokens consumed so far (#4874) 2 years ago
  Georgi Gerganov 52531fdff8 main : add self-extend support (#4815) 2 years ago
  LeonEricsson 7082d24cec lookup : add prompt lookup decoding example (#4484) 2 years ago
  Georgi Gerganov bcc0eb4591 llama : per-layer KV cache + quantum K cache (#4309) 2 years ago
  Kerfuffle 5aa365d88f llama : allow overriding GGUF metadata when loading model (#4092) 2 years ago
  MaggotHATE 52c8bc3cf3 sampling : custom samplers order (#4285) 2 years ago
  Georgi Gerganov 6b0a7420d0 llama : KV cache view API + better KV cache management (#4170) 2 years ago
  Seb C 881800d1f0 main : Add ChatML functionality to main example (#4046) 2 years ago
  Kerfuffle 91f6499393 Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) 2 years ago
  Georgi Gerganov 8f961abdc4 speculative : change default p_accept to 0.5 + CLI args (#3919) 2 years ago