Commit History

Author SHA1 Message Committed
  eric8607242 ee1b497c98 llama : support more diverse tokenizers? (#2420) 2 years ago
  Rand Xie 65cdf34bdc llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) 2 years ago
  Georgi Gerganov 1a941869cb metal : disable graph concurrency optimization due to bug (#2413) 2 years ago
  slaren 5488fb789e ggml : allocate graphs in a context (#2392) 2 years ago
  Kawrakow eb542d3932 Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384) 2 years ago
  slaren da1889834a ggml : improve graph build time via hash table lookup (#2329) 2 years ago
  Shouzheng Liu 1aa18ef994 metal : concurrently dispatch commands (#2358) 2 years ago
  slaren 41c674161f make rms_norm_eps a parameter (#2374) 2 years ago
  Evan Jones 84e09a7d8b llama : add grammar-based sampling (#1773) 2 years ago
  Georgi Gerganov e76d630df1 llama : grouped-query attention + LLaMAv2 70B support (#2276) 2 years ago
  Christian Demsar a940458e48 llama : print max tensor size to stderr (#2336) 2 years ago
  Georgi Gerganov b47b8a9cfe llama : optimize memory buffers (#2325) 2 years ago
  Georgi Gerganov 513f861953 ggml : fix rope args order + assert (#2054) 2 years ago
  Guillaume "Vermeille" Sanchez ab0e26bdfb llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) 2 years ago
  Georgi Gerganov ae178ab46b llama : make tensor_split ptr instead of array (#2272) 2 years ago
  Georgi Gerganov fff0e0eafe llama : fix regression from #2000 - could not load no-mmap models 2 years ago
  Rinne 294f424554 llama : extend API to get max devices at runtime (#2253) 2 years ago
  Georgi Gerganov d01bccde9f ci : integrate with ggml-org/ci (#2250) 2 years ago
  Alex Klinkhamer b7647436cc llama : fix t_start_sample_us initialization warning (#2238) 2 years ago
  Xiao-Yong Jin 6e7cca4047 llama : add custom RoPE (#2054) 2 years ago
  Bach Le 7513b7b0a1 llama : add functions that work directly on model (#2197) 2 years ago
  Bach Le c9c74b4e3f llama : add classifier-free guidance (#2135) 2 years ago
  LostRuins bbef28218f Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) 2 years ago
  Evan Miller 5656d10599 mpi : add support for distributed inference via MPI (#2099) 2 years ago
  oobabooga 1d16309969 llama : remove "first token must be BOS" restriction (#2153) 2 years ago
  Qingyou Meng 1d656d6360 ggml : change ggml_graph_compute() API to not require context (#1999) 2 years ago
  Tobias Lütke 31cfbb1013 Expose generation timings from server & update completions.js (#2116) 2 years ago
  Stephan Walter 1b107b8550 ggml : generalize `quantize_fns` for simpler FP16 handling (#1237) 2 years ago
  Howard Su 051c70dcd5 llama: Don't double count the sampling time (#2107) 2 years ago
  Johannes Gäßler 9e4475f5cf Fixed OpenCL offloading prints (#2082) 2 years ago