Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Kawrakow | bac66994cf | Quantization imrovements for k_quants (#2707) | 2 years ago |
| slaren | 1123f7fbdf | ggml-cuda : use graph allocator (#2684) | 2 years ago |
| Georgi Gerganov | 6381d4e110 | gguf : new file format with flexible meta data (beta) (#2398) | 2 years ago |
| slaren | 097e121e2f | llama : add benchmark example (#2626) | 2 years ago |
| Evan Jones | 604b8bdfa6 | Fix unicode in grammars (fixes #2501) (#2553) | 2 years ago |
| Georgi Gerganov | a73ccf1aa3 | llama : replace (permute + reshape + view_1d) with (view_3d) (#2538) | 2 years ago |
| Shouzheng Liu | fc8ef549e5 | metal : enable ggml-alloc (#2627) | 2 years ago |
| Shouzheng Liu | bf83bff674 | metal : matrix-matrix multiplication kernel (#2615) | 2 years ago |
| Jhen-Jie Hong | d783f7982e | metal : return null instead of exit(1) (#2573) | 2 years ago |
| grahameth | ea04a4ca19 | add log_callback to llama_context_params for custom logging. (#2234) | 2 years ago |
| Johannes Gäßler | acfc5478ff | CUDA: tighter VRAM scratch size for 65b/70b (#2551) | 2 years ago |
| Johannes Gäßler | 3d9a551816 | Fixed mmap prefetch for GPU offloading (#2529) | 2 years ago |
| l3utterfly | 415e99fec2 | Stream save llama context data to file instead of allocating entire buffer upfront (#2488) | 2 years ago |
| Johannes Gäßler | 0728c5a8b9 | CUDA: mmq CLI option, fixed mmq build issues (#2453) | 2 years ago |
| slaren | 9d2382b3e4 | Fix Metal backend broken from the allocator changes (#2455) | 2 years ago |
| slaren | a113689571 | ggml : add graph tensor allocator (#2411) | 2 years ago |
| eric8607242 | ee1b497c98 | llama : support more diverse tokenizers? (#2420) | 2 years ago |
| Rand Xie | 65cdf34bdc | llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) | 2 years ago |
| Georgi Gerganov | 1a941869cb | metal : disable graph concurrency optimization due to bug (#2413) | 2 years ago |
| slaren | 5488fb789e | ggml : allocate graphs in a context (#2392) | 2 years ago |
| Kawrakow | eb542d3932 | Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384) | 2 years ago |
| slaren | da1889834a | ggml : improve graph build time via hash table lookup (#2329) | 2 years ago |
| Shouzheng Liu | 1aa18ef994 | metal : concurrently dispatch commands (#2358) | 2 years ago |
| slaren | 41c674161f | make rms_norm_eps a parameter (#2374) | 2 years ago |
| Evan Jones | 84e09a7d8b | llama : add grammar-based sampling (#1773) | 2 years ago |
| Georgi Gerganov | e76d630df1 | llama : grouped-query attention + LLaMAv2 70B support (#2276) | 2 years ago |
| Christian Demsar | a940458e48 | llama : print max tensor size to stderr (#2336) | 2 years ago |
| Georgi Gerganov | b47b8a9cfe | llama : optimize memory buffers (#2325) | 2 years ago |
| Georgi Gerganov | 513f861953 | ggml : fix rope args order + assert (#2054) | 2 years ago |
| Guillaume "Vermeille" Sanchez | ab0e26bdfb | llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) | 2 years ago |