Commit History

Author SHA1 Message Date
Pierrick Hymbert 4bd0f93e4a model: support arch `DbrxForCausalLM` (#6515) 1 year ago
jiez 91c736015b llama : add gguf_remove_key + remove split meta during quantize (#6591) 1 year ago
MasterYi1024 dee7f8d692 Correct free memory and total memory. (#6630) 1 year ago
Clint Herron 04a5ac211e Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616) 1 year ago
Olivier Chafik cbaadc9294 grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) 1 year ago
Pierrick Hymbert b804b1ef77 eval-callback: Example how to use eval callback for debugging (#6576) 1 year ago
slaren 4f407a0a35 llama : add model types for mixtral (#6589) 1 year ago
Jared Van Bortel 1b67731e18 BERT tokenizer fixes (#6498) 1 year ago
Carolinabanana 5dc9dd7152 llama : add Command R Plus support (#6491) 1 year ago
Georgi Gerganov cc4a95426d llama : fix attention layer count sanity check (#6550) 1 year ago
Georgi Gerganov b73e564b16 quantize : fix precedence of cli args (#6541) 1 year ago
Rick G e3c337d87c llama : support negative ith in llama_get_ API (#6519) 1 year ago
Jan Boon beea6e1b16 llama : save and restore kv cache for single seq id (#6341) 1 year ago
Brian a8bd14d557 gguf.py : add licence and version to gguf writer (#6504) 1 year ago
Clint Herron 9b84ae1806 examples : add GBNF validator program (#5948) 1 year ago
bryanSwk bb43cf7e9d llama : add SEA-LION support (#6448) 1 year ago
kaizau 1ff4d9f3d6 Add OpenChat, Alpaca, Vicuna chat templates (#6397) 1 year ago
slaren 08a0c02060 ggml : mul_mat_id use the same tensor for all the experts (#6387) 1 year ago
0cc4m ba0c7c70ab Vulkan k-quant mmq and ggml-backend offload functionality (#6155) 1 year ago
hxer7963 069574775c [Model] Add support for xverse (#6301) 1 year ago
Daniel Bevenius 057400a3fd llama : remove redundant reshape in build_kv_store (#6369) 1 year ago
compilade 0308f5e3d7 llama : fix command-r inference when omitting outputs (#6367) 1 year ago
Jared Van Bortel 32c8486e1f wpm : portable unicode tolower (#6305) 1 year ago
compilade 557410b8f0 llama : greatly reduce output buffer memory usage (#6122) 1 year ago
Kawrakow 55c1b2a3bb IQ1_M: 1.75 bpw quantization (#6302) 1 year ago
Kawrakow d25b1c31b0 quantize : be able to override metadata by key (#6321) 1 year ago
slaren 280345968d cuda : rename build flag to LLAMA_CUDA (#6299) 1 year ago
Meng, Hengyu ddf6568510 [SYCL] offload op (#6217) 1 year ago
Jared Van Bortel 94d1b3b411 use _wfopen instead of fopen on Windows (#6248) 1 year ago
Pierrick Hymbert f482bb2e49 common: llama_load_model_from_url split support (#6192) 1 year ago