Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Joan Fontanals | f5d7b268ec | llama : add jina v2 base code (#7596) | 1 year ago |
| Georgi Gerganov | 2b3389677a | ggml : refactor rope norm/neox (#7634) | 1 year ago |
| Georgi Gerganov | 1442677f92 | common : refactor cli arg parsing (#7675) | 1 year ago |
| Georgi Gerganov | 554c247caf | ggml : remove OpenCL (#7735) | 1 year ago |
| Georgi Gerganov | 0cd6bd3483 | llama : remove beam search (#7736) | 1 year ago |
| jaime-m-p | 3b38d48609 | Per token attributes (#7685) | 1 year ago |
| Radoslav Gerganov | bde7cd3cd9 | llama : offload to RPC in addition to other backends (#7640) | 1 year ago |
| 0cc4m | 3d7ebf6312 | Vulkan Mixture of Experts (MoE) support (#7628) | 1 year ago |
| zhangkaihuo | 6f28a333c1 | llama : MiniCPM support tied embeddings (#7664) | 1 year ago |
| Georgi Gerganov | 549279d804 | llama : avoid double token-to-piece cache (#7654) | 1 year ago |
| Johannes Gäßler | 9b596417af | CUDA: quantized KV support for FA vec (#7527) | 1 year ago |
| Georgi Gerganov | 5921b8f089 | llama : cache llama_token_to_piece (#7587) | 1 year ago |
| Georgi Gerganov | fb76ec31a9 | ggml : fix YARN + add tests + add asserts (#7617) | 1 year ago |
| jaime-m-p | 02c1ecad07 | Tokenizer WPM fixes (#7500) | 1 year ago |
| Giuseppe Scrivano | 5442939fcc | llama : support small Granite models (#7481) | 1 year ago |
| fairydreaming | ee3dff6b8e | Add support for DeepseekV2ForCausalLM (#7519) | 1 year ago |
| Georgi Gerganov | 8b99e2aa66 | llama : handle unknown utf8 bytes (#7588) | 1 year ago |
| Bartowski | c429b33beb | llama : add Smaug 70B support (#7402) | 1 year ago |
| Justine Tunney | 00c6390793 | main : don't print special tokens with --grammar (#6923) | 1 year ago |
| Masaya, Kato | faa0e6979a | ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (#7433) | 1 year ago |
| fairydreaming | fbca2f27fc | Add support for ArcticForCausalLM (#7020) | 1 year ago |
| Tristan Druyen | 007489e895 | Fix phi3 chat template confusion with zephyr (#7449) | 1 year ago |
| Daniel Bevenius | 3015851c5a | llama : add getters for n_threads/n_threads_batch (#7464) | 1 year ago |
| Georgi Gerganov | 55ac3b7aea | ci : use Pythia models instead of OpenLlama (#7470) | 1 year ago |
| fairydreaming | 9b82476ee9 | Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461) | 1 year ago |
| Georgi Gerganov | a61a94e543 | llama : rename n_ctx -> cache.size, less confusing (#0) | 1 year ago |
| Georgi Gerganov | e84b71c2c6 | ggml : drop support for QK_K=64 (#7473) | 1 year ago |
| slaren | b18532a4ef | phi3 : duplicate rope factors in each layer (#7447) | 1 year ago |
| Justine Tunney | 03d8900ebe | llama : add missing model type names (#7445) | 1 year ago |
| liuwei-git | 201cc11afa | llama : add phi3 128K model support (#7225) | 1 year ago |