Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Alex Azarov | 5f5fe1bd60 | metal : correctly set SIMD support flags on iOS (#4923) | 2 years ago |
| Karthik Kumar Viswanathan | ac32902a87 | llama : support WinXP build with MinGW 8.1.0 (#3419) | 2 years ago |
| Kawrakow | 147b17ac94 | 2-bit quantizations (#4897) | 2 years ago |
| Kawrakow | 807179ec58 | Make Q3_K_S be the same as olf Q3_K_L for Mixtral-8x7B (#4906) | 2 years ago |
| Georgi Gerganov | 76484fbfd3 | sync : ggml | 2 years ago |
| Johannes Gäßler | c71d608ce7 | ggml: cache sin/cos for RoPE (#4908) | 2 years ago |
| Georgi Gerganov | 4be5ef556d | metal : remove old API (#4919) | 2 years ago |
| Georgi Gerganov | 0ea069b87b | server : fix prompt caching with system prompt (#4914) | 2 years ago |
| Georgi Gerganov | f172de03f1 | llama : fix detokenization of non-special added-tokens (#4916) | 2 years ago |
| Georgi Gerganov | 2d57de5255 | metal : disable log for loaded kernels (#4794) | 2 years ago |
| David Friehs | df845cc982 | llama : minimize size used for state save/load (#4820) | 2 years ago |
| Someone | 6b48ed0893 | workflows: unbreak nix-build-aarch64, and split it out (#4915) | 2 years ago |
| Yann Follet | 722d33f34e | main : add parameter --no-display-prompt (#4541) | 2 years ago |
| texmex76 | c30b1ef39a | gguf : fix potential infinite for-loop (#4600) | 2 years ago |
| Georgi Gerganov | b38b5e93ae | metal : refactor kernel loading code (#4794) | 2 years ago |
| Johannes Gäßler | 7dc78764e2 | compare-llama-bench: tweak output format (#4910) | 2 years ago |
| Ziad Ben Hadj-Alouane | 356327feb3 | server : fix deadlock that occurs in multi-prompt scenarios (#4905) | 2 years ago |
| makomk | ee8243adaa | server : fix crash with multimodal models without BOS token (#4904) | 2 years ago |
| Georgi Gerganov | 15ebe59210 | convert : update phi-2 to latest HF repo (#4903) | 2 years ago |
| Georgi Gerganov | de473f5f8e | sync : ggml | 2 years ago |
| Georgi Gerganov | f238461236 | ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758) | 2 years ago |
| slaren | fa5c1fb44a | backend_sched : fix assignments | 2 years ago |
| Maximilian Winter | 52ee4540c0 | examples : add pydantic models to GBNF grammar generator (#4883) | 2 years ago |
| Johannes Gäßler | 3fe81781e3 | CUDA: faster q8_0 -> f16 dequantization (#4895) | 2 years ago |
| slaren | e7e4df031b | llama : ggml-backend integration (#4766) | 2 years ago |
| Georgi Gerganov | 584d674be6 | llama : remove redundant assert for StableLM (#4901) | 2 years ago |
| Daniel Bevenius | 930f907d3e | export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894) | 2 years ago |
| Zay | e790eef21c | llama.swiftui : update models layout (#4826) | 2 years ago |
| Georgi Gerganov | 5537d9d36b | gitignore : imatrix | 2 years ago |
| Johannes Gäßler | 1b280c9fff | CUDA: fix softmax compile for old CUDA versions (#4862) | 2 years ago |