Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| fairydreaming | 9394bbd484 | llama : Add support for DeepSeek V3 (#11049) | 1 year ago |
| matt23654 | f922a9c542 | [GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047) | 1 year ago |
| DAN™ | 46be942214 | llama : add support for the cohere2 model architecture (#10900) | 1 year ago |
| Georgi Gerganov | 78c6785175 | sync : ggml | 1 year ago |
| Georgi Gerganov | 5e3b08d606 | ggml : do not install metal source when embed library (ggml/1054) | 1 year ago |
| Daniel Bevenius | db68c93b57 | ggml : improve inputs log sched_print_assignments (ggml/1053) | 1 year ago |
| Gilad S. | c31fc8b966 | fix: Vulkan shader gen binary path (#11037) | 1 year ago |
| Molly Sophia | 4b0c638b9a | common : disable KV cache shifting automatically for unsupported models (#11053) | 1 year ago |
| Georgi Gerganov | e7da954ecc | metal : avoid uint (#11019) | 1 year ago |
| Georgi Gerganov | f66f582927 | llama : refactor `src/llama.cpp` (#10902) | 1 year ago |
| Pierrick Hymbert | 2f0ee84b9b | server: bench: minor fixes (#10765) | 1 year ago |
| Xuan Son Nguyen | 0da5d86026 | server : allow using LoRA adapters per-request (#10994) | 1 year ago |
| Benson Wong | a45433ba20 | readme : add llama-swap to infrastructure section (#11032) | 1 year ago |
| Srihari-mcw | 0827b2c1da | ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027) | 1 year ago |
| Xuan Son Nguyen | 45095a61bf | server : clean up built-in template detection (#11026) | 1 year ago |
| Xuan Son Nguyen | 5896c65232 | server : add OAI compat for /v1/completions (#10974) | 1 year ago |
| ymcki | bc7b1f8632 | convert : fix Llama-3_1-Nemotron-51B rope settings (#11008) | 1 year ago |
| Peter | 6e1531aca5 | common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013) | 1 year ago |
| Jeff Bolz | 716bd6dec3 | vulkan: optimize mul_mat for small values of N (#10991) | 1 year ago |
| ag2s20150909 | c250ecb315 | android : fix llama_batch free (#11014) | 1 year ago |
| Jeff Bolz | a813badbbd | vulkan: im2col and matmul optimizations for stable diffusion (#10942) | 1 year ago |
| Jeff Bolz | fdd2188912 | vulkan: Use push constant offset to handle misaligned descriptors (#10987) | 1 year ago |
| Isaac McFadyen | f865ea149d | server: added more docs for response_fields field (#10995) | 1 year ago |
| Alexey Parfenov | 16cdce7b68 | server : fix token duplication when streaming with stop strings (#10997) | 1 year ago |
| Eve | d79d8f39b4 | vulkan: multi-row k quants (#10846) | 1 year ago |
| Peter | d283d02bf2 | examples, ggml : fix GCC compiler warnings (#10983) | 1 year ago |
| Reza Kakhki | 9ba399dfa7 | server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) | 1 year ago |
| Djip007 | 2cd43f4900 | ggml : more perfo with llamafile tinyblas on x86_64 (#10714) | 1 year ago |
| NeverLucky | 09fe2e7613 | server: allow filtering llama server response fields (#10940) | 1 year ago |
| Georgi Gerganov | 30caac3a68 | llama : the WPM vocabs use the CLS token as BOS (#10930) | 1 year ago |