Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| slaren | 5bf3953d7e | cuda : improve cuda pool efficiency using virtual memory (#4606) | 2 years ago |
| LeonEricsson | 7082d24cec | lookup : add prompt lookup decoding example (#4484) | 2 years ago |
| FantasyGmm | a55876955b | cuda : fix jetson compile error (#4560) | 2 years ago |
| Michael Kesper | 28cb35a0ec | make : add LLAMA_HIP_UMA option (#4587) | 2 years ago |
| Georgi Gerganov | 32259b2dad | gguf : simplify example dependencies | 2 years ago |
| slaren | d232aca5a7 | llama : initial ggml-backend integration (#4520) | 2 years ago |
| Matheus Gabriel Alves Silva | 919c40660f | build : Check the ROCm installation location (#4485) | 2 years ago |
| Jared Van Bortel | 70f806b821 | build : detect host compiler and cuda compiler separately (#4414) | 2 years ago |
| slaren | 799a1cb13b | llama : add Mixtral support (#4406) | 2 years ago |
| Jared Van Bortel | 6138963fb2 | build : target Windows 8 for standard mingw-w64 (#4405) | 2 years ago |
| Georgi Gerganov | fe680e3d10 | sync : ggml (new ops, tests, backend, etc.) (#4359) | 2 years ago |
| Jared Van Bortel | 511f52c334 | build : enable libstdc++ assertions for debug builds (#4275) | 2 years ago |
| WillCorticesAI | d2809a3ba2 | make : fix Apple clang determination bug (#4272) | 2 years ago |
| Jared Van Bortel | 15f5d96037 | build : fix build info generation and cleanup Makefile (#3920) | 2 years ago |
| Georgi Gerganov | 922754a8d6 | lookahead : add example for lookahead decoding (#4207) | 2 years ago |
| Kerfuffle | 28a2e6e7d4 | tokenize example: Respect normal add BOS token behavior (#4126) | 2 years ago |
| Roger Meier | 8e9361089d | build : support ppc64le build for make and CMake (#3963) | 2 years ago |
| Michael Potter | 6bb4908a17 | Fix MacOS Sonoma model quantization (#4052) | 2 years ago |
| Georgi Gerganov | 413503d4b9 | make : do not add linker flags when compiling static llava lib (#3977) | 2 years ago |
| Damian Stewart | 381efbf480 | llava : expose as a shared library for downstream projects (#3613) | 2 years ago |
| cebtenzzre | b12fa0d1c1 | build : link against build info instead of compiling against it (#3879) | 2 years ago |
| cebtenzzre | 2046eb4345 | make : remove unnecessary dependency on build-info.h (#3842) | 2 years ago |
| Georgi Gerganov | d69d777c02 | ggml : quantization refactoring (#3833) | 2 years ago |
| Georgi Gerganov | 2f9ec7e271 | cuda : improve text-generation and batched decoding performance (#3776) | 2 years ago |
| Georgi Gerganov | e3932593d4 | Revert "make : add optional CUDA_NATIVE_ARCH (#2482)" | 2 years ago |
| Alex | 96981f37b1 | make : add optional CUDA_NATIVE_ARCH (#2482) | 2 years ago |
| Georgi Gerganov | 438c2ca830 | server : parallel decoding and multimodal (#3677) | 2 years ago |
| Georgi Gerganov | d1031cf49c | sampling : refactor init to use llama_sampling_params (#3696) | 2 years ago |
| Georgi Gerganov | 0e89203b51 | speculative : add tree-based sampling example (#3624) | 2 years ago |
| M. Yusuf Sarıgöz | 370359e5ba | examples: support LLaVA v1.5 (multimodal model) (#3436) | 2 years ago |