Commit History

Author SHA1 Message Date
  Daniel Bevenius a6e514a85f llava: fix typo/formatting in README.md (#5405) 1 year ago
  Johannes Gäßler 26d4efd11e sampling: fix top_k <= 0 (#5388) 1 year ago
  Georgi Gerganov 8504d2d0da tests : .gitignore obj files 1 year ago
  Michael Podvitskiy c4fbb6717c CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393) 1 year ago
  Ebey Abraham 8c933b70c2 fix typo in readme (#5399) 1 year ago
  Kamil Tomšík b906596bb7 Add Ava in the list of llama.cpp UIs (#4362) 1 year ago
  Johannes Gäßler aa7ab99be2 CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386) 1 year ago
  Neo Zhang Jianyu 10afa6f1d1 [SYCL] update install make by w64devkit (#5297) 1 year ago
  Xiao-Yong Jin 0ef46da632 llava-cli : always tokenize special tokens (#5382) 1 year ago
  0cc4m ee1628bdfe Basic Vulkan Multi-GPU implementation (#5321) 1 year ago
  Eve ed0bf32290 readme : modernize (#5379) 1 year ago
  Ben Williams 9a697d842b readme : update ui list (#5354) 1 year ago
  runfuture 316c7faf77 llama : add MiniCPM support (#5346) 1 year ago
  Justin Parker f3e2b4fa3f server : update `/props` with "total_slots" value (#5373) 1 year ago
  Sang-Kil Park f68664ac24 convert : fix TypeError on GPT-2 vocab.json (#5288) 1 year ago
  Alexey Parfenov 213d1439fa server : remove model.json endpoint (#5371) 1 year ago
  Johannes Gäßler 17c97fb062 CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370) 1 year ago
  Kawrakow b08f22c882 Update README.md (#5366) 1 year ago
  Kawrakow f57fadc009 Slight quantization improvement for Q4_K and Q5_K (#5361) 1 year ago
  BarfingLemurs 2e9c0bd6b3 readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362) 1 year ago
  Johannes Gäßler 2c516611f1 CUDA: mul_mat_vec_q for batch sizes > 1 (#5351) 1 year ago
  Justin Parker 8a79c591de server : include total "num_slots" in props endpoint (#5349) 1 year ago
  Michael Coppola 31e7903221 server : add `dynatemp_range` and `dynatemp_exponent` (#5352) 1 year ago
  Niall Coates 4ffc7a17d4 server : various fixes for the prompt field in /completion (#5300) 1 year ago
  Georgi Gerganov 906cff55c2 py : handle byte tokens in `get_token_type` (#5341) 1 year ago
  Johannes Gäßler 098f6d737b make: Use ccache for faster compilation (#5318) 1 year ago
  Johannes Gäßler 78b00dda6c README: updated introduction (#5343) 1 year ago
  Kawrakow c6b395535a ggml : make use of ggml-quants.h possible in C++ code (#5338) 1 year ago
  Dr. Tom Murphy VII Ph.D abb61944a5 ggml : avoid duplicating function calls using MIN/MAX macros (#5325) 1 year ago
  Kawrakow 89503dcb5f iq3_xxs: quards for the no-imatrix situation (#5334) 1 year ago