Kawrakow
|
469e75d0a3
llama : restore intended k-quants mixes for MoE models (#4872)
|
2 years ago |
Kawrakow
|
49662cbed3
ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)
|
2 years ago |
Georgi Gerganov
|
3ba5b8ca8e
swift : pin ggml commit + remove ggml.h from spm-headers (#4878)
|
2 years ago |
Laura
|
4330bd83fe
server : implement credentialed CORS (#4514)
|
2 years ago |
Michael Coppola
|
27379455c3
server : support for multiple api keys (#4864)
|
2 years ago |
Behnam M
|
eab6795006
server : add `LOG_INFO` when model is successfully loaded (#4881)
|
2 years ago |
Someone
|
d8d90aa343
ci: nix-flake-update: new token with pr permissions (#4879)
|
2 years ago |
pudepiedj
|
43f76bf1c3
main : print total token count and tokens consumed so far (#4874)
|
2 years ago |
Isaac McFadyen
|
2f043328e3
server : fix typo in model name (#4876)
|
2 years ago |
Paul Tsochantaris
|
2a7c94db5f
metal : put encoder debug group behind a define (#4873)
|
2 years ago |
Georgi Gerganov
|
64802ec00d
sync : ggml
|
2 years ago |
Georgi Gerganov
|
3267c2abc7
metal : fix deprecation warning (ggml/690)
|
2 years ago |
Timothy Cronin
|
f85a973aa1
ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693)
|
2 years ago |
Jack Mousseau
|
5362e43962
metal : wrap each operation in debug group (ggml/690)
|
2 years ago |
leejet
|
e739de7909
ggml : change GGML_MAX_NAME at compile time (ggml/682)
|
2 years ago |
Halalaluyafail3
|
c910e3c28a
Fix execlp call (ggml/689)
|
2 years ago |
Erik Scholz
|
f34432ca1e
fix : cuda order of synchronization when setting a buffer (ggml/679)
|
2 years ago |
Behnam M
|
7a9f75c38b
server : update readme to document the new `/health` endpoint (#4866)
|
2 years ago |
Georgi Gerganov
|
5c1980d8d4
server : fix build + rename enums (#4870)
|
2 years ago |
Behnam M
|
cd108e641d
server : add a `/health` endpoint (#4860)
|
2 years ago |
Brian
|
57d016ba2d
llama : add additional suffixes for model params (#4834)
|
2 years ago |
Austin
|
329ff61569
llama : recognize 1B phi models (#4847)
|
2 years ago |
John
|
d34633d8db
clip : support more quantization types (#4846)
|
2 years ago |
Johannes Gäßler
|
4f56458d34
Python script to compare commits with llama-bench (#4844)
|
2 years ago |
Austin
|
6efb8eb30e
convert.py : fix vanilla LLaMA model conversion (#4818)
|
2 years ago |
Justine Tunney
|
36e5a08b20
llava-cli : don't crash if --image flag is invalid (#4835)
|
2 years ago |
Georgi Gerganov
|
4dccb38d9a
metal : improve dequantize precision to match CPU (#4836)
|
2 years ago |
Georgi Gerganov
|
9a818f7c42
scripts : improve get-pg.sh (#4838)
|
2 years ago |
iohub
|
18adb4e9bb
readme : add 3rd party collama reference to UI list (#4840)
|
2 years ago |
Georgi Gerganov
|
d9653894df
scripts : script to get Paul Graham essays in txt format (#4838)
|
2 years ago |