Zay | e790eef21c llama.swiftui : update models layout (#4826) | 2 years ago
Georgi Gerganov | 5537d9d36b gitignore : imatrix | 2 years ago
Johannes Gäßler | 1b280c9fff CUDA: fix softmax compile for old CUDA versions (#4862) | 2 years ago
Georgi Gerganov | 3cabe80630 llama : fix typo "imp_embd" -> "inp_embd" | 2 years ago
howlger | 4315a94366 common : streamline the formatting of help (#4890) | 2 years ago
Georgi Gerganov | 2d00741e12 py : fix lint (#4889) | 2 years ago
Georgi Gerganov | f445c0e68c llama : fix llm_build_k_shift to use correct n_rot (#4889) | 2 years ago
Kawrakow | 326b418b59 Importance Matrix calculation (#4861) | 2 years ago
Georgi Gerganov | 1d118386fe server : fix infill when prompt is empty (#4833) | 2 years ago
Georgi Gerganov | 7edefbd79c main : better name for variable n_print (#4874) | 2 years ago
Georgi Gerganov | 3ca63b4538 main : disable token count by default (#4874) | 2 years ago
Georgi Gerganov | b037787548 swift : track ggml release branch (#4867) | 2 years ago
Kawrakow | 469e75d0a3 llama : restore intended k-quants mixes for MoE models (#4872) | 2 years ago
Kawrakow | 49662cbed3 ggml : SOTA 2-bit quants (add IQ2_XS) (#4856) | 2 years ago
Georgi Gerganov | 3ba5b8ca8e swift : pin ggml commit + remove ggml.h from spm-headers (#4878) | 2 years ago
Laura | 4330bd83fe server : implement credentialed CORS (#4514) | 2 years ago
Michael Coppola | 27379455c3 server : support for multiple api keys (#4864) | 2 years ago
Behnam M | eab6795006 server : add `LOG_INFO` when model is successfully loaded (#4881) | 2 years ago
Someone | d8d90aa343 ci: nix-flake-update: new token with pr permissions (#4879) | 2 years ago
pudepiedj | 43f76bf1c3 main : print total token count and tokens consumed so far (#4874) | 2 years ago
Isaac McFadyen | 2f043328e3 server : fix typo in model name (#4876) | 2 years ago
Paul Tsochantaris | 2a7c94db5f metal : put encoder debug group behind a define (#4873) | 2 years ago
Georgi Gerganov | 64802ec00d sync : ggml | 2 years ago
Georgi Gerganov | 3267c2abc7 metal : fix deprecation warning (ggml/690) | 2 years ago
Timothy Cronin | f85a973aa1 ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693) | 2 years ago
Jack Mousseau | 5362e43962 metal : wrap each operation in debug group (ggml/690) | 2 years ago
leejet | e739de7909 ggml : change GGML_MAX_NAME at compile time (ggml/682) | 2 years ago
Halalaluyafail3 | c910e3c28a Fix execlp call (ggml/689) | 2 years ago
Erik Scholz | f34432ca1e fix : cuda order of synchronization when setting a buffer (ggml/679) | 2 years ago
Behnam M | 7a9f75c38b server : update readme to document the new `/health` endpoint (#4866) | 2 years ago