cturan/llama.cpp

Author	SHA1 Message	Date
Zay	e790eef21c llama.swiftui : update models layout (#4826)	2 years ago
Georgi Gerganov	5537d9d36b gitignore : imatrix	2 years ago
Johannes Gäßler	1b280c9fff CUDA: fix softmax compile for old CUDA versions (#4862)	2 years ago
Georgi Gerganov	3cabe80630 llama : fix typo "imp_embd" -> "inp_embd"	2 years ago
howlger	4315a94366 common : streamline the formatting of help (#4890)	2 years ago
Georgi Gerganov	2d00741e12 py : fix lint (#4889)	2 years ago
Georgi Gerganov	f445c0e68c llama : fix llm_build_k_shift to use correct n_rot (#4889)	2 years ago
Kawrakow	326b418b59 Importance Matrix calculation (#4861)	2 years ago
Georgi Gerganov	1d118386fe server : fix infill when prompt is empty (#4833)	2 years ago
Georgi Gerganov	7edefbd79c main : better name for variable n_print (#4874)	2 years ago
Georgi Gerganov	3ca63b4538 main : disable token count by default (#4874)	2 years ago
Georgi Gerganov	b037787548 swift : track ggml release branch (#4867)	2 years ago
Kawrakow	469e75d0a3 llama : restore intended k-quants mixes for MoE models (#4872)	2 years ago
Kawrakow	49662cbed3 ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)	2 years ago
Georgi Gerganov	3ba5b8ca8e swift : pin ggml commit + remove ggml.h from spm-headers (#4878)	2 years ago
Laura	4330bd83fe server : implement credentialed CORS (#4514)	2 years ago
Michael Coppola	27379455c3 server : support for multiple api keys (#4864)	2 years ago
Behnam M	eab6795006 server : add `LOG_INFO` when model is successfully loaded (#4881)	2 years ago
Someone	d8d90aa343 ci: nix-flake-update: new token with pr permissions (#4879)	2 years ago
pudepiedj	43f76bf1c3 main : print total token count and tokens consumed so far (#4874)	2 years ago
Isaac McFadyen	2f043328e3 server : fix typo in model name (#4876)	2 years ago
Paul Tsochantaris	2a7c94db5f metal : put encoder debug group behind a define (#4873)	2 years ago
Georgi Gerganov	64802ec00d sync : ggml	2 years ago
Georgi Gerganov	3267c2abc7 metal : fix deprecation warning (ggml/690)	2 years ago
Timothy Cronin	f85a973aa1 ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693)	2 years ago
Jack Mousseau	5362e43962 metal : wrap each operation in debug group (ggml/690)	2 years ago
leejet	e739de7909 ggml : change GGML_MAX_NAME at compile time (ggml/682)	2 years ago
Halalaluyafail3	c910e3c28a Fix execlp call (ggml/689)	2 years ago
Erik Scholz	f34432ca1e fix : cuda order of synchronization when setting a buffer (ggml/679)	2 years ago
Behnam M	7a9f75c38b server : update readme to document the new `/health` endpoint (#4866)	2 years ago
