df845cc982 | David Friehs    | llama : minimize size used for state save/load (#4820)             | 2 years ago
15ebe59210 | Georgi Gerganov | convert : update phi-2 to latest HF repo (#4903)                   | 2 years ago
e7e4df031b | slaren          | llama : ggml-backend integration (#4766)                           | 2 years ago
584d674be6 | Georgi Gerganov | llama : remove redundant assert for StableLM (#4901)               | 2 years ago
3cabe80630 | Georgi Gerganov | llama : fix typo "imp_embd" -> "inp_embd"                          | 2 years ago
f445c0e68c | Georgi Gerganov | llama : fix llm_build_k_shift to use correct n_rot (#4889)         | 2 years ago
469e75d0a3 | Kawrakow        | llama : restore intended k-quants mixes for MoE models (#4872)     | 2 years ago
49662cbed3 | Kawrakow        | ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)                      | 2 years ago
43f76bf1c3 | pudepiedj       | main : print total token count and tokens consumed so far (#4874)  | 2 years ago
57d016ba2d | Brian           | llama : add additional suffixes for model params (#4834)           | 2 years ago
329ff61569 | Austin          | llama : recognize 1B phi models (#4847)                            | 2 years ago
dd5ae06405 | Kawrakow        | SOTA 2-bit quants (#4773)                                          | 2 years ago
b0034d93ce | Georgi Gerganov | examples : add passkey test (#3856)                                | 2 years ago
9dede37d81 | Georgi Gerganov | llama : remove unused vars (#4796)                                 | 2 years ago
3c36213df8 | Georgi Gerganov | llama : remove redundant GQA check (#4796)                         | 2 years ago
d117d4dc5d | Georgi Gerganov | llama : print tensor meta for debugging                            | 2 years ago
540938f890 | Georgi Gerganov | llama : llama_model_desc print number of experts                   | 2 years ago
0040d42eeb | Marcus Dunn     | llama : replace all API facing `int`'s with `int32_t` (#4577)      | 2 years ago
83e633c27e | postmasters     | llama : differentiate the KV dims in the attention (#4657)         | 2 years ago
24a447e20a | automaticcat    | ggml : add ggml_cpu_has_avx_vnni() (#4589)                         | 2 years ago
ea5497df5d | manikbhandari   | gpt2 : Add gpt2 architecture integration (#4555)                   | 2 years ago
f6793491b5 | Nam D. Tran     | llama : add AWQ for llama, llama2, mpt, and mistral models (#4593) | 2 years ago
dc68f0054c | slaren          | cuda : fix vmm pool with multi GPU (#4620)                         | 2 years ago
753be377b6 | Shintarou Okada | llama : add PLaMo model (#3557)                                    | 2 years ago
5bf3953d7e | slaren          | cuda : improve cuda pool efficiency using virtual memory (#4606)   | 2 years ago
708e179e85 | slaren          | fallback to CPU buffer if host buffer alloc fails (#4610)          | 2 years ago
48b7ff193e | slaren          | llama : fix platforms without mmap (#4578)                         | 2 years ago
c7e9701f86 | crasm           | llama : add ability to cancel model loading (#4462)                | 2 years ago
afefa319f1 | Georgi Gerganov | ggml : change ggml_scale to take a float instead of tensor (#4573) | 2 years ago
d232aca5a7 | slaren          | llama : initial ggml-backend integration (#4520)                   | 2 years ago