| Author | Commit | Message | Date |
|---|---|---|---|
| slaren | e85bb1a8e7 | llama : add functions to get the model's metadata (#4013) | 2 years ago |
| Georgi Gerganov | 4f447a4833 | llama : fix data units (#4101) | 2 years ago |
| Kerfuffle | 91f6499393 | Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) | 2 years ago |
| Jared Van Bortel | a6fc554e26 | llama : restore prefix space in llama tokenizer (#4081) | 2 years ago |
| Galunid | 36eed0c42c | stablelm : StableLM support (#3586) | 2 years ago |
| Georgi Gerganov | 4760e7cc0b | sync : ggml (backend v2) (#3912) | 2 years ago |
| Kerfuffle | bb50a792ec | Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041) | 2 years ago |
| Galunid | df9d1293de | Unbreak persimmon after #3837 (#4010) | 2 years ago |
| Meng Zhang | 46876d2a2c | cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946) | 2 years ago |
| Meng Zhang | 3d48f42efc | llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) | 2 years ago |
| cebtenzzre | 3fdbe6b66b | llama : change yarn_ext_factor placeholder to -1 (#3922) | 2 years ago |
| Georgi Gerganov | 1efae9b7dc | llm : prevent from 1-D tensors being GPU split (#3697) | 2 years ago |
| cebtenzzre | 0eb332a10f | llama : fix llama_context_default_params after #2268 (#3893) | 2 years ago |
| cebtenzzre | 898aeca90a | llama : implement YaRN RoPE scaling (#2268) | 2 years ago |
| Georgi Gerganov | c43c2da8af | llm : fix llm_build_kqv taking unused tensor (benign, #3837) | 2 years ago |
| Georgi Gerganov | 523e49b111 | llm : fix falcon norm after refactoring (#3837) | 2 years ago |
| Georgi Gerganov | 50337961a6 | llm : add llm_build_context (#3881) | 2 years ago |
| Andrew Godfrey | 73bdcb395e | finetune : add -ngl parameter (#3762) | 2 years ago |
| Georgi Gerganov | 71e3718abd | llama : refactor graph build code (#3837) | 2 years ago |
| kalomaze | 238657db23 | samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841) | 2 years ago |
| Georgi Gerganov | 207b51900e | ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) | 2 years ago |
| Kerfuffle | 6e08281e58 | Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) | 2 years ago |
| Georgi Gerganov | 71a09da301 | llama : fix kv shift bug (#3835) | 2 years ago |
| Georgi Gerganov | d69d777c02 | ggml : quantization refactoring (#3833) | 2 years ago |
| Kerfuffle | bd6d9e2059 | llama : allow quantizing k-quants to fall back when tensor size incompatible (#3747) | 2 years ago |
| Georgi Gerganov | fdee152e4e | starcoder : add GPU offloading (#3827) | 2 years ago |
| cebtenzzre | 6d459cbfbe | llama : correctly report GGUFv3 format (#3818) | 2 years ago |
| Georgi Gerganov | 2f9ec7e271 | cuda : improve text-generation and batched decoding performance (#3776) | 2 years ago |
| Marcus Dunn | 5be6c803fa | llama : remove token functions with `context` args in favor of `model` (#3720) | 2 years ago |
| goerch | 9e70cc0322 | Add test for MPT tokenization (#3728) | 2 years ago |