5be6c803fa | Marcus Dunn | llama : remove token functions with `context` args in favor of `model` (#3720) | 2 years ago
9e70cc0322 | goerch | Add test for MPT tokenization (#3728) | 2 years ago
a5e7dbd614 | Kerfuffle | llama : validate special token ids are in range when loading GGUF model (#3635) | 2 years ago
d1031cf49c | Georgi Gerganov | sampling : refactor init to use llama_sampling_params (#3696) | 2 years ago
f439e506e8 | Herman Semenov | ggml : fix rope + llama minor optimizations (#3560) | 2 years ago
0e89203b51 | Georgi Gerganov | speculative : add tree-based sampling example (#3624) | 2 years ago
cb33f43a2a | slaren | fix embeddings when using CUDA (#3657) | 2 years ago
e1675d133c | Georgi Gerganov | llama : avoid fprintf in favor of LLAMA_LOG (#3538) | 2 years ago
1a159553f9 | staviq | tokenizer : special token handling (#3538) | 2 years ago
11bff29045 | cebtenzzre | MPT : support GQA for replit-code-v1.5 (#3627) | 2 years ago
2a4bcbacea | Daniel Bevenius | llama : remove n_threads from llama_decode_internal (#3614) | 2 years ago
233fc1c69f | goerch | Minor improvements in GPT2 tokenizer (#3567) | 2 years ago
02d2875def | Xingchen Song(宋星辰) | llm : add bloom models (#3553) | 2 years ago
f5f9121de1 | Jan Ploski | llm : add MPT support (#3417) | 2 years ago
fcca0a7004 | Georgi Gerganov | refact : fix convert script + zero out KV cache to avoid nans (#3523) | 2 years ago
db3abcc114 | Georgi Gerganov | sync : ggml (ggml-backend) (#3548) | 2 years ago
63d3b06a43 | Kerfuffle | llama : fix missing break in Persimmon arch case statements (#3535) | 2 years ago
f1782c68de | cebtenzzre | quantize : fail fast on write errors (#3521) | 2 years ago
0e797c2fc5 | Phillip Kravtsov | llm : support Adept Persimmon 8B (#3410) | 2 years ago
3a716b4dae | goerch | Fix for #3454 (#3455) | 2 years ago
9ca79d5cbb | Kerfuffle | kv cache slot search improvements (#3493) | 2 years ago
a8777ad84e | pudepiedj | parallel : add option to load external prompt file (#3416) | 2 years ago
16820a5a0d | l3utterfly | llama : correct hparams comparison (#3446) | 2 years ago
f8c90cdbaa | ds5t5 | llm : add Refact model (#3329) | 2 years ago
ac2219fef3 | Georgi Gerganov | llama : fix session saving/loading (#3400) | 2 years ago
48be797ffb | Alex Klinkhamer | llama : expose model's rope_freq_scale in the API (#3418) | 2 years ago
ff5a3f0c09 | goerch | Work on the BPE tokenizer (#3252) | 2 years ago
a847676984 | Adrian | metal : set log callback before initializing (#3427) | 2 years ago
c97f01c362 | vvhg1 | infill : add new example + extend server API (#3296) | 2 years ago
2777a84be4 | Cebtenzzre | llama : quantize up to 31% faster on Linux and Windows with mmap (#3206) | 2 years ago