5be6c803fa | Marcus Dunn | llama : remove token functions with `context` args in favor of `model` (#3720) | 2 years ago
9e70cc0322 | goerch | Add test for MPT tokenization (#3728) | 2 years ago
a5e7dbd614 | Kerfuffle | llama : validate special token ids are in range when loading GGUF model (#3635) | 2 years ago
d1031cf49c | Georgi Gerganov | sampling : refactor init to use llama_sampling_params (#3696) | 2 years ago
f439e506e8 | Herman Semenov | ggml : fix rope + llama minor optimizations (#3560) | 2 years ago
0e89203b51 | Georgi Gerganov | speculative : add tree-based sampling example (#3624) | 2 years ago
cb33f43a2a | slaren | fix embeddings when using CUDA (#3657) | 2 years ago
e1675d133c | Georgi Gerganov | llama : avoid fprintf in favor of LLAMA_LOG (#3538) | 2 years ago
1a159553f9 | staviq | tokenizer : special token handling (#3538) | 2 years ago
11bff29045 | cebtenzzre | MPT : support GQA for replit-code-v1.5 (#3627) | 2 years ago
2a4bcbacea | Daniel Bevenius | llama : remove n_threads from llama_decode_internal (#3614) | 2 years ago
233fc1c69f | goerch | Minor improvements in GPT2 tokenizer (#3567) | 2 years ago
02d2875def | Xingchen Song(宋星辰) | llm : add bloom models (#3553) | 2 years ago
f5f9121de1 | Jan Ploski | llm : add MPT support (#3417) | 2 years ago
fcca0a7004 | Georgi Gerganov | refact : fix convert script + zero out KV cache to avoid nans (#3523) | 2 years ago
db3abcc114 | Georgi Gerganov | sync : ggml (ggml-backend) (#3548) | 2 years ago
63d3b06a43 | Kerfuffle | llama : fix missing break in Persimmon arch case statements (#3535) | 2 years ago
f1782c68de | cebtenzzre | quantize : fail fast on write errors (#3521) | 2 years ago
0e797c2fc5 | Phillip Kravtsov | llm : support Adept Persimmon 8B (#3410) | 2 years ago
3a716b4dae | goerch | Fix for #3454 (#3455) | 2 years ago
9ca79d5cbb | Kerfuffle | kv cache slot search improvements (#3493) | 2 years ago
a8777ad84e | pudepiedj | parallel : add option to load external prompt file (#3416) | 2 years ago
16820a5a0d | l3utterfly | llama : correct hparams comparison (#3446) | 2 years ago
f8c90cdbaa | ds5t5 | llm : add Refact model (#3329) | 2 years ago
ac2219fef3 | Georgi Gerganov | llama : fix session saving/loading (#3400) | 2 years ago
48be797ffb | Alex Klinkhamer | llama : expose model's rope_freq_scale in the API (#3418) | 2 years ago
ff5a3f0c09 | goerch | Work on the BPE tokenizer (#3252) | 2 years ago
a847676984 | Adrian | metal : set log callback before initializing (#3427) | 2 years ago
c97f01c362 | vvhg1 | infill : add new example + extend server API (#3296) | 2 years ago
2777a84be4 | Cebtenzzre | llama : quantize up to 31% faster on Linux and Windows with mmap (#3206) | 2 years ago