Georgi Gerganov | 1926d6e39d | llama : adjust default context size + print warnings (#10136) | 1 year ago
Georgi Gerganov | 8d8ff71536 | llama : remove Tail-Free sampling (#10071) | 1 year ago
wwoodsTM | ff252ea48e | llama : add DRY sampler (#9702) | 1 year ago
Michael Podvitskiy | d80fb71f8b | llama: string_split fix (#10022) | 1 year ago
Daniel Bevenius | 674804a996 | arg : fix typo in embeddings argument help [no ci] (#9994) | 1 year ago
Georgi Gerganov | 755a9b2bf0 | llama : add infill sampler (#9896) | 1 year ago
MaggotHATE | fbc98b748e | sampling : add XTC sampler (#9742) | 1 year ago
Georgi Gerganov | c7181bd294 | server : reuse cached context chunks (#9866) | 1 year ago
Georgi Gerganov | 95c76e8e92 | server : remove legacy system_prompt feature (#9857) | 1 year ago
Georgi Gerganov | 11ac9800af | llama : improve infill support and special token detection (#9798) | 1 year ago
Diego Devesa | 7eee341bee | common : use common_ prefix for common library functions (#9805) | 1 year ago
Xuan Son Nguyen | 458367a906 | server : better security control for public deployments (#9776) | 1 year ago
Georgi Gerganov | f4d2b8846a | llama : add reranking support (#9510) | 1 year ago
Vinesh Janarthanan | 441b72b91f | main : option to disable context shift (#9484) | 1 year ago
Georgi Gerganov | 6262d13e0b | common : reimplement logging (#9418) | 1 year ago
Georgi Gerganov | 0abc6a2c25 | llama : llama_perf + option to disable timings during decode (#9355) | 1 year ago
Xuan Son Nguyen | bfe76d4a17 | common : move arg parser code to `arg.cpp` (#9388) | 1 year ago
Xuan Son Nguyen | 3f7ccfd649 | common : bring back missing args, add env var duplication check (#9375) | 1 year ago
Xuan Son Nguyen | 1b9ae5189c | common : refactor arg parser (#9308) | 1 year ago
Georgi Gerganov | df270ef745 | llama : refactor sampling v2 (#9294) | 1 year ago
Aarni Koskela | 815b1fb20a | batched-bench : add `--output-format jsonl` option (#9293) | 1 year ago
Faisal Zaghloul | 42c76d1358 | Threadpool: take 2 (#8672) | 1 year ago
Xuan Son Nguyen | fc54ef0d1c | server : support reading arguments from environment variables (#9105) | 1 year ago
Zhenwei Jin | 4af8420afb | common : remove duplicate function llama_should_add_bos_token (#8778) | 1 year ago
Georgi Gerganov | 45a55b91aa | llama : better replace_all (cont) (#8926) | 1 year ago
Xuan Son Nguyen | 1e6f6554aa | server : add lora hotswap endpoint (WIP) (#8857) | 1 year ago
Liu Jia | 0a4ce78681 | common : Changed tuple to struct (TODO fix) (#8823) | 1 year ago
Xuan Son Nguyen | de280085e7 | examples : Fix `llama-export-lora` example (#8607) | 1 year ago
Derrick T. Woolworth | 86e7299ef5 | added support for Authorization Bearer tokens when downloading model (#8307) | 1 year ago
jaime-m-p | 213701b51a | Detokenizer fixes (#8039) | 1 year ago