Georgi Gerganov | df270ef745 llama : refactor sampling v2 (#9294) | 1 year ago
Faisal Zaghloul | 42c76d1358 Threadpool: take 2 (#8672) | 1 year ago
Liu Jia | 0a4ce78681 common : Changed tuple to struct (TODO fix) (#8823) | 1 year ago
Georgi Gerganov | 1442677f92 common : refactor cli arg parsing (#7675) | 1 year ago
Pedro Cuenca | b97bc3966e llama : support Llama 3 HF conversion (#6745) | 1 year ago
Jared Van Bortel | 1b67731e18 BERT tokenizer fixes (#6498) | 1 year ago
compilade | 557410b8f0 llama : greatly reduce output buffer memory usage (#6122) | 1 year ago
Minsoo Cheong | 586e7bc561 sampling : deduplicated code for probability distribution access (#6240) | 1 year ago
Jeffrey Quesnelle | 29eee40474 fix speculative decoding build on windows (#5874) | 1 year ago
Minsoo Cheong | 6d341ab6c5 speculative : implement stochastic speculative sampling (#5625) | 1 year ago
bmwl | f486f6e1e5 ggml : add numa options (#5377) | 1 year ago
stduhpf | e0324285a5 speculative : threading options (#4959) | 2 years ago
Richard Kiss | 9494d7c477 english : use `typos` to fix comments and logs (#4354) | 2 years ago
stduhpf | da5eaef1f3 speculative : support `--color` (#4343) | 2 years ago
Branden Butler | 40a34fe8d0 speculative : fix prompt tokenization in speculative example (#4025) | 2 years ago
Georgi Gerganov | 8f961abdc4 speculative : change default p_accept to 0.5 + CLI args (#3919) | 2 years ago
cebtenzzre | b12fa0d1c1 build : link against build info instead of compiling against it (#3879) | 2 years ago
Georgi Gerganov | ee1a0ec9cb llama : add option for greedy sampling with probs (#3813) | 2 years ago
Kerfuffle | 41aee4df82 speculative : ensure draft and target model vocab matches (#3812) | 2 years ago
Marcus Dunn | 5be6c803fa llama : remove token functions with `context` args in favor of `model` (#3720) | 2 years ago
Georgi Gerganov | d1031cf49c sampling : refactor init to use llama_sampling_params (#3696) | 2 years ago
Georgi Gerganov | 4e82b2ea3f speculative : bug fixes | 2 years ago
Georgi Gerganov | 0e89203b51 speculative : add tree-based sampling example (#3624) | 2 years ago
Kerfuffle | 70c29da118 common : fix mirostat state when using multiple sequences (#3543) | 2 years ago
Georgi Gerganov | ac2219fef3 llama : fix session saving/loading (#3400) | 2 years ago
slaren | 16bc66d947 llama.cpp : split llama_context_params into model and context params (#3301) | 2 years ago
Georgi Gerganov | ec893798b7 llama : custom attention mask + parallel decoding + no context swaps (#3228) | 2 years ago
Leng Yue | 35f73049af speculative : add heuristic algorithm (#3006) | 2 years ago
FK | 84e723653c speculative: add --n-gpu-layers-draft option (#3063) | 2 years ago
Przemysław Pawełczyk | cb6c44c5e0 build : do not use _GNU_SOURCE gratuitously (#2035) | 2 years ago