Diego Devesa
|
10bce0450f
llama : accept a list of devices to use to offload a model (#10497)
|
1 year ago |
Georgi Gerganov
|
d9d54e498d
speculative : refactor and add a simpler example (#10362)
|
1 year ago |
Georgi Gerganov
|
2a82891a85
speculative : fix out-of-bounds access (#10289)
|
1 year ago |
Georgi Gerganov
|
55e47786e3
llama : default sampling changes + greedy update (#9897)
|
1 year ago |
Georgi Gerganov
|
bc21975084
speculative : fix handling of some input params (#9963)
|
1 year ago |
Xuan Son Nguyen
|
cda0e4b648
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
|
1 year ago |
Diego Devesa
|
7eee341bee
common : use common_ prefix for common library functions (#9805)
|
1 year ago |
Georgi Gerganov
|
b0f27361f3
sampling : avoid expensive softmax during greedy sampling (#9605)
|
1 year ago |
Georgi Gerganov
|
6262d13e0b
common : reimplement logging (#9418)
|
1 year ago |
Georgi Gerganov
|
0abc6a2c25
llama : llama_perf + option to disable timings during decode (#9355)
|
1 year ago |
Xuan Son Nguyen
|
bfe76d4a17
common : move arg parser code to `arg.cpp` (#9388)
|
1 year ago |
Xuan Son Nguyen
|
1b9ae5189c
common : refactor arg parser (#9308)
|
1 year ago |
Georgi Gerganov
|
df270ef745
llama : refactor sampling v2 (#9294)
|
1 year ago |
Faisal Zaghloul
|
42c76d1358
Threadpool: take 2 (#8672)
|
1 year ago |
Liu Jia
|
0a4ce78681
common : Changed tuple to struct (TODO fix) (#8823)
|
1 year ago |
Georgi Gerganov
|
1442677f92
common : refactor cli arg parsing (#7675)
|
1 year ago |
Pedro Cuenca
|
b97bc3966e
llama : support Llama 3 HF conversion (#6745)
|
1 year ago |
Jared Van Bortel
|
1b67731e18
BERT tokenizer fixes (#6498)
|
1 year ago |
compilade
|
557410b8f0
llama : greatly reduce output buffer memory usage (#6122)
|
1 year ago |
Minsoo Cheong
|
586e7bc561
sampling : deduplicated code for probability distribution access (#6240)
|
1 year ago |
Jeffrey Quesnelle
|
29eee40474
fix speculative decoding build on windows (#5874)
|
1 year ago |
Minsoo Cheong
|
6d341ab6c5
speculative : implement stochastic speculative sampling (#5625)
|
1 year ago |
bmwl
|
f486f6e1e5
ggml : add numa options (#5377)
|
1 year ago |
stduhpf
|
e0324285a5
speculative : threading options (#4959)
|
2 years ago |
Richard Kiss
|
9494d7c477
english : use `typos` to fix comments and logs (#4354)
|
2 years ago |
stduhpf
|
da5eaef1f3
speculative : support `--color` (#4343)
|
2 years ago |
Branden Butler
|
40a34fe8d0
speculative : fix prompt tokenization in speculative example (#4025)
|
2 years ago |
Georgi Gerganov
|
8f961abdc4
speculative : change default p_accept to 0.5 + CLI args (#3919)
|
2 years ago |
cebtenzzre
|
b12fa0d1c1
build : link against build info instead of compiling against it (#3879)
|
2 years ago |
Georgi Gerganov
|
ee1a0ec9cb
llama : add option for greedy sampling with probs (#3813)
|
2 years ago |