Georgi Gerganov
|
c2a16c0bdb
server : fix free of spec context and batch (#10651)
|
1 year ago |
Xuan Son Nguyen
|
6c5bc0625f
server : (refactoring) do not rely on JSON internally (#10643)
|
1 year ago |
Georgi Gerganov
|
1da7b76569
server : fix speculative decoding with context shift (#10641)
|
1 year ago |
Xuan Son Nguyen
|
91c36c269b
server : (web ui) Various improvements, now use vite as bundler (#10599)
|
1 year ago |
Georgi Gerganov
|
70b98fadbc
server : fix default draft model parameters (#10586)
|
1 year ago |
haopeng
|
64ed2091b2
server: Add "tokens per second" information in the backend (#10548)
|
1 year ago |
alek3y
|
86dc11c5bc
server : bind to any port when specified (#10590)
|
1 year ago |
Georgi Gerganov
|
84e1c33cde
server : fix parallel speculative decoding (#10513)
|
1 year ago |
Georgi Gerganov
|
47f931c8f9
server : enable cache_prompt by default (#10501)
|
1 year ago |
Diego Devesa
|
10bce0450f
llama : accept a list of devices to use to offload a model (#10497)
|
1 year ago |
Georgi Gerganov
|
9ca2e67762
server : add speculative decoding support (#10455)
|
1 year ago |
Georgi Gerganov
|
d9d54e498d
speculative : refactor and add a simpler example (#10362)
|
1 year ago |
MaggotHATE
|
bcdb7a2386
server: (web UI) Add samplers sequence customization (#10255)
|
1 year ago |
Xuan Son Nguyen
|
9901068ac7
server : (web UI) add copy button for code block, fix api key (#10242)
|
1 year ago |
Jhen-Jie Hong
|
0e712a5acb
server : fix incorrect res in validate_model_chat_template (#10272)
|
1 year ago |
Xuan Son Nguyen
|
a71d81cf8c
server : revamp chat UI with vuejs and daisyui (#10175)
|
1 year ago |
Georgi Gerganov
|
b11f9ba9b8
server : remove hack for extra parallel slot (#10187)
|
1 year ago |
Xuan Son Nguyen
|
9e0ecfb697
server : clarify /slots endpoint, add is_processing (#10162)
|
1 year ago |
sasha0552
|
42cadc74bd
server : fix slot selection by lru (#10126)
|
1 year ago |
Georgi Gerganov
|
45950415ed
server : fix endpoint checks (#10135)
|
1 year ago |
sasha0552
|
d865d1478c
server : fix smart selection of available slot (#10120)
|
1 year ago |
Kevin Gibbons
|
0a683e8088
server : include scheme when printing URL (#10106)
|
1 year ago |
Georgi Gerganov
|
8d8ff71536
llama : remove Tail-Free sampling (#10071)
|
1 year ago |
Georgi Gerganov
|
8125e6cbfc
server : don't overfill the batch during infill (#10018)
|
1 year ago |
wwoodsTM
|
ff252ea48e
llama : add DRY sampler (#9702)
|
1 year ago |
Michael Podvitskiy
|
d80fb71f8b
llama: string_split fix (#10022)
|
1 year ago |
Georgi Gerganov
|
bc5ba007b2
server : check that the prompt fits in the slot's context (#10030)
|
1 year ago |
Xuan Son Nguyen
|
958367bf53
server : refactor slot input data, move tokenizer to HTTP thread (#10023)
|
1 year ago |
wwoodsTM
|
0a1c750c80
server : samplers accept the prompt correctly (#10019)
|
1 year ago |
Xuan Son Nguyen
|
cda0e4b648
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
|
1 year ago |