slaren
|
a249843d89
common : restore --n-gpu-layers (#9371)
|
1 year ago |
Xuan Son Nguyen
|
00b02bb249
imatrix : fix arg parser for imatrix (#9366)
|
1 year ago |
Georgi Gerganov
|
faf69d4237
llama : sanitize invalid tokens (#9357)
|
1 year ago |
Xuan Son Nguyen
|
1b9ae5189c
common : refactor arg parser (#9308)
|
1 year ago |
Georgi Gerganov
|
df270ef745
llama : refactor sampling v2 (#9294)
|
1 year ago |
Aarni Koskela
|
815b1fb20a
batched-bench : add `--output-format jsonl` option (#9293)
|
1 year ago |
Radoslav Gerganov
|
82e3b03c11
rpc : make RPC servers come first in the device list (#9296)
|
1 year ago |
Faisal Zaghloul
|
42c76d1358
Threadpool: take 2 (#8672)
|
1 year ago |
Xuan Son Nguyen
|
a77feb5d71
server : add some missing env variables (#9116)
|
1 year ago |
Justine Tunney
|
436787f170
llama : fix time complexity of string replacement (#9163)
|
1 year ago |
Herman Semenov
|
93bc3839f9
common: fixed not working find argument --n-gpu-layers-draft (#9175)
|
1 year ago |
Xuan Son Nguyen
|
fc54ef0d1c
server : support reading arguments from environment variables (#9105)
|
1 year ago |
Liu Jia
|
fb487bb567
common : add support for cpu_get_num_physical_cores() on Windows (#8771)
|
1 year ago |
Zhenwei Jin
|
4af8420afb
common : remove duplicate function llama_should_add_bos_token (#8778)
|
1 year ago |
fairydreaming
|
7c3f55c100
Add support for encoder-only T5 models (#8900)
|
1 year ago |
Georgi Gerganov
|
45a55b91aa
llama : better replace_all (cont) (#8926)
|
1 year ago |
Xuan Son Nguyen
|
1e6f6554aa
server : add lora hotswap endpoint (WIP) (#8857)
|
1 year ago |
Liu Jia
|
0a4ce78681
common : Changed tuple to struct (TODO fix) (#8823)
|
1 year ago |
Igor Okulist
|
afbbcf3c04
server : update llama-server embedding flag documentation (#8779)
|
1 year ago |
Daniel Bevenius
|
9d03d085dd
common : add --no-warmup option for main/llama-cli (#8712)
|
1 year ago |
Xuan Son Nguyen
|
96952e7181
llama : fix `llama_chat_format_single` for mistral (#8657)
|
1 year ago |
Xuan Son Nguyen
|
de280085e7
examples : Fix `llama-export-lora` example (#8607)
|
1 year ago |
Xuan Son Nguyen
|
97bdd26eee
Refactor lora adapter support (#8332)
|
1 year ago |
Georgi Gerganov
|
9104bc20ed
common : add --no-cont-batching arg (#6358)
|
1 year ago |
Borislav Stanimirov
|
7a80710d93
msvc : silence codecvt c++17 deprecation warnings (#8395)
|
1 year ago |
Derrick T. Woolworth
|
86e7299ef5
added support for Authorization Bearer tokens when downloading model (#8307)
|
1 year ago |
jaime-m-p
|
213701b51a
Detokenizer fixes (#8039)
|
1 year ago |
Douglas Hanley
|
d12f781074
llama : streamline embeddings from "non-embedding" models (#8087)
|
1 year ago |
Xuan Son Nguyen
|
a38b884c6c
cli: add EOT when user hit Ctrl+C (#8296)
|
1 year ago |
fairydreaming
|
807b0c49ff
Inference support for T5 and FLAN-T5 model families (#5763)
|
1 year ago |