Jared Van Bortel | 32c8486e1f | wpm : portable unicode tolower (#6305) | 1 year ago
compilade | 557410b8f0 | llama : greatly reduce output buffer memory usage (#6122) | 1 year ago
Kawrakow | 55c1b2a3bb | IQ1_M: 1.75 bpw quantization (#6302) | 1 year ago
Kawrakow | d25b1c31b0 | quantize : be able to override metadata by key (#6321) | 1 year ago
slaren | 280345968d | cuda : rename build flag to LLAMA_CUDA (#6299) | 1 year ago
Meng, Hengyu | ddf6568510 | [SYCL] offload op (#6217) | 1 year ago
Jared Van Bortel | 94d1b3b411 | use _wfopen instead of fopen on Windows (#6248) | 1 year ago
Pierrick Hymbert | f482bb2e49 | common: llama_load_model_from_url split support (#6192) | 1 year ago
Julius Arkenberg | 476b0251b2 | llama : add grok-1 support (#6204) | 1 year ago
Kawrakow | 1d0331c12a | quantize: options for output and token embedding tensors qtype (#6239) | 1 year ago
Pierrick Hymbert | dba1af6129 | llama_model_loader: support multiple split/shard GGUFs (#6187) | 1 year ago
Nexesenex | e80f06d2a1 | llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) | 1 year ago
Georgi Gerganov | 95d576b48e | metal : pad n_ctx by 32 (#6177) | 1 year ago
Jared Van Bortel | d199ca79f2 | mpt : implement backwards compatiblity with duped output tensor (#6139) | 1 year ago
slaren | 2bf8d0f7c4 | backend : offload large batches to GPU (#6083) | 1 year ago
slaren | d84c48505f | llama : fix Baichuan2 13B (#6092) | 1 year ago
Theia Vogel | 877b4d0c62 | llama : add support for control vectors (#5970) | 1 year ago
Andrew Canis | 12247f4c69 | llama : add Command-R support (#6033) | 1 year ago
Neo Zhang Jianyu | 46acb36767 | fix set main gpu error (#6073) | 1 year ago
Xuan Son Nguyen | aab606a11f | llama : add Orion chat template (#6066) | 1 year ago
Georgi Gerganov | 4755afd1cb | llama : fix integer overflow during quantization (#6063) | 1 year ago
Michael Podvitskiy | 69ff61397d | llama : support models without vocabulary (#5798) | 1 year ago
Georgi Gerganov | a44bc969e4 | llama : fix typo | 1 year ago
Michael Podvitskiy | 2c4fb69246 | llama : optimize defrag moves + fix fragmentation calculation (#6037) | 1 year ago
slaren | f30ea47a87 | llama : add pipeline parallelism support (#6017) | 1 year ago
gliptic | 5cdb371731 | grammar : fix unnecessarily retained pointer to rules (#6003) | 1 year ago
Georgi Gerganov | 05b06210c9 | llama : more consistent names of count variables (#5994) | 1 year ago
Georgi Gerganov | 83796e62bc | llama : refactor unicode stuff (#5992) | 1 year ago
Michael Podvitskiy | 3202361c5b | ggml, ci : Windows ARM runner and build fixes (#5979) | 1 year ago
Georgi Gerganov | ee35600b90 | llama : fix F16/F32 downcast + improve names (#5980) | 1 year ago