Jared Van Bortel
|
be55134a53
convert : refactor vocab selection logic (#6355)
|
1 year ago |
compilade
|
557410b8f0
llama : greatly reduce output buffer memory usage (#6122)
|
1 year ago |
Kawrakow
|
55c1b2a3bb
IQ1_M: 1.75 bpw quantization (#6302)
|
1 year ago |
Kawrakow
|
d25b1c31b0
quantize : be able to override metadata by key (#6321)
|
1 year ago |
Kawrakow
|
1d0331c12a
quantize: options for output and token embedding tensors qtype (#6239)
|
1 year ago |
Pierrick Hymbert
|
dba1af6129
llama_model_loader: support multiple split/shard GGUFs (#6187)
|
1 year ago |
Theia Vogel
|
877b4d0c62
llama : add support for control vectors (#5970)
|
1 year ago |
Michael Podvitskiy
|
69ff61397d
llama : support models without vocabulary (#5798)
|
1 year ago |
slaren
|
f30ea47a87
llama : add pipeline parallelism support (#6017)
|
1 year ago |
Georgi Gerganov
|
05b06210c9
llama : more consistent names of count variables (#5994)
|
1 year ago |
Georgi Gerganov
|
ee35600b90
llama : fix F16/F32 downcast + improve names (#5980)
|
1 year ago |
DAN™
|
bcebd7dbf6
llama : add support for GritLM (#5959)
|
1 year ago |
compilade
|
c2101a2e90
llama : support Mamba Selective State Space Models (#5328)
|
1 year ago |
Georgi Gerganov
|
29ae62d2ae
llama : fix embeddings (#5796)
|
1 year ago |
Douglas Hanley
|
475df1d6cf
llama : allow for user specified embedding pooling type (#5849)
|
1 year ago |
Michael Podvitskiy
|
4a6e2d6142
llama : add abort_callback to interrupt computation (#5409)
|
1 year ago |
Pierrick Hymbert
|
3ab8b3a92e
llama : cleanup unused mmq flags (#5772)
|
1 year ago |
Marcus Dunn
|
d5ab29757e
llama : constified `llama_set_state_data`'s `src` (#5774)
|
1 year ago |
Georgi Gerganov
|
08c5ee87e4
llama : remove deprecated API (#5770)
|
1 year ago |
Kawrakow
|
0becb22ac0
IQ4_XS: a 4.25 bpw quantization (#5747)
|
1 year ago |
Georgi Gerganov
|
9d533a77d0
llama : fix defrag bugs + add parameter (#5735)
|
1 year ago |
Kawrakow
|
a33e6a0d2a
Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (#5721)
|
1 year ago |
Georgi Gerganov
|
bf08e00643
llama : refactor k-shift implementation + KV defragmentation (#5691)
|
1 year ago |
Georgi Gerganov
|
ab336a9d5e
code : normalize enum names (#5697)
|
1 year ago |
Kawrakow
|
4c4cb30736
IQ3_S: a much better alternative to Q3_K (#5676)
|
1 year ago |
Xuan Son Nguyen
|
7c8bcc11dc
Add docs for llama_chat_apply_template (#5645)
|
1 year ago |
Kawrakow
|
a14679cc30
IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590)
|
1 year ago |
Xuan Son Nguyen
|
11b12de39b
llama : add llama_chat_apply_template() (#5538)
|
1 year ago |
Kawrakow
|
bd2d4e393b
1.5 bit quantization (#5453)
|
1 year ago |
bmwl
|
f486f6e1e5
ggml : add numa options (#5377)
|
1 year ago |