Daniel Bevenius
|
433def286e
llama : rename ctx to user_data in progress_callback (#7045)
|
1 year ago |
Georgi Gerganov
|
9c67c2773d
ggml : add Flash Attention (#5021)
|
1 year ago |
Georgi Gerganov
|
f4ab2a4147
llama : fix BPE pre-tokenization (#6920)
|
1 year ago |
Pierrick Hymbert
|
0c4d489e29
quantize: add imatrix and dataset metadata in GGUF (#6658)
|
1 year ago |
slaren
|
017e6999b5
add basic tensor data validation function (#6884)
|
1 year ago |
jiez
|
1966eb2615
quantize : add '--keep-split' to quantize model into shards (#6688)
|
1 year ago |
Douglas Hanley
|
b4e4b8a935
llama : add llama_get_pooling_type function (#6862)
|
1 year ago |
Johannes Gäßler
|
28103f4832
Server: fix seed for multiple slots (#6835)
|
1 year ago |
Georgi Gerganov
|
40f74e4d73
llama : add option to render special/control tokens (#6807)
|
1 year ago |
Pedro Cuenca
|
b97bc3966e
llama : support Llama 3 HF conversion (#6745)
|
1 year ago |
Olivier Chafik
|
cbaadc9294
grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609)
|
1 year ago |
Jared Van Bortel
|
1b67731e18
BERT tokenizer fixes (#6498)
|
1 year ago |
Rick G
|
e3c337d87c
llama : support negative ith in llama_get_ API (#6519)
|
1 year ago |
Jan Boon
|
beea6e1b16
llama : save and restore kv cache for single seq id (#6341)
|
1 year ago |
Clint Herron
|
9b84ae1806
examples : add GBNF validator program (#5948)
|
1 year ago |
Jared Van Bortel
|
be55134a53
convert : refactor vocab selection logic (#6355)
|
1 year ago |
compilade
|
557410b8f0
llama : greatly reduce output buffer memory usage (#6122)
|
1 year ago |
Kawrakow
|
55c1b2a3bb
IQ1_M: 1.75 bpw quantization (#6302)
|
1 year ago |
Kawrakow
|
d25b1c31b0
quantize : be able to override metadata by key (#6321)
|
1 year ago |
Kawrakow
|
1d0331c12a
quantize: options for output and token embedding tensors qtype (#6239)
|
1 year ago |
Pierrick Hymbert
|
dba1af6129
llama_model_loader: support multiple split/shard GGUFs (#6187)
|
1 year ago |
Theia Vogel
|
877b4d0c62
llama : add support for control vectors (#5970)
|
1 year ago |
Michael Podvitskiy
|
69ff61397d
llama : support models without vocabulary (#5798)
|
1 year ago |
slaren
|
f30ea47a87
llama : add pipeline parallelism support (#6017)
|
1 year ago |
Georgi Gerganov
|
05b06210c9
llama : more consistent names of count variables (#5994)
|
1 year ago |
Georgi Gerganov
|
ee35600b90
llama : fix F16/F32 downcast + improve names (#5980)
|
1 year ago |
DAN™
|
bcebd7dbf6
llama : add support for GritLM (#5959)
|
1 year ago |
compilade
|
c2101a2e90
llama : support Mamba Selective State Space Models (#5328)
|
1 year ago |
Georgi Gerganov
|
29ae62d2ae
llama : fix embeddings (#5796)
|
1 year ago |
Douglas Hanley
|
475df1d6cf
llama : allow for user specified embedding pooling type (#5849)
|
1 year ago |