Author | Commit | Message | Date
Kerfuffle | 5d6f19f16b | Allow quantize to only copy tensors, some other improvements (#2931) | 2 years ago
Marcus Dunn | 95b6e5212f | added `struct` to llama_dump_timing_info_yaml's `llama_context` (#2857) | 2 years ago
Johannes Gäßler | 6b73ef1201 | YAML result logging + preset script (#2657) | 2 years ago
igarnier | dd0dc366da | llama.h : add missing struct keyword for C compat in callback type (#2847) | 2 years ago
Georgi Gerganov | edd4c14817 | llama : more tokenizer fixes (#2810) | 2 years ago
Marcus Dunn | 232caf3c15 | llama : fix struct decl (#2790) | 2 years ago
Matt Pulver | c82742ac9c | llama : add llama_beam_search() (#2267) | 2 years ago
slaren | 154725c543 | llama-bench : add model sizes (#2771) | 2 years ago
Marcus Dunn | 2e5f70a25f | Added `enum` to `llama_token_get_type` return type (#2774) | 2 years ago
Georgi Gerganov | cf658adc83 | llm : add Falcon support (#2717) | 2 years ago
Georgi Gerganov | deb7dfca4b | gguf : add ftype meta info to the model (#2710) | 2 years ago
Georgi Gerganov | 6381d4e110 | gguf : new file format with flexible meta data (beta) (#2398) | 2 years ago
slaren | 097e121e2f | llama : add benchmark example (#2626) | 2 years ago
Kamil Tomšík | 348acf188c | llama : add missing enum keyword in function signatures (#2610) | 2 years ago
grahameth | ea04a4ca19 | add log_callback to llama_context_params for custom logging. (#2234) | 2 years ago
Johannes Gäßler | 0728c5a8b9 | CUDA: mmq CLI option, fixed mmq build issues (#2453) | 2 years ago
Kawrakow | eb542d3932 | Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384) | 2 years ago
slaren | 41c674161f | make rms_norm_eps a parameter (#2374) | 2 years ago
Evan Jones | 84e09a7d8b | llama : add grammar-based sampling (#1773) | 2 years ago
Georgi Gerganov | e76d630df1 | llama : grouped-query attention + LLaMAv2 70B support (#2276) | 2 years ago
Guillaume "Vermeille" Sanchez | ab0e26bdfb | llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) | 2 years ago
Georgi Gerganov | ae178ab46b | llama : make tensor_split ptr instead of array (#2272) | 2 years ago
Rinne | 294f424554 | llama : extend API to get max devices at runtime (#2253) | 2 years ago
Xiao-Yong Jin | 6e7cca4047 | llama : add custom RoPE (#2054) | 2 years ago
Bach Le | 7513b7b0a1 | llama : add functions that work directly on model (#2197) | 2 years ago
Bach Le | c9c74b4e3f | llama : add classifier-free guidance (#2135) | 2 years ago
Evan Miller | 5656d10599 | mpi : add support for distributed inference via MPI (#2099) | 2 years ago
Tobias Lütke | 31cfbb1013 | Expose generation timings from server & update completions.js (#2116) | 2 years ago
Howard Su | b8c8dda75f | Use unsigned for random seed (#2006) | 2 years ago
ningshanwutuobang | cfa0750bc9 | llama : support input embeddings directly (#1910) | 2 years ago