Author | Commit | Message | Age
eric8607242 | ee1b497c98 | llama : support more diverse tokenizers? (#2420) | 2 years ago
Rand Xie | 65cdf34bdc | llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) | 2 years ago
Georgi Gerganov | 1a941869cb | metal : disable graph concurrency optimization due to bug (#2413) | 2 years ago
slaren | 5488fb789e | ggml : allocate graphs in a context (#2392) | 2 years ago
Kawrakow | eb542d3932 | Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384) | 2 years ago
slaren | da1889834a | ggml : improve graph build time via hash table lookup (#2329) | 2 years ago
Shouzheng Liu | 1aa18ef994 | metal : concurrently dispatch commands (#2358) | 2 years ago
slaren | 41c674161f | make rms_norm_eps a parameter (#2374) | 2 years ago
Evan Jones | 84e09a7d8b | llama : add grammar-based sampling (#1773) | 2 years ago
Georgi Gerganov | e76d630df1 | llama : grouped-query attention + LLaMAv2 70B support (#2276) | 2 years ago
Christian Demsar | a940458e48 | llama : print max tensor size to stderr (#2336) | 2 years ago
Georgi Gerganov | b47b8a9cfe | llama : optimize memory buffers (#2325) | 2 years ago
Georgi Gerganov | 513f861953 | ggml : fix rope args order + assert (#2054) | 2 years ago
Guillaume "Vermeille" Sanchez | ab0e26bdfb | llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) | 2 years ago
Georgi Gerganov | ae178ab46b | llama : make tensor_split ptr instead of array (#2272) | 2 years ago
Georgi Gerganov | fff0e0eafe | llama : fix regression from #2000 - could not load no-mmap models | 2 years ago
Rinne | 294f424554 | llama : extend API to get max devices at runtime (#2253) | 2 years ago
Georgi Gerganov | d01bccde9f | ci : integrate with ggml-org/ci (#2250) | 2 years ago
Alex Klinkhamer | b7647436cc | llama : fix t_start_sample_us initialization warning (#2238) | 2 years ago
Xiao-Yong Jin | 6e7cca4047 | llama : add custom RoPE (#2054) | 2 years ago
Bach Le | 7513b7b0a1 | llama : add functions that work directly on model (#2197) | 2 years ago
Bach Le | c9c74b4e3f | llama : add classifier-free guidance (#2135) | 2 years ago
LostRuins | bbef28218f | Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) | 2 years ago
Evan Miller | 5656d10599 | mpi : add support for distributed inference via MPI (#2099) | 2 years ago
oobabooga | 1d16309969 | llama : remove "first token must be BOS" restriction (#2153) | 2 years ago
Qingyou Meng | 1d656d6360 | ggml : change ggml_graph_compute() API to not require context (#1999) | 2 years ago
Tobias Lütke | 31cfbb1013 | Expose generation timings from server & update completions.js (#2116) | 2 years ago
Stephan Walter | 1b107b8550 | ggml : generalize `quantize_fns` for simpler FP16 handling (#1237) | 2 years ago
Howard Su | 051c70dcd5 | llama: Don't double count the sampling time (#2107) | 2 years ago
Johannes Gäßler | 9e4475f5cf | Fixed OpenCL offloading prints (#2082) | 2 years ago