Author | Commit | Message | Age
eric8607242 | ee1b497c98 | llama : support more diverse tokenizers? (#2420) | 2 years ago
Rand Xie | 65cdf34bdc | llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) | 2 years ago
Georgi Gerganov | 1a941869cb | metal : disable graph concurrency optimization due to bug (#2413) | 2 years ago
slaren | 5488fb789e | ggml : allocate graphs in a context (#2392) | 2 years ago
Kawrakow | eb542d3932 | Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384) | 2 years ago
slaren | da1889834a | ggml : improve graph build time via hash table lookup (#2329) | 2 years ago
Shouzheng Liu | 1aa18ef994 | metal : concurrently dispatch commands (#2358) | 2 years ago
slaren | 41c674161f | make rms_norm_eps a parameter (#2374) | 2 years ago
Evan Jones | 84e09a7d8b | llama : add grammar-based sampling (#1773) | 2 years ago
Georgi Gerganov | e76d630df1 | llama : grouped-query attention + LLaMAv2 70B support (#2276) | 2 years ago
Christian Demsar | a940458e48 | llama : print max tensor size to stderr (#2336) | 2 years ago
Georgi Gerganov | b47b8a9cfe | llama : optimize memory buffers (#2325) | 2 years ago
Georgi Gerganov | 513f861953 | ggml : fix rope args order + assert (#2054) | 2 years ago
Guillaume "Vermeille" Sanchez | ab0e26bdfb | llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) | 2 years ago
Georgi Gerganov | ae178ab46b | llama : make tensor_split ptr instead of array (#2272) | 2 years ago
Georgi Gerganov | fff0e0eafe | llama : fix regression from #2000 - could not load no-mmap models | 2 years ago
Rinne | 294f424554 | llama : extend API to get max devices at runtime (#2253) | 2 years ago
Georgi Gerganov | d01bccde9f | ci : integrate with ggml-org/ci (#2250) | 2 years ago
Alex Klinkhamer | b7647436cc | llama : fix t_start_sample_us initialization warning (#2238) | 2 years ago
Xiao-Yong Jin | 6e7cca4047 | llama : add custom RoPE (#2054) | 2 years ago
Bach Le | 7513b7b0a1 | llama : add functions that work directly on model (#2197) | 2 years ago
Bach Le | c9c74b4e3f | llama : add classifier-free guidance (#2135) | 2 years ago
LostRuins | bbef28218f | Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) | 2 years ago
Evan Miller | 5656d10599 | mpi : add support for distributed inference via MPI (#2099) | 2 years ago
oobabooga | 1d16309969 | llama : remove "first token must be BOS" restriction (#2153) | 2 years ago
Qingyou Meng | 1d656d6360 | ggml : change ggml_graph_compute() API to not require context (#1999) | 2 years ago
Tobias Lütke | 31cfbb1013 | Expose generation timings from server & update completions.js (#2116) | 2 years ago
Stephan Walter | 1b107b8550 | ggml : generalize `quantize_fns` for simpler FP16 handling (#1237) | 2 years ago
Howard Su | 051c70dcd5 | llama: Don't double count the sampling time (#2107) | 2 years ago
Johannes Gäßler | 9e4475f5cf | Fixed OpenCL offloading prints (#2082) | 2 years ago