Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Kawrakow | bac66994cf | Quantization imrovements for k_quants (#2707) | 2 years ago |
| slaren | 1123f7fbdf | ggml-cuda : use graph allocator (#2684) | 2 years ago |
| Georgi Gerganov | 6381d4e110 | gguf : new file format with flexible meta data (beta) (#2398) | 2 years ago |
| slaren | 097e121e2f | llama : add benchmark example (#2626) | 2 years ago |
| Evan Jones | 604b8bdfa6 | Fix unicode in grammars (fixes #2501) (#2553) | 2 years ago |
| Georgi Gerganov | a73ccf1aa3 | llama : replace (permute + reshape + view_1d) with (view_3d) (#2538) | 2 years ago |
| Shouzheng Liu | fc8ef549e5 | metal : enable ggml-alloc (#2627) | 2 years ago |
| Shouzheng Liu | bf83bff674 | metal : matrix-matrix multiplication kernel (#2615) | 2 years ago |
| Jhen-Jie Hong | d783f7982e | metal : return null instead of exit(1) (#2573) | 2 years ago |
| grahameth | ea04a4ca19 | add log_callback to llama_context_params for custom logging. (#2234) | 2 years ago |
| Johannes Gäßler | acfc5478ff | CUDA: tighter VRAM scratch size for 65b/70b (#2551) | 2 years ago |
| Johannes Gäßler | 3d9a551816 | Fixed mmap prefetch for GPU offloading (#2529) | 2 years ago |
| l3utterfly | 415e99fec2 | Stream save llama context data to file instead of allocating entire buffer upfront (#2488) | 2 years ago |
| Johannes Gäßler | 0728c5a8b9 | CUDA: mmq CLI option, fixed mmq build issues (#2453) | 2 years ago |
| slaren | 9d2382b3e4 | Fix Metal backend broken from the allocator changes (#2455) | 2 years ago |
| slaren | a113689571 | ggml : add graph tensor allocator (#2411) | 2 years ago |
| eric8607242 | ee1b497c98 | llama : support more diverse tokenizers? (#2420) | 2 years ago |
| Rand Xie | 65cdf34bdc | llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) | 2 years ago |
| Georgi Gerganov | 1a941869cb | metal : disable graph concurrency optimization due to bug (#2413) | 2 years ago |
| slaren | 5488fb789e | ggml : allocate graphs in a context (#2392) | 2 years ago |
| Kawrakow | eb542d3932 | Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384) | 2 years ago |
| slaren | da1889834a | ggml : improve graph build time via hash table lookup (#2329) | 2 years ago |
| Shouzheng Liu | 1aa18ef994 | metal : concurrently dispatch commands (#2358) | 2 years ago |
| slaren | 41c674161f | make rms_norm_eps a parameter (#2374) | 2 years ago |
| Evan Jones | 84e09a7d8b | llama : add grammar-based sampling (#1773) | 2 years ago |
| Georgi Gerganov | e76d630df1 | llama : grouped-query attention + LLaMAv2 70B support (#2276) | 2 years ago |
| Christian Demsar | a940458e48 | llama : print max tensor size to stderr (#2336) | 2 years ago |
| Georgi Gerganov | b47b8a9cfe | llama : optimize memory buffers (#2325) | 2 years ago |
| Georgi Gerganov | 513f861953 | ggml : fix rope args order + assert (#2054) | 2 years ago |
| Guillaume "Vermeille" Sanchez | ab0e26bdfb | llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) | 2 years ago |