4bd0f93e4a | Pierrick Hymbert | model: support arch `DbrxForCausalLM` (#6515) | 1 year ago
91c736015b | jiez | llama : add gguf_remove_key + remove split meta during quantize (#6591) | 1 year ago
dee7f8d692 | MasterYi1024 | Correct free memory and total memory. (#6630) | 1 year ago
04a5ac211e | Clint Herron | Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616) | 1 year ago
cbaadc9294 | Olivier Chafik | grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) | 1 year ago
b804b1ef77 | Pierrick Hymbert | eval-callback: Example how to use eval callback for debugging (#6576) | 1 year ago
4f407a0a35 | slaren | llama : add model types for mixtral (#6589) | 1 year ago
1b67731e18 | Jared Van Bortel | BERT tokenizer fixes (#6498) | 1 year ago
5dc9dd7152 | Carolinabanana | llama : add Command R Plus support (#6491) | 1 year ago
cc4a95426d | Georgi Gerganov | llama : fix attention layer count sanity check (#6550) | 1 year ago
b73e564b16 | Georgi Gerganov | quantize : fix precedence of cli args (#6541) | 1 year ago
e3c337d87c | Rick G | llama : support negative ith in llama_get_ API (#6519) | 1 year ago
beea6e1b16 | Jan Boon | llama : save and restore kv cache for single seq id (#6341) | 1 year ago
a8bd14d557 | Brian | gguf.py : add licence and version to gguf writer (#6504) | 1 year ago
9b84ae1806 | Clint Herron | examples : add GBNF validator program (#5948) | 1 year ago
bb43cf7e9d | bryanSwk | llama : add SEA-LION support (#6448) | 1 year ago
1ff4d9f3d6 | kaizau | Add OpenChat, Alpaca, Vicuna chat templates (#6397) | 1 year ago
08a0c02060 | slaren | ggml : mul_mat_id use the same tensor for all the experts (#6387) | 1 year ago
ba0c7c70ab | 0cc4m | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 1 year ago
069574775c | hxer7963 | [Model] Add support for xverse (#6301) | 1 year ago
057400a3fd | Daniel Bevenius | llama : remove redundant reshape in build_kv_store (#6369) | 1 year ago
0308f5e3d7 | compilade | llama : fix command-r inference when omitting outputs (#6367) | 1 year ago
32c8486e1f | Jared Van Bortel | wpm : portable unicode tolower (#6305) | 1 year ago
557410b8f0 | compilade | llama : greatly reduce output buffer memory usage (#6122) | 1 year ago
55c1b2a3bb | Kawrakow | IQ1_M: 1.75 bpw quantization (#6302) | 1 year ago
d25b1c31b0 | Kawrakow | quantize : be able to override metadata by key (#6321) | 1 year ago
280345968d | slaren | cuda : rename build flag to LLAMA_CUDA (#6299) | 1 year ago
ddf6568510 | Meng, Hengyu | [SYCL] offload op (#6217) | 1 year ago
94d1b3b411 | Jared Van Bortel | use _wfopen instead of fopen on Windows (#6248) | 1 year ago
f482bb2e49 | Pierrick Hymbert | common: llama_load_model_from_url split support (#6192) | 1 year ago