Author | Commit | Message | Date
Sigbjørn Skjæret | 6385b843a8 | llama : add RobertaForSequenceClassification reranker support (#13875) | 7 months ago
Piotr Jasiukajtis | 4032ca4066 | llama : add support for Qwen3 MoE tied word embeddings (#13768) | 8 months ago
Georgi Gerganov | d13d0f6135 | hparams : initialize arrays (#13728) | 8 months ago
Xuan-Son Nguyen | 8a2afb7520 | llama : allow custom list of swa_layers (#13726) | 8 months ago
Georgi Gerganov | 8a1d206f1d | tts : fix n_ubatch + make WavTokenizer cache-less (#13713) | 8 months ago
Georgi Gerganov | 797f2ac062 | kv-cache : simplify the interface (#13660) | 8 months ago
Georgi Gerganov | b44890df2e | model : disable SWA for Phi models (#13676) | 8 months ago
Georgi Gerganov | be0239693c | model : fix llama4 graph (#13663) | 8 months ago
Georgi Gerganov | e298d2fbd0 | kv-cache : add SWA support (#13194) | 8 months ago
Gabe Goodhart | 5e7d95e22e | fix: Move build_inp_pos to the top of the graph section for build_granite (#13538) | 8 months ago
Gabe Goodhart | d590cd4c24 | model : Granite MoE shared (#13269) | 8 months ago
Johannes Gäßler | 10d2af0eaa | llama/ggml: add LLM training support (#10544) | 8 months ago
Diego Devesa | 27ebfcacba | llama : do not crash if there is no CPU backend (#13395) | 8 months ago
Xuan-Son Nguyen | 3f96aeff39 | llama : one-off chat template fix for Mistral-Small-2503 (#13398) | 8 months ago
Georgi Gerganov | 6562e5a4d6 | context : allow cache-less context for embeddings (#13108) | 8 months ago
Diego Devesa | f061021206 | llama : print size and type of overridden tensors (#13364) | 8 months ago
Sigbjørn Skjæret | bc4e1128f7 | llama : deci : support ffn-free with attention (#13296) | 8 months ago
piDack | 6c7fd67b64 | llama : support tie embedding for chatglm models (#13328) | 8 months ago
ymcki | 3bf785f3ef | llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843) | 8 months ago
Jared Van Bortel | 2f567611c0 | llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245) | 8 months ago
Georgi Gerganov | c642bc014c | kv-cache : separate recurrent vs non-recurrent impl (#12799) | 8 months ago
Sigbjørn Skjæret | cb06a3c363 | llama : orion rope type is neox (#13261) | 8 months ago
Sigbjørn Skjæret | 626083faf7 | llama : plamo rope type is neox (#13260) | 8 months ago
Jared Van Bortel | a70183eb00 | llama-model : fix the reported size class for nomic-embed-text-v2-moe (#13223) | 8 months ago
Johannes Gäßler | cdf76586b2 | CUDA: fix non-cont. inputs for batched mat mul (#13155) | 8 months ago
Sigbjørn Skjæret | 7d3af70b08 | llama : llm_type order by size (#13177) | 8 months ago
Sigbjørn Skjæret | e98b3692be | llama : set qwen3 model type sizes (#13175) | 8 months ago
AT | 5f5e39e1ba | model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466) | 8 months ago
Johannes Gäßler | 69699be48a | CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137) | 8 months ago
Georgi Gerganov | 2f74c354c0 | graph : make FA compatible with MLA + add initial Metal kernels (#12953) | 9 months ago