cturan/llama.cpp

Author	SHA1 Message	Date
Douglas Hanley	03bf161eb6 llama : support batched embeddings (#5466)	1 year ago
Johannes Gäßler	ad014bba97 make: add error message for bad CUDA version (#5444)	1 year ago
Georgi Gerganov	49cc1f7d67 bert : add tests + fix quantization (#5475)	1 year ago
Georgi Gerganov	99b8b43d7b tests : disable moe test (#5473)	1 year ago
Kawrakow	895407f31b ggml-quants : fix compiler warnings (shadow variable) (#5472)	1 year ago
Georgi Gerganov	099afc6274 llama : fix quantization when tensors are missing (#5423)	1 year ago
Georgi Gerganov	df334a1125 swift : package no longer use ggml dependency (#5465)	1 year ago
Lee	dbd8828eb0 py : fix persimmon `n_rot` conversion (#5460)	1 year ago
Abhilash Majumder	43fe07c1a4 ggml-sycl: Replace 3d ops with macro (#5458)	1 year ago
Daniel Bevenius	4a46d2b792 llava : remove prog parameter from ArgumentParser (#5457)	1 year ago
Georgi Gerganov	3b169441df sync : ggml (#5452)	1 year ago
Johannes Gäßler	3bdc4cd0f5 CUDA: mul_mat_vec_q tiling, refactor mul mat logic (#5434)	1 year ago
Douglas Hanley	2891c8aa9a Add support for BERT embedding models (#5423)	1 year ago
github-actions[bot]	97a336507e flake.lock: Update	1 year ago
Sergio López	c88c74f967 vulkan: only use M-sized matmul on Apple GPUs (#5412)	1 year ago
Alexey Parfenov	a803333a4e common : use enums for sampler types (#5418)	1 year ago
Alexey Parfenov	684780141a server : allow to specify tokens as strings in logit_bias (#5003)	1 year ago
Georgi Gerganov	85910c5b30 main : ctrl+C print timing in non-interactive mode (#3873)	1 year ago
Georgi Gerganov	139b62a839 common : fix compile warning	1 year ago
Georgi Gerganov	0f2411f154 ggml : fix compile warnings (unused vars) (#4966)	1 year ago
snadampal	a07d0fee1f ggml : add mmla kernels for quantized GEMM (#4966)	1 year ago
Johannes Gäßler	e4640d8fdf lookup: add print for drafting performance (#5450)	1 year ago
Xuan Son Nguyen	907e08c110 server : add llama2 chat template (#5425)	1 year ago
Ian Bull	f026f8120f metal : use autoreleasepool to avoid memory leaks (#5437)	1 year ago
Georgi Gerganov	cd9aea63b5 scripts : update sync scripts with new backends	1 year ago
Georgi Gerganov	43b65f5eb8 sync : ggml	1 year ago
Michael Podvitskiy	4633d93af0 ggml : add abort_callback for cpu backend (ggml/725)	1 year ago
Neuman Vong	4b7b38bef5 vulkan: Set limit for task concurrency (#5427)	1 year ago
Daniel Bevenius	e00d2a62dd llava : add requirements.txt and update README.md (#5428)	1 year ago
Riley Stewart	7c777fcd5d server : fix prompt caching for repeated prompts (#5420)	1 year ago

Commit History