| Author | Commit | Message | Date |
|---|---|---|---|
| Kawrakow | 6769e944c7 | k-quants : support for super-block size of 64 (#2001) | 2 years ago |
| Alex Renda | b061ba9e2a | llama : fix top-p sampling to match the canonical definition (#1953) | 2 years ago |
| Didzis Gosko | 527b6fba1d | llama : make model stateless and context stateful (llama_state) (#1797) | 2 years ago |
| Ettore Di Giacinto | aacdbd4056 | llama : fix params struct slignment (#1936) | 2 years ago |
| l3utterfly | ba4e85a833 | llama : use aligned memory during ggml_init call from loading saved sessions (#1934) | 2 years ago |
| Kawrakow | cb40dfca69 | llama : only use Q6_K for output weights if tensor size is multiple of 256 (#1932) | 2 years ago |
| Johannes Gäßler | 16b9cd1939 | Convert vector to f16 for dequantize mul mat vec (#1913) | 2 years ago |
| Johannes Gäßler | b24c3049d9 | Added tokens per second to info prints (#1928) | 2 years ago |
| Johannes Gäßler | 0ede372a51 | Fixed incorrectly applying RMS norm twice (#1925) | 2 years ago |
| Kawrakow | 8ab8ba62eb | llama : prevent usage of k-quants when tensor size is not a multiple of 256 (#1921) | 2 years ago |
| Georgi Gerganov | ce2c7d72e2 | metal : handle buffers larger than device's maxBufferLength (#1826) | 2 years ago |
| Georgi Gerganov | 051e1b0e6a | llama : fix kv_cache `n` init (close #1903) | 2 years ago |
| Howard Su | 3d59ec5935 | ggml : fix warnings under MSVC (#1908) | 2 years ago |
| Johannes Gäßler | ac3b886953 | llama : fix embd when offloading non-repeating layers (#1891) | 2 years ago |
| Borislav Stanimirov | 9cbf50c041 | build : fix and ignore MSVC warnings (#1889) | 2 years ago |
| Johannes Gäßler | 254a7a7a5f | CUDA full GPU acceleration, KV cache in VRAM (#1827) | 2 years ago |
| xaedes | e32089b2c2 | train : improved training-from-scratch example (#1652) | 2 years ago |
| Kerfuffle | 74d4cfa343 | Allow "quantizing" to f16 and f32 (#1787) | 2 years ago |
| Kawrakow | 74a6d922f1 | Metal implementation for all k_quants (#1807) | 2 years ago |
| Howard Su | 58970a4c39 | Leverage mmap for offloading tensors to GPU (#1597) | 2 years ago |
| Kerfuffle | 4f0154b0ba | llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691) | 2 years ago |
| Robert Sung-wook Shin | 98ed165574 | OpenCL: Add release memory (#1741) | 2 years ago |
| Georgi Gerganov | 2d7bf110ed | llama : fix vram_scratch var | 2 years ago |
| Georgi Gerganov | 2a4e41a086 | llama : fix compile warnings | 2 years ago |
| Johannes Gäßler | 17366df842 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) | 2 years ago |
| Georgi Gerganov | 44f906e853 | metal : add f16 support | 2 years ago |
| Georgi Gerganov | 7a74dee6b4 | llama : temporary disable Q6_K output quantization (#1711) | 2 years ago |
| Spencer Sutton | 590250f7a9 | metal : add checks for buffer size (#1706) | 2 years ago |
| mgroeber9110 | c2df36d60d | llama : consistently catch and throw only exceptions deriving from std::exception (#1599) | 2 years ago |
| kiltyj | 9d0693bce3 | metal : use shared buffers between CPU and GPU (#1696) | 2 years ago |