cturan/llama.cpp

Author	SHA1 Message	Date
Howard Su	b8c8dda75f Use unsigned for random seed (#2006)	2 years ago
m3ndax	d3494bb86b llama : replacing auto &kv with const auto &kv (#2041)	2 years ago
Howard Su	b922bc351b llama : remove shards weight file support (#2000)	2 years ago
Johannes Gäßler	7f9753fa12 CUDA GPU acceleration for LoRAs + f16 models (#1970)	2 years ago
ningshanwutuobang	cfa0750bc9 llama : support input embeddings directly (#1910)	2 years ago
Georgi Gerganov	181e8d9755 llama : fix rope usage after ChatGLM change	2 years ago
zrm	b853d45601 ggml : add NUMA support (#1556)	2 years ago
Kawrakow	6769e944c7 k-quants : support for super-block size of 64 (#2001)	2 years ago
Alex Renda	b061ba9e2a llama : fix top-p sampling to match the canonical definition (#1953)	2 years ago
Didzis Gosko	527b6fba1d llama : make model stateless and context stateful (llama_state) (#1797)	2 years ago
Ettore Di Giacinto	aacdbd4056 llama : fix params struct slignment (#1936)	2 years ago
l3utterfly	ba4e85a833 llama : use aligned memory during ggml_init call from loading saved sessions (#1934)	2 years ago
Kawrakow	cb40dfca69 llama : only use Q6_K for output weights if tensor size is multiple of 256 (#1932)	2 years ago
Johannes Gäßler	16b9cd1939 Convert vector to f16 for dequantize mul mat vec (#1913)	2 years ago
Johannes Gäßler	b24c3049d9 Added tokens per second to info prints (#1928)	2 years ago
Johannes Gäßler	0ede372a51 Fixed incorrectly applying RMS norm twice (#1925)	2 years ago
Kawrakow	8ab8ba62eb llama : prevent usage of k-quants when tensor size is not a multiple of 256 (#1921)	2 years ago
Georgi Gerganov	ce2c7d72e2 metal : handle buffers larger than device's maxBufferLength (#1826)	2 years ago
Georgi Gerganov	051e1b0e6a llama : fix kv_cache `n` init (close #1903)	2 years ago
Howard Su	3d59ec5935 ggml : fix warnings under MSVC (#1908)	2 years ago
Johannes Gäßler	ac3b886953 llama : fix embd when offloading non-repeating layers (#1891)	2 years ago
Borislav Stanimirov	9cbf50c041 build : fix and ignore MSVC warnings (#1889)	2 years ago
Johannes Gäßler	254a7a7a5f CUDA full GPU acceleration, KV cache in VRAM (#1827)	2 years ago
xaedes	e32089b2c2 train : improved training-from-scratch example (#1652)	2 years ago
Kerfuffle	74d4cfa343 Allow "quantizing" to f16 and f32 (#1787)	2 years ago
Kawrakow	74a6d922f1 Metal implementation for all k_quants (#1807)	2 years ago
Howard Su	58970a4c39 Leverage mmap for offloading tensors to GPU (#1597)	2 years ago
Kerfuffle	4f0154b0ba llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691)	2 years ago
Robert Sung-wook Shin	98ed165574 OpenCL: Add release memory (#1741)	2 years ago
Georgi Gerganov	2d7bf110ed llama : fix vram_scratch var	2 years ago


Commit History