cturan/llama.cpp

Author	SHA1 Message	Date
Rahul Vivek Nair	fb98254f99 Fix typo in README.md (#1961)	2 years ago
Georgi Gerganov	049aa16b8c readme : add link to p1	2 years ago
Xiake Sun	2322ec223a Fix typo (#1949)	2 years ago
Ettore Di Giacinto	aacdbd4056 llama : fix params struct slignment (#1936)	2 years ago
Henri Vasserman	20568fe60f [Fix] Reenable server embedding endpoint (#1937)	2 years ago
Georgi Gerganov	18b35625c3 ggml : fix bug in LBFGS optimizer (found by ggml tests)	2 years ago
l3utterfly	ba4e85a833 llama : use aligned memory during ggml_init call from loading saved sessions (#1934)	2 years ago
Georgi Gerganov	23fc5c219a cmake : fix trailing whitespaces	2 years ago
Kawrakow	cb40dfca69 llama : only use Q6_K for output weights if tensor size is multiple of 256 (#1932)	2 years ago
Kawrakow	ca7c3f4da5 cuda : faster k-quants on older GPUs (#1930)	2 years ago
Georgi Gerganov	b97ca431db ggml : sync latest ggml repo (#1924)	2 years ago
Howard Su	1e3abfcef0 cmake : fix build shared ggml when CUDA is enabled (#1929)	2 years ago
Johannes Gäßler	16b9cd1939 Convert vector to f16 for dequantize mul mat vec (#1913)	2 years ago
Johannes Gäßler	b24c3049d9 Added tokens per second to info prints (#1928)	2 years ago
Johannes Gäßler	0ede372a51 Fixed incorrectly applying RMS norm twice (#1925)	2 years ago
l3utterfly	8596af4277 ggml : fix bug in ggml_compute_forward_add_q_f32 (#1918)	2 years ago
Mike	e1886cf4fe readme : update Android build instructions (#1922)	2 years ago
Kawrakow	8ab8ba62eb llama : prevent usage of k-quants when tensor size is not a multiple of 256 (#1921)	2 years ago
Kawrakow	90cc59d6ab examples : fix examples/metal (#1920)	2 years ago
Georgi Gerganov	ce2c7d72e2 metal : handle buffers larger than device's maxBufferLength (#1826)	2 years ago
Howard Su	57cd69460f cmake : add CUDA_ARCHITECTURES to new target ggml_static (#1917)	2 years ago
Georgi Gerganov	b2416493ab make : do not print help for simple example	2 years ago
Georgi Gerganov	4f9c43e3bd minor : warning fixes	2 years ago
Johannes Gäßler	2c9380dd2f Only one CUDA stream per device for async compute (#1898)	2 years ago
Georgi Gerganov	051e1b0e6a llama : fix kv_cache `n` init (close #1903)	2 years ago
DaniAndTheWeb	86c7571864 make : update for latest Arch (#1701)	2 years ago
Howard Su	3d59ec5935 ggml : fix warnings under MSVC (#1908)	2 years ago
Aaron Miller	0711a5f6dc metal : add norm, cpy f16->f16, alibi kernels (#1823)	2 years ago
Faez Shakil	fc45a81bc6 exposed modules so that they can be invoked by nix run github:ggerganov/llama.cpp#server etc (#1863)	2 years ago
Randall Fitzgerald	794db3e7b9 Server Example Refactor and Improvements (#1570)	2 years ago
