cturan/llama.cpp

Author	SHA1 Message	Date
Georgi Gerganov	d67777c202 metal : add Q8_0 support (#2763)	2 years ago
Georgi Gerganov	cf658adc83 llm : add Falcon support (#2717)	2 years ago
Georgi Gerganov	6381d4e110 gguf : new file format with flexible meta data (beta) (#2398)	2 years ago
Jhen-Jie Hong	ed53db86c3 metal : print error of load pipeline state (#2564)	2 years ago
Shouzheng Liu	fc8ef549e5 metal : enable ggml-alloc (#2627)	2 years ago
Shouzheng Liu	bf83bff674 metal : matrix-matrix multiplication kernel (#2615)	2 years ago
Jhen-Jie Hong	d783f7982e metal : return null instead of exit(1) (#2573)	2 years ago
Georgi Gerganov	f6f9896ac3 metal : fix out-of-bounds access + inc concurrency nodes (#2416)	2 years ago
Matteo Boschini	1873ff586b metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)	2 years ago
Shouzheng Liu	1aa18ef994 metal : concurrently dispatch commands (#2358)	2 years ago
slaren	41c674161f make rms_norm_eps a parameter (#2374)	2 years ago
Georgi Gerganov	5b2b2dc6ae ggml : sync (unary ops refactor, static-correctness) (#2370)	2 years ago
slaren	95a6c595e7 ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)	2 years ago
Jiahao Li	83a00ce69b metal : support bcast add & dup & cont op (#2323)	2 years ago
Kawrakow	4d76a5f49b Faster Q3_K implementation on Metal (#2307)	2 years ago
Kawrakow	e68c96f7fe Faster Q2_K on Metal (#2297)	2 years ago
Kawrakow	e782c9e735 Faster Q5_K and Q6_K on Metal (#2294)	2 years ago
Kawrakow	785829dfe8 Faster Q4_K on Metal (#2290)	2 years ago
Shouzheng Liu	417a85a001 metal: minor q4 optimization and reduce code size (#2248)	2 years ago
Xiao-Yong Jin	6e7cca4047 llama : add custom RoPE (#2054)	2 years ago
Kawrakow	27ad57a69b Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212)	2 years ago
Shouzheng Liu	1cbf561466 metal : new q4_0 matrix-vector kernel (#2188)	2 years ago
Spencer Sutton	5bf2a27718 ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178)	2 years ago
Evan Miller	5656d10599 mpi : add support for distributed inference via MPI (#2099)	2 years ago
Qingyou Meng	1d656d6360 ggml : change ggml_graph_compute() API to not require context (#1999)	2 years ago
Aaron Miller	2f8cd979ec metal : release buffers when freeing metal context (#2062)	2 years ago
Kawrakow	6769e944c7 k-quants : support for super-block size of 64 (#2001)	2 years ago
Georgi Gerganov	ce2c7d72e2 metal : handle buffers larger than device's maxBufferLength (#1826)	2 years ago
Georgi Gerganov	4f9c43e3bd minor : warning fixes	2 years ago
Aaron Miller	0711a5f6dc metal : add norm, cpy f16->f16, alibi kernels (#1823)	2 years ago
