Georgi Gerganov
|
b7f2aa9e51
metal : restore 363f0bf and fix reduce in F16_F32 kernels (#2986)
|
2 years ago |
Georgi Gerganov
|
d9151e6f57
metal : revert 6af0bab until we fix it
|
2 years ago |
Kawrakow
|
ca82cf7bac
metal : more optimizations (#2959)
|
2 years ago |
Georgi Gerganov
|
13268c5331
metal : slight speed-up for add and mul kernels (#2917)
|
2 years ago |
Kawrakow
|
e8d9158925
metal: somewhat faster f16 x f32 matrix multiply kernel (#2951)
|
2 years ago |
Georgi Gerganov
|
d67777c202
metal : add Q8_0 support (#2763)
|
2 years ago |
Georgi Gerganov
|
cf658adc83
llm : add Falcon support (#2717)
|
2 years ago |
Shouzheng Liu
|
14b1d7e6f7
metal : add missing barriers for mul-mat (#2699)
|
2 years ago |
Shouzheng Liu
|
dadbed99e6
metal : fix synchronization in new matrix multiplication kernel (#2686)
|
2 years ago |
Shouzheng Liu
|
bf83bff674
metal : matrix-matrix multiplication kernel (#2615)
|
2 years ago |
Matteo Boschini
|
1873ff586b
metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
|
2 years ago |
Kawrakow
|
9a08eaf3c4
Another speed gain for Q4_0 and Q4_1 on Metal (#2375)
|
2 years ago |
Jiahao Li
|
83a00ce69b
metal : support bcast add & dup & cont op (#2323)
|
2 years ago |
Kawrakow
|
4d76a5f49b
Faster Q3_K implementation on Metal (#2307)
|
2 years ago |
Kawrakow
|
e68c96f7fe
Faster Q2_K on Metal (#2297)
|
2 years ago |
Kawrakow
|
e782c9e735
Faster Q5_K and Q6_K on Metal (#2294)
|
2 years ago |
Kawrakow
|
785829dfe8
Faster Q4_K on Metal (#2290)
|
2 years ago |
Shouzheng Liu
|
417a85a001
metal: minor q4 optimization and reduce code size (#2248)
|
2 years ago |
Xiao-Yong Jin
|
6e7cca4047
llama : add custom RoPE (#2054)
|
2 years ago |
Kawrakow
|
27ad57a69b
Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212)
|
2 years ago |
Shouzheng Liu
|
1cbf561466
metal : new q4_0 matrix-vector kernel (#2188)
|
2 years ago |
Kawrakow
|
6769e944c7
k-quants : support for super-block size of 64 (#2001)
|
2 years ago |
Aaron Miller
|
0711a5f6dc
metal : add norm, cpy f16->f16, alibi kernels (#1823)
|
2 years ago |
Kawrakow
|
74a6d922f1
Metal implementation for all k_quants (#1807)
|
2 years ago |
Kawrakow
|
e9b66ee982
metal : add Q4_1 implementation (#1785)
|
2 years ago |
Georgi Gerganov
|
b33dee282f
metal : fix build "tanhf" -> "tanh"
|
2 years ago |
AT
|
92f44ff7f7
metal : add GELU implementation (#1770)
|
2 years ago |
Kawrakow
|
245fc3c37d
metal : faster q4_0 (#1775)
|
2 years ago |
Kawrakow
|
72ff5282bf
metal : add Q2_K implementation (#1762)
|
2 years ago |
Kawrakow
|
0f291e1f65
metal : Q6_K implementation (#1752)
|
2 years ago |