Georgi Gerganov
|
d67777c202
metal : add Q8_0 support (#2763)
|
2 years ago |
Georgi Gerganov
|
cf658adc83
llm : add Falcon support (#2717)
|
2 years ago |
Georgi Gerganov
|
6381d4e110
gguf : new file format with flexible meta data (beta) (#2398)
|
2 years ago |
Jhen-Jie Hong
|
ed53db86c3
metal : print error of load pipeline state (#2564)
|
2 years ago |
Shouzheng Liu
|
fc8ef549e5
metal : enable ggml-alloc (#2627)
|
2 years ago |
Shouzheng Liu
|
bf83bff674
metal : matrix-matrix multiplication kernel (#2615)
|
2 years ago |
Jhen-Jie Hong
|
d783f7982e
metal : return null instead of exit(1) (#2573)
|
2 years ago |
Georgi Gerganov
|
f6f9896ac3
metal : fix out-of-bounds access + inc concurrency nodes (#2416)
|
2 years ago |
Matteo Boschini
|
1873ff586b
metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
|
2 years ago |
Shouzheng Liu
|
1aa18ef994
metal : concurrently dispatch commands (#2358)
|
2 years ago |
slaren
|
41c674161f
make rms_norm_eps a parameter (#2374)
|
2 years ago |
Georgi Gerganov
|
5b2b2dc6ae
ggml : sync (unary ops refactor, static-correctness) (#2370)
|
2 years ago |
slaren
|
95a6c595e7
ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)
|
2 years ago |
Jiahao Li
|
83a00ce69b
metal : support bcast add & dup & cont op (#2323)
|
2 years ago |
Kawrakow
|
4d76a5f49b
Faster Q3_K implementation on Metal (#2307)
|
2 years ago |
Kawrakow
|
e68c96f7fe
Faster Q2_K on Metal (#2297)
|
2 years ago |
Kawrakow
|
e782c9e735
Faster Q5_K and Q6_K on Metal (#2294)
|
2 years ago |
Kawrakow
|
785829dfe8
Faster Q4_K on Metal (#2290)
|
2 years ago |
Shouzheng Liu
|
417a85a001
metal: minor q4 optimization and reduce code size (#2248)
|
2 years ago |
Xiao-Yong Jin
|
6e7cca4047
llama : add custom RoPE (#2054)
|
2 years ago |
Kawrakow
|
27ad57a69b
Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212)
|
2 years ago |
Shouzheng Liu
|
1cbf561466
metal : new q4_0 matrix-vector kernel (#2188)
|
2 years ago |
Spencer Sutton
|
5bf2a27718
ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178)
|
2 years ago |
Evan Miller
|
5656d10599
mpi : add support for distributed inference via MPI (#2099)
|
2 years ago |
Qingyou Meng
|
1d656d6360
ggml : change ggml_graph_compute() API to not require context (#1999)
|
2 years ago |
Aaron Miller
|
2f8cd979ec
metal : release buffers when freeing metal context (#2062)
|
2 years ago |
Kawrakow
|
6769e944c7
k-quants : support for super-block size of 64 (#2001)
|
2 years ago |
Georgi Gerganov
|
ce2c7d72e2
metal : handle buffers larger than device's maxBufferLength (#1826)
|
2 years ago |
Georgi Gerganov
|
4f9c43e3bd
minor : warning fixes
|
2 years ago |
Aaron Miller
|
0711a5f6dc
metal : add norm, cpy f16->f16, alibi kernels (#1823)
|
2 years ago |