Georgi Gerganov
|
c4f496648c
metal : fix kernel_norm (fixes Falcon on Metal) (#3057)
|
2 years ago |
Przemysław Pawełczyk
|
fec2fb19e4
ggml : posixify madvise and pagesize (#3037)
|
2 years ago |
Georgi Gerganov
|
178b1850eb
k-quants : fix zero-weight guard in Q6_K (ref #3040)
|
2 years ago |
Kerfuffle
|
ea2c85d5d2
convert-llama-ggml-to-gguf: Try to handle files older than GGJTv3 (#3023)
|
2 years ago |
Cebtenzzre
|
9912b9efc8
build : add LLAMA_METAL_NDEBUG flag (#3033)
|
2 years ago |
Cebtenzzre
|
9e2023156e
make : use new flag variables for recent changes (#3019)
|
2 years ago |
Cebtenzzre
|
de2fe892af
examples : replace fprintf to stdout with printf (#3017)
|
2 years ago |
Erik Scholz
|
c9c3220c48
convert: fix convert.py not working with int filename_stem (#3028)
|
2 years ago |
Kawrakow
|
d59bd97065
Guard against all weights in a super-block being zero (#3010)
|
2 years ago |
Georgi Gerganov
|
35938ee3b0
llama : update logic for number of threads when using BLAS
|
2 years ago |
Georgi Gerganov
|
921772104b
speculative : add grammar support (#2991)
|
2 years ago |
Georgi Gerganov
|
2ba85c8609
py : minor
|
2 years ago |
Georgi Gerganov
|
e36ecdccc8
build : on Mac OS enable Metal by default (#2901)
|
2 years ago |
slaren
|
bd33e5ab92
ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994)
|
2 years ago |
Cebtenzzre
|
3103568144
llama-bench : make cpp file non-executable (#2999)
|
2 years ago |
Leng Yue
|
5b8530d88c
make : add speculative example (#3003)
|
2 years ago |
Aarni Koskela
|
e4386f417f
server : add a subtle loading animation to the edit box (#2466)
|
2 years ago |
Jiahao Li
|
35195689cd
2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985)
|
2 years ago |
slaren
|
cf9b08485c
ggml-alloc : use virtual memory for measurement (#2973)
|
2 years ago |
Georgi Gerganov
|
47068e5170
speculative : PoC for speeding-up inference via speculative sampling (#2926)
|
2 years ago |
Georgi Gerganov
|
8f429fa511
perplexity : fix ETA by warming up the model with an empty run
|
2 years ago |
Kerfuffle
|
6519e9c99c
gguf(python): Fix special vocab handling when id < 0 (#2984)
|
2 years ago |
Georgi Gerganov
|
b7f2aa9e51
metal : restore 363f0bf and fix reduce in F16_F32 kernels (#2986)
|
2 years ago |
Alon
|
73a12a6344
cov : disable comment in PRs (#2989)
|
2 years ago |
opparco
|
3730134776
llama : fix bpe tokenize from byte (#2889)
|
2 years ago |
Georgi Gerganov
|
d9151e6f57
metal : revert 6af0bab until we fix it
|
2 years ago |
Alon
|
afc43d5f82
cov : add Code Coverage and codecov.io integration (#2928)
|
2 years ago |
Wentai Zhang
|
6460f758db
opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955)
|
2 years ago |
Kawrakow
|
ca82cf7bac
metal : more optimizations (#2959)
|
2 years ago |
kchro3
|
6a31a3bd98
swift : add support for k-quants (#2983)
|
2 years ago |