goerch | 71ca2fad7d | whisper : tokenizer fix + re-enable tokenizer test for LLaMa (#3096) | 2 years ago
Tristan Ross | 1b6c650d16 | cmake : add a compiler flag check for FP16 format (#3086) | 2 years ago
Johannes Gäßler | 0a5eebb45d | CUDA: mul_mat_q RDNA2 tunings (#2910) | 2 years ago
FK | 84e723653c | speculative: add --n-gpu-layers-draft option (#3063) | 2 years ago
Eric Sommerlade | b52b29ab9d | arm64 support for windows (#3007) | 2 years ago
Johannes Gäßler | 4f7cd6ba9c | CUDA: fix LoRAs (#3130) | 2 years ago
Johannes Gäßler | 89e89599fd | CUDA: fix mul_mat_q not used for output tensor (#3127) | 2 years ago
Johannes Gäßler | d54a4027a6 | CUDA: lower GPU latency + fix Windows performance (#3110) | 2 years ago
Jhen-Jie Hong | 1b0d09259e | cmake : support build for iOS/tvOS (#3116) | 2 years ago
Johannes Gäßler | 8a4ca9af56 | CUDA: add device number to error messages (#3112) | 2 years ago
Kawrakow | f31b6f4e2d | metal : PP speedup (#3084) | 2 years ago
Erik Scholz | 6eeb4d9083 | convert: remove most of the n_mult usage in convert.py (#3098) | 2 years ago
kchro3 | 21ac3a1503 | metal : support for Swift (#3078) | 2 years ago
Jhen-Jie Hong | 4fd5477955 | metal : support build for iOS/tvOS (#3089) | 2 years ago
takov751 | ec2a24fedf | flake : add train-text-from-scratch to flake.nix (#3042) | 2 years ago
Ikko Eltociear Ashimine | 7d99aca759 | readme : fix typo (#3043) | 2 years ago
Kawrakow | ba7ffbb251 | metal : Q3_K speedup (#2995) | 2 years ago
Cebtenzzre | e64f5b5578 | examples : make n_ctx warning work again (#3066) | 2 years ago
Georgi Gerganov | 94f10b91ed | readme : update hot tpoics | 2 years ago
Georgi Gerganov | b3e9852e47 | sync : ggml (CUDA GLM RoPE + POSIX) (#3082) | 2 years ago
Przemysław Pawełczyk | cb6c44c5e0 | build : do not use _GNU_SOURCE gratuitously (#2035) | 2 years ago
hongbo.mo | a21baeb122 | docker : add git to full-cuda.Dockerfile main-cuda.Dockerfile (#3044) | 2 years ago
Yui | 6ff712a6d1 | Update deprecated GGML TheBloke links to GGUF (#3079) | 2 years ago
slaren | ebc96086af | ggml-alloc : correctly check mmap return value for errors (#3075) | 2 years ago
Kunshang Ji | 7f412dab9c | enable CPU HBM (#2603) | 2 years ago
Cebtenzzre | 6336d834ec | convert : fix F32 ftype not being saved (#3048) | 2 years ago
Cebtenzzre | 00d62adb79 | fix some warnings from gcc and clang-tidy (#3038) | 2 years ago
Cebtenzzre | 4fa2cc1750 | make : improve test target (#3031) | 2 years ago
Cebtenzzre | 5ffab089a5 | make : fix CPPFLAGS (#3035) | 2 years ago
slaren | 15b67a66c2 | llama-bench : use two tokens in the warmup run for prompt evals (#3059) | 2 years ago