Johannes Gäßler
|
0728c5a8b9
CUDA: mmq CLI option, fixed mmq build issues (#2453)
|
2 years ago |
slaren
|
a113689571
ggml : add graph tensor allocator (#2411)
|
2 years ago |
Johannes Gäßler
|
11f3ca06b8
CUDA: Quantized matrix matrix multiplication (#2160)
|
2 years ago |
Cebtenzzre
|
6df1f5940f
make : build with -Wmissing-prototypes (#2394)
|
2 years ago |
Aarni Koskela
|
b3f138d058
Chat UI extras (#2366)
|
2 years ago |
Evan Jones
|
84e09a7d8b
llama : add grammar-based sampling (#1773)
|
2 years ago |
Jose Maldonado
|
91171b8072
make : fix CLBLAST compile support in FreeBSD (#2331)
|
2 years ago |
Jose Maldonado
|
73643f5fb1
gitignore : changes for Poetry users + chat examples (#2284)
|
2 years ago |
Georgi Gerganov
|
a814d04f81
make : fix indentation
|
2 years ago |
Sky Yan
|
42c7c2e2e9
make : support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN (#2275)
|
2 years ago |
Jiří Podivín
|
54e3bc76fe
make : add new target for test binaries (#2244)
|
2 years ago |
Przemysław Pawełczyk
|
9cf022a188
make : fix embdinput library and server examples building on MSYS2 (#2235)
|
2 years ago |
wzy
|
7dabc66f3c
make : use pkg-config for OpenBLAS (#2222)
|
2 years ago |
James Reynolds
|
229aab351c
make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208)
|
2 years ago |
Evan Miller
|
5656d10599
mpi : add support for distributed inference via MPI (#2099)
|
2 years ago |
dylan
|
84525e7962
docker : add support for CUDA in docker (#1461)
|
2 years ago |
Johannes Gäßler
|
924dd22fd3
Quantized dot products for CUDA mul mat vec (#2067)
|
2 years ago |
Henri Vasserman
|
acc111caf9
Allow old Make to build server. (#2098)
|
2 years ago |
ZhouYuChen
|
23c7c6fc91
Update Makefile: clean simple (#2097)
|
2 years ago |
ningshanwutuobang
|
cfa0750bc9
llama : support input embeddings directly (#1910)
|
2 years ago |
Kawrakow
|
6769e944c7
k-quants : support for super-block size of 64 (#2001)
|
2 years ago |
Johannes Gäßler
|
16b9cd1939
Convert vector to f16 for dequantize mul mat vec (#1913)
|
2 years ago |
Georgi Gerganov
|
ce2c7d72e2
metal : handle buffers larger than device's maxBufferLength (#1826)
|
2 years ago |
Georgi Gerganov
|
b2416493ab
make : do not print help for simple example
|
2 years ago |
DaniAndTheWeb
|
86c7571864
make : update for latest Arch (#1701)
|
2 years ago |
Randall Fitzgerald
|
794db3e7b9
Server Example Refactor and Improvements (#1570)
|
2 years ago |
SuperUserNameMan
|
b41b4cad6f
examples : add "simple" (#1840)
|
2 years ago |
Kawrakow
|
3d01122610
CUDA : faster k-quant dot kernels (#1862)
|
2 years ago |
daboe01
|
cf267d1c71
make : add train-text-from-scratch (#1850)
|
2 years ago |
sandyiscool
|
37e257c48e
make : clean *.so files (#1857)
|
2 years ago |