Author | Commit | Message | Age
Evan Miller | 5656d10599 | mpi : add support for distributed inference via MPI (#2099) | 2 years ago
dylan | 84525e7962 | docker : add support for CUDA in docker (#1461) | 2 years ago
Johannes Gäßler | 924dd22fd3 | Quantized dot products for CUDA mul mat vec (#2067) | 2 years ago
Henri Vasserman | acc111caf9 | Allow old Make to build server. (#2098) | 2 years ago
ZhouYuChen | 23c7c6fc91 | Update Makefile: clean simple (#2097) | 2 years ago
ningshanwutuobang | cfa0750bc9 | llama : support input embeddings directly (#1910) | 2 years ago
Kawrakow | 6769e944c7 | k-quants : support for super-block size of 64 (#2001) | 2 years ago
Johannes Gäßler | 16b9cd1939 | Convert vector to f16 for dequantize mul mat vec (#1913) | 2 years ago
Georgi Gerganov | ce2c7d72e2 | metal : handle buffers larger than device's maxBufferLength (#1826) | 2 years ago
Georgi Gerganov | b2416493ab | make : do not print help for simple example | 2 years ago
DaniAndTheWeb | 86c7571864 | make : update for latest Arch (#1701) | 2 years ago
Randall Fitzgerald | 794db3e7b9 | Server Example Refactor and Improvements (#1570) | 2 years ago
SuperUserNameMan | b41b4cad6f | examples : add "simple" (#1840) | 2 years ago
Kawrakow | 3d01122610 | CUDA : faster k-quant dot kernels (#1862) | 2 years ago
daboe01 | cf267d1c71 | make : add train-text-from-scratch (#1850) | 2 years ago
sandyiscool | 37e257c48e | make : clean *.so files (#1857) | 2 years ago
Kerfuffle | 74d4cfa343 | Allow "quantizing" to f16 and f32 (#1787) | 2 years ago
rankaiyx | 555275a693 | make : add SSSE3 compilation use case (#1659) | 2 years ago
Georgi Gerganov | 5c64a0952e | k-quants : allow to optionally disable at compile time (#1734) | 2 years ago
Georgi Gerganov | 2d43387daf | ggml : fix builds, add ggml-quants-k.o (close #1712, close #1710) | 2 years ago
Kawrakow | 99009e72f8 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) | 2 years ago
Georgi Gerganov | ecb217db4f | llama : Metal inference (#1642) | 2 years ago
Johannes Gäßler | 3b126f654f | LLAMA_DEBUG adds debug symbols (#1617) | 2 years ago
Kerfuffle | 0df7d63e5b | Include server in releases + other build system cleanups (#1610) | 2 years ago
Johannes Gäßler | 1fcdcc28b1 | cuda : performance optimizations (#1530) | 2 years ago
0cc4m | 2e6cd4b025 | OpenCL Token Generation Acceleration (#1459) | 2 years ago
Stefan Sydow | 7780e4f479 | make : .PHONY clean (#1553) | 2 years ago
Zenix | b8ee340abe | feature : support blis and other blas implementation (#1536) | 2 years ago
Georgi Gerganov | ea600071cb | Revert "feature : add blis and other BLAS implementation support (#1502)" | 2 years ago
Zenix | 07e9ace0f9 | feature : add blis and other BLAS implementation support (#1502) | 2 years ago