Kawrakow | d924522a46 Custom RoPE + better memory management for CUDA (#2295) | 2 years ago
Georgi Gerganov | ae178ab46b llama : make tensor_split ptr instead of array (#2272) | 2 years ago
Jiahao Li | 7568d1a2b2 Support dup & cont ops on CUDA (#2242) | 2 years ago
Bach Le | 7cdd30bf1f cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer (#2220) | 2 years ago
Jiahao Li | 206e01de11 cuda : support broadcast add & mul (#2192) | 2 years ago
Johannes Gäßler | 4304bd3cde CUDA: mul_mat_vec_q kernels for k-quants (#2203) | 2 years ago
Georgi Gerganov | 697966680b ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) | 2 years ago
Howard Su | ff5d58faec Fix compile error on Windows CUDA (#2207) | 2 years ago
Georgi Gerganov | 680e6f9177 cuda : add gelu support | 2 years ago
Johannes Gäßler | 2b5eb72e10 Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189) | 2 years ago
Georgi Gerganov | f7d278faf3 ggml : revert CUDA broadcast changes from #2183 (#2191) | 2 years ago
Georgi Gerganov | 20d7740a9b ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) | 2 years ago
Spencer Sutton | 5bf2a27718 ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) | 2 years ago
Johannes Gäßler | 64639555ff Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) | 2 years ago
Johannes Gäßler | 061f5f8d21 CUDA: add __restrict__ to mul mat vec kernels (#2140) | 2 years ago
Johannes Gäßler | 924dd22fd3 Quantized dot products for CUDA mul mat vec (#2067) | 2 years ago
Howard Su | cc45a7feb8 Fix crash of test-tokenizer-0 under Debug build (#2064) | 2 years ago
Johannes Gäßler | 0bc2cdfc87 Better CUDA synchronization logic (#2057) | 2 years ago
Salvador E. Tropea | 5b351e94d0 cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028) | 2 years ago
Salvador E. Tropea | 6432aabb6d cuda : fix missing const qualifier in casts (#2027) | 2 years ago
Johannes Gäßler | 7f9753fa12 CUDA GPU acceleration for LoRAs + f16 models (#1970) | 2 years ago
Kawrakow | 6769e944c7 k-quants : support for super-block size of 64 (#2001) | 2 years ago
Howard Su | cbebf61ca7 Fix assert when free invalid cuda pointer (#2005) | 2 years ago
Robyn | 5ec8dd5a3c #1869 Fix null reference errors when training from scratch with CUDA (#1907) | 2 years ago
Kawrakow | ca7c3f4da5 cuda : faster k-quants on older GPUs (#1930) | 2 years ago
Johannes Gäßler | 16b9cd1939 Convert vector to f16 for dequantize mul mat vec (#1913) | 2 years ago
Johannes Gäßler | 2c9380dd2f Only one CUDA stream per device for async compute (#1898) | 2 years ago
Howard Su | 3d59ec5935 ggml : fix warnings under MSVC (#1908) | 2 years ago
Kawrakow | 3d01122610 CUDA : faster k-quant dot kernels (#1862) | 2 years ago
Johannes Gäßler | a09f9195be Fixed CUDA runtime version check (#1879) | 2 years ago