slaren | 8ca511cade | cuda : fix LLAMA_CUDA_F16 (#5262) | 2 years ago
Ali Nehzat | d71ac90985 | make : generate .a library for static linking (#5205) | 2 years ago
Guoteng | ce32060198 | llama : support InternLM2 (#5184) | 2 years ago
Eve | 1cfb5372cf | Fix broken Vulkan Cmake (properly) (#5230) | 2 years ago
Georgi Gerganov | d3bac7d584 | llama : reorder build_orion() at correct place (#5118) | 2 years ago
Georgi Gerganov | 5cb04dbc16 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) | 2 years ago
Georgi Gerganov | efb7bdbbd0 | metal : add im2col F32 dst support (#5132) | 2 years ago
JidongZhang-THU | 15606309a0 | llava : add MobileVLM support (#5132) | 2 years ago
Neo Zhang Jianyu | b2b9f025e7 | format license text, restore apache license by legal suggestion (#5233) | 2 years ago
slaren | dabcc5b471 | ggml : limit n_threads to the max n_tasks (#5238) | 2 years ago
0cc4m | f8e9140cb4 | Vulkan Fixes (#5223) | 2 years ago
Yiming Cui | d62520eb2c | Fix typos of IQ2_XXS and IQ3_XXS in llama.cpp (#5231) | 2 years ago
Neo Zhang Jianyu | 01684139c3 | support SYCL backend windows build (#5208) | 2 years ago
Jared Van Bortel | e8dc55d006 | kompute : llama-bench support and ggml_cpu_has_kompute() (#5226) | 2 years ago
Georgi Gerganov | e0085fdf7c | Revert "server : change deps.sh xxd files to string literals (#5221)" | 2 years ago
Georgi Gerganov | e6f291d158 | server : fix context shift (#5195) | 2 years ago
JohnnyB | 4003be0e5f | server : change deps.sh xxd files to string literals (#5221) | 2 years ago
Kawrakow | fea4fd4ba7 | ggml : fix IQ3_XXS on Metal (#5219) | 2 years ago
Georgi Gerganov | 8f8ddfcfad | sync : ggml (#0) | 2 years ago
Georgi Gerganov | 6fb50ebbf0 | gguf : fix comparison (ggml/715) | 2 years ago
John Balis | 625a699b54 | `ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686) | 2 years ago
Georgi Gerganov | a4b07c057a | gguf : add input validation, prevent integer overflows (ggml/709) | 2 years ago
Georgi Gerganov | 549a1e6cd5 | ci : fix yolo URLs + fix metal capture (ggml/712) | 2 years ago
Jack Mousseau | 5f14ee0b0c | metal : add debug capture backend function (ggml/694) | 2 years ago
Kawrakow | 8e14e3ddb3 | Faster AVX2 dot product for IQ2_XS (#5187) | 2 years ago
Kawrakow | f4d7e54974 | SOTA 3-bit quants (#5196) | 2 years ago
0cc4m | 2256f36b79 | Vulkan Windows APU Memory Handling (#5199) | 2 years ago
Vladimir Malyutin | 7359016c7c | quantize : fix typo (#5211) | 2 years ago
divinity76 | 813416991a | main : allow empty --prompt-cache file (#5176) | 2 years ago
Romain Neutron | 5589921ef8 | readme : minor (#5204) | 2 years ago