| Commit | Message | Author | Date |
| --- | --- | --- | --- |
| `2f34b865b6` | cuda : fix LLAMA_CUDA_F16 build (#6298) | slaren | 1 year ago |
| `ae1f211ce2` | cuda : refactor into multiple files (#6269) | slaren | 1 year ago |
| `ad3a0505e3` | Server: clean up OAI params parsing function (#6284) | Xuan Son Nguyen | 1 year ago |
| `95ad616cdd` | [SYCL] fix SYCL backend build on windows is break by LOG() error (#6290) | Neo Zhang Jianyu | 1 year ago |
| `64e7b47c69` | examples : add "retrieval" (#6193) | Minsoo Cheong | 1 year ago |
| `7733f0c760` | ggml : support AVX512VNNI (#6280) | Justine Tunney | 1 year ago |
| `a32b77c4b2` | Fix heap corruption from wmode out-of-bound writes on windows (#6272) | Rick G | 1 year ago |
| `a0e584defd` | imatrix : fix wname for mul_mat_id ops (#6271) | Georgi Gerganov | 1 year ago |
| `7aed0ffe68` | Fixed lookup compilation issues on Windows (#6273) | Johannes Gäßler | 1 year ago |
| `ea279d5609` | ci : close inactive issue, increase operations per run (#6270) | Pierrick Hymbert | 1 year ago |
| `586e7bc561` | sampling : deduplicated code for probability distribution access (#6240) | Minsoo Cheong | 1 year ago |
| `ddf6568510` | [SYCL] offload op (#6217) | Meng, Hengyu | 1 year ago |
| `d03224ac98` | Support build win release for SYCL (#6241) | Neo Zhang Jianyu | 1 year ago |
| `94d1b3b411` | use _wfopen instead of fopen on Windows (#6248) | Jared Van Bortel | 1 year ago |
| `95562175f8` | gitignore : gguf-split | Georgi Gerganov | 1 year ago |
| `f482bb2e49` | common: llama_load_model_from_url split support (#6192) | Pierrick Hymbert | 1 year ago |
| `1997577d5e` | server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-disable` (#6254) | Pierrick Hymbert | 1 year ago |
| `476b0251b2` | llama : add grok-1 support (#6204) | Julius Arkenberg | 1 year ago |
| `21cad01b6e` | split: add gguf-split in the make build target (#6262) | Pierrick Hymbert | 1 year ago |
| `1b26aebe4d` | server: flush stdout after logging in both text and json layout (#6253) | Pierrick Hymbert | 1 year ago |
| `50ccaf5eac` | lookup: complement data from context with general text statistics (#5479) | Johannes Gäßler | 1 year ago |
| `56a00f0a2f` | common : default --hf-file to --model (#6234) | Georgi Gerganov | 1 year ago |
| `92397d87a4` | convert-llama2c-to-ggml : enable conversion of GQA models (#6237) | fraxy-v | 1 year ago |
| `1d0331c12a` | quantize: options for output and token embedding tensors qtype (#6239) | Kawrakow | 1 year ago |
| `dba1af6129` | llama_model_loader: support multiple split/shard GGUFs (#6187) | Pierrick Hymbert | 1 year ago |
| `ee804f6223` | ci: apply concurrency limit for github workflows (#6243) | Minsoo Cheong | 1 year ago |
| `80bd33bc2c` | common : add HF arg helpers (#6234) | Georgi Gerganov | 1 year ago |
| `e80f06d2a1` | llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) | Nexesenex | 1 year ago |
| `f77a8ffd3b` | tests : conditional python & node json schema tests (#6207) | Olivier Chafik | 1 year ago |
| `72114edf06` | json-schema-to-grammar : fix order of props + non-str const/enum (#6232) | Olivier Chafik | 1 year ago |