Pierrick Hymbert
|
f482bb2e49
common: llama_load_model_from_url split support (#6192)
|
1 year ago |
Pierrick Hymbert
|
1997577d5e
server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-disable` (#6254)
|
1 year ago |
Julius Arkenberg
|
476b0251b2
llama : add grok-1 support (#6204)
|
1 year ago |
Pierrick Hymbert
|
21cad01b6e
split: add gguf-split in the make build target (#6262)
|
1 year ago |
Pierrick Hymbert
|
1b26aebe4d
server: flush stdout after logging in both text and json layout (#6253)
|
1 year ago |
Johannes Gäßler
|
50ccaf5eac
lookup: complement data from context with general text statistics (#5479)
|
1 year ago |
Georgi Gerganov
|
56a00f0a2f
common : default --hf-file to --model (#6234)
|
1 year ago |
fraxy-v
|
92397d87a4
convert-llama2c-to-ggml : enable conversion of GQA models (#6237)
|
1 year ago |
Kawrakow
|
1d0331c12a
quantize: options for output and token embedding tensors qtype (#6239)
|
1 year ago |
Pierrick Hymbert
|
dba1af6129
llama_model_loader: support multiple split/shard GGUFs (#6187)
|
1 year ago |
Minsoo Cheong
|
ee804f6223
ci: apply concurrency limit for github workflows (#6243)
|
1 year ago |
Georgi Gerganov
|
80bd33bc2c
common : add HF arg helpers (#6234)
|
1 year ago |
Nexesenex
|
e80f06d2a1
llama : correction of the attn.v.weight quantization for IQ3_XS (#6209)
|
1 year ago |
Olivier Chafik
|
f77a8ffd3b
tests : conditional python & node json schema tests (#6207)
|
1 year ago |
Olivier Chafik
|
72114edf06
json-schema-to-grammar : fix order of props + non-str const/enum (#6232)
|
1 year ago |
slaren
|
2f0e81e053
cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208)
|
1 year ago |
Xiaoyi Chen
|
29ab270e65
readme : add RecurseChat to the list of UIs (#6219)
|
1 year ago |
Jan Boon
|
6b8bb3a31d
server : fix n_keep always showing as 0 in response (#6211)
|
1 year ago |
Georgi Gerganov
|
68e210b354
server : enable continuous batching by default (#6231)
|
1 year ago |
Georgi Gerganov
|
b3e94f26ba
metal : proper assert for mat-mat memory alignment (#6225)
|
1 year ago |
Vaibhav Srivastav
|
b2075fd6a5
ci : add CURL flag for the mac builds (#6214)
|
1 year ago |
Georgi Gerganov
|
95d576b48e
metal : pad n_ctx by 32 (#6177)
|
1 year ago |
Neo Zhang Jianyu
|
59c17f02de
add blog link (#6222)
|
1 year ago |
DAN™
|
fa046eafbc
Fix params underscore convert to dash. (#6203)
|
1 year ago |
Jan Boon
|
be07a03217
server : update readme doc from `slot_id` to `id_slot` (#6213)
|
1 year ago |
slaren
|
d0a71233fb
cuda : disable host register by default (#6206)
|
1 year ago |
semidark
|
f372c49ccd
Corrected typo to wrong file (#6199)
|
1 year ago |
Georgi Gerganov
|
924ce1dce7
tests : disable system() calls (#6198)
|
1 year ago |
slaren
|
03a8f8fafe
cuda : fix LLAMA_CUDA_F16 build (#6197)
|
1 year ago |
Kawrakow
|
cfd3be76e3
ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196)
|
1 year ago |