slaren | f30ea47a87 | llama : add pipeline parallelism support (#6017) | 1 year ago
Georgi Gerganov | 6cdabe6526 | llama-bench : add embeddings option (#5924) | 1 year ago
Neo Zhang Jianyu | 715641391d | Support multiple GPUs (split mode) on SYCL backend (#5806) | 1 year ago
Pierrick Hymbert | 3ab8b3a92e | llama : cleanup unused mmq flags (#5772) | 1 year ago
Georgi Gerganov | ab336a9d5e | code : normalize enum names (#5697) | 1 year ago
bmwl | f486f6e1e5 | ggml : add numa options (#5377) | 1 year ago
Michael Klimenko | 52bb63c708 | refactor : switch to emplace_back to avoid extra object (#5291) | 1 year ago
Neo Zhang Jianyu | 128dcbd3c9 | add --no-mmap in llama-bench (#5257) | 1 year ago
Georgi Gerganov | 5cb04dbc16 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) | 1 year ago
Jared Van Bortel | e8dc55d006 | kompute : llama-bench support and ggml_cpu_has_kompute() (#5226) | 1 year ago
0cc4m | 2307523d32 | ggml : add Vulkan backend (#2059) | 1 year ago
slaren | e7e4df031b | llama : ggml-backend integration (#4766) | 2 years ago
slaren | 226460cc0d | llama-bench : add no-kv-offload parameter (#4812) | 2 years ago
Georgi Gerganov | bcc0eb4591 | llama : per-layer KV cache + quantum K cache (#4309) | 2 years ago
cebtenzzre | b12fa0d1c1 | build : link against build info instead of compiling against it (#3879) | 2 years ago
Kerfuffle | 6e08281e58 | Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) | 2 years ago
Marcus Dunn | 5be6c803fa | llama : remove token functions with `context` args in favor of `model` (#3720) | 2 years ago
Cebtenzzre | bc39553c90 | build : enable more non-default compiler warnings (#3200) | 2 years ago
slaren | 16bc66d947 | llama.cpp : split llama_context_params into model and context params (#3301) | 2 years ago
Georgi Gerganov | ec893798b7 | llama : custom attention mask + parallel decoding + no context swaps (#3228) | 2 years ago
Rickard Hallerbäck | dc6897404e | metal : reusing llama.cpp logging (#3152) | 2 years ago
Georgi Gerganov | 8c00b7a6ff | sync : ggml (Metal F32 support + reduce ggml-alloc size) (#3192) | 2 years ago
slaren | 15b67a66c2 | llama-bench : use two tokens in the warmup run for prompt evals (#3059) | 2 years ago
Cebtenzzre | de2fe892af | examples : replace fprintf to stdout with printf (#3017) | 2 years ago
Cebtenzzre | 3103568144 | llama-bench : make cpp file non-executable (#2999) | 2 years ago
slaren | 43033b7bb4 | llama-bench : set locale to utf8 (#2832) | 2 years ago
slaren | 154725c543 | llama-bench : add model sizes (#2771) | 2 years ago
Henri Vasserman | 6bbc598a63 | ROCm Port (#1087) | 2 years ago
slaren | 8e4364f2af | llama-bench : minor fixes (#2695) | 2 years ago
Georgi Gerganov | 6381d4e110 | gguf : new file format with flexible meta data (beta) (#2398) | 2 years ago