| Author | Commit | Message | Date |
|---|---|---|---|
| Johannes Gäßler | 148995e5e5 | llama-bench: more compact markdown tables (#7879) | 1 year ago |
| Georgi Gerganov | 1442677f92 | common : refactor cli arg parsing (#7675) | 1 year ago |
| Georgi Gerganov | 554c247caf | ggml : remove OpenCL (#7735) | 1 year ago |
| slaren | adc9ff3841 | llama-bench : allow using a different printer for stderr with -oe (#7722) | 1 year ago |
| Radoslav Gerganov | 210d99173d | llama-bench : add support for the RPC backend (#7435) | 1 year ago |
| Georgi Gerganov | 6ff13987ad | common : normalize naming style (#7462) | 1 year ago |
| slaren | b18532a4ef | phi3 : duplicate rope factors in each layer (#7447) | 1 year ago |
| slaren | e849648888 | llama-bench : add pp+tg test type (#7199) | 1 year ago |
| kunnis | 628b299106 | Adding support for the --numa argument for llama-bench. (#7080) | 1 year ago |
| Georgi Gerganov | 9c67c2773d | ggml : add Flash Attention (#5021) | 1 year ago |
| Justine Tunney | 8cc91dc63c | ggml : add llamafile sgemm (#6414) | 1 year ago |
| slaren | 280345968d | cuda : rename build flag to LLAMA_CUDA (#6299) | 1 year ago |
| Kawrakow | 76aa30a263 | Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) | 1 year ago |
| slaren | 2bf8d0f7c4 | backend : offload large batches to GPU (#6083) | 1 year ago |
| slaren | b0bc9f4a9d | llama-bench : use random tokens to improve accuracy with mixtral (#6069) | 1 year ago |
| Steve Grubb | 6e0438da3c | gguf : fix resource leaks (#6061) | 1 year ago |
| slaren | f30ea47a87 | llama : add pipeline parallelism support (#6017) | 1 year ago |
| Georgi Gerganov | 6cdabe6526 | llama-bench : add embeddings option (#5924) | 1 year ago |
| Neo Zhang Jianyu | 715641391d | Support multiple GPUs (split mode) on SYCL backend (#5806) | 1 year ago |
| Pierrick Hymbert | 3ab8b3a92e | llama : cleanup unused mmq flags (#5772) | 1 year ago |
| Georgi Gerganov | ab336a9d5e | code : normalize enum names (#5697) | 1 year ago |
| bmwl | f486f6e1e5 | ggml : add numa options (#5377) | 1 year ago |
| Michael Klimenko | 52bb63c708 | refactor : switch to emplace_back to avoid extra object (#5291) | 1 year ago |
| Neo Zhang Jianyu | 128dcbd3c9 | add --no-mmap in llama-bench (#5257) | 1 year ago |
| Georgi Gerganov | 5cb04dbc16 | llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) | 1 year ago |
| Jared Van Bortel | e8dc55d006 | kompute : llama-bench support and ggml_cpu_has_kompute() (#5226) | 1 year ago |
| 0cc4m | 2307523d32 | ggml : add Vulkan backend (#2059) | 2 years ago |
| slaren | e7e4df031b | llama : ggml-backend integration (#4766) | 2 years ago |
| slaren | 226460cc0d | llama-bench : add no-kv-offload parameter (#4812) | 2 years ago |
| Georgi Gerganov | bcc0eb4591 | llama : per-layer KV cache + quantum K cache (#4309) | 2 years ago |