Georgi Gerganov | b3964c1e89 | metal : optimize FA vec for large sequences and BS <= 8 (#15566) | 4 months ago
Georgi Gerganov | 6b64f74b55 | batched-bench : fix unified KV cache handling + pp timing (#15562) | 4 months ago
Georgi Gerganov | f0d3c7405c | batched-bench : use rand tokens (#15398) | 5 months ago
Georgi Gerganov | 225e7a1438 | llama : add high-throughput mode (#14363) | 6 months ago
Georgi Gerganov | 745aa5319b | llama : deprecate llama_kv_self_ API (#14030) | 7 months ago
Georgi Gerganov | b89d605a91 | batched-bench : fix pp batch contents (#13492) | 8 months ago
Diego Devesa | 1d36b3670b | llama : move end-user examples to tools directory (#13249) | 8 months ago