Georgi Gerganov
|
bcc0eb4591
llama : per-layer KV cache + quantum K cache (#4309)
|
2 years ago |
Georgi Gerganov
|
05cd6e5036
server : recognize cache_prompt parameter in OAI API (#4347)
|
2 years ago |
Ed Lee
|
33e171d1e9
server : fix OpenAI API `stop` field to be optional (#4299)
|
2 years ago |
Georgi Gerganov
|
d5a1cbde60
llama : support optional tensors (#4283)
|
2 years ago |
Ziad Ben Hadj-Alouane
|
1d144112c0
server : add --log-disable to disable logging to file (#4260)
|
2 years ago |
Ziad Ben Hadj-Alouane
|
f43f09366d
server : add single-client multi-prompt support (#4232)
|
2 years ago |
Georgi Gerganov
|
af19d35734
server : OAI API compatibility (#4198)
|
2 years ago |
Haohui Mai
|
55978ce09b
Fix incorrect format strings and uninitialized variables. (#4133)
|
2 years ago |
SoftwareRenderer
|
936c79b227
server : relay error messages (#4131)
|
2 years ago |
Kerfuffle
|
91f6499393
Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040)
|
2 years ago |
Alexey Parfenov
|
d96ca7ded7
server : fix crash when prompt exceeds context size (#3996)
|
2 years ago |
Mihai
|
57ad015dc3
server : add min_p param (#3877)
|
2 years ago |
cebtenzzre
|
b12fa0d1c1
build : link against build info instead of compiling against it (#3879)
|
2 years ago |
cebtenzzre
|
898aeca90a
llama : implement YaRN RoPE scaling (#2268)
|
2 years ago |
Adrian Hesketh
|
ca190bca8e
server : re-enable completion and embedded at the same time (#3876)
|
2 years ago |
Kerfuffle
|
6e08281e58
Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)
|
2 years ago |
Georgi Gerganov
|
34b2a5e1ee
server : do not release slot on image input (#3798)
|
2 years ago |
cebtenzzre
|
ad93962657
server : add parameter -tb N, --threads-batch N (#3584) (#3768)
|
2 years ago |
Georgi Gerganov
|
1717521cdb
server : do not block system prompt update (#3767)
|
2 years ago |
Marcus Dunn
|
5be6c803fa
llama : remove token functions with `context` args in favor of `model` (#3720)
|
2 years ago |
Georgi Gerganov
|
438c2ca830
server : parallel decoding and multimodal (#3677)
|
2 years ago |
Georgi Gerganov
|
d1031cf49c
sampling : refactor init to use llama_sampling_params (#3696)
|
2 years ago |
Georgi Gerganov
|
a0edf73bda
server : fix uninitialized sampling context (close #3685)
|
2 years ago |
Georgi Gerganov
|
0e89203b51
speculative : add tree-based sampling example (#3624)
|
2 years ago |
Georgi Gerganov
|
57dd55e2c7
server : fix kv cache management (#3588)
|
2 years ago |
Michael Coppola
|
a8bdd65525
server : add parameter -tb N, --threads-batch N (#3584)
|
2 years ago |
Kerfuffle
|
70c29da118
common : fix mirostat state when using multiple sequences (#3543)
|
2 years ago |
vvhg1
|
11ea5c7d96
infill. : fix tokenization (#3508)
|
2 years ago |
Jhen-Jie Hong
|
97af49fa39
server : reuse llama_sample_token common util (#3494)
|
2 years ago |
Kenvix ⭐
|
45eba9369f
build : use std::make_tuple() for compatibility with older GCC versions (#3488)
|
2 years ago |