Reza Kakhki
|
9ba399dfa7
server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967)
|
1 year ago |
NeverLucky
|
09fe2e7613
server: allow filtering llama server response fields (#10940)
|
1 year ago |
Xuan Son Nguyen
|
485dc01214
server : add system_fingerprint to chat/completion (#10917)
|
1 year ago |
Xuan Son Nguyen
|
57bb2c40cd
server : fix logprobs, make it OAI-compatible (#10783)
|
1 year ago |
Xuan Son Nguyen
|
46828872c3
server : (embeddings) using same format for "input" and "content" (#10872)
|
1 year ago |
krystiancha
|
05c3a444b8
server : fill usage info in embeddings and rerank responses (#10852)
|
1 year ago |
Michelle Tan
|
89d604f2c8
server: Fix `has_next_line` in JSON response (#10818)
|
1 year ago |
kallewoof
|
484d2f31ae
bug-fix: snprintf prints NULL in place of the last character (#10419)
|
1 year ago |
Xuan Son Nguyen
|
3573fa8e7b
server : (refactor) no more json in server_task input (#10691)
|
1 year ago |
Georgi Gerganov
|
ce4a7b8493
server : various fixes (#10704)
|
1 year ago |
Xuan Son Nguyen
|
6c5bc0625f
server : (refactoring) do not rely on JSON internally (#10643)
|
1 year ago |
haopeng
|
64ed2091b2
server: Add "tokens per second" information in the backend (#10548)
|
1 year ago |
Georgi Gerganov
|
d9d54e498d
speculative : refactor and add a simpler example (#10362)
|
1 year ago |
sasha0552
|
42cadc74bd
server : fix slot selection by lru (#10126)
|
1 year ago |
sasha0552
|
d865d1478c
server : fix smart selection of available slot (#10120)
|
1 year ago |
Georgi Gerganov
|
8d8ff71536
llama : remove Tail-Free sampling (#10071)
|
1 year ago |
Georgi Gerganov
|
8125e6cbfc
server : don't overfill the batch during infill (#10018)
|
1 year ago |
Xuan Son Nguyen
|
958367bf53
server : refactor slot input data, move tokenizer to HTTP thread (#10023)
|
1 year ago |
VoidIsVoid
|
a89f75e1b7
server : handle "logprobs" field with false value (#9871)
|
1 year ago |
Georgi Gerganov
|
c7181bd294
server : reuse cached context chunks (#9866)
|
1 year ago |
Diego Devesa
|
7eee341bee
common : use common_ prefix for common library functions (#9805)
|
1 year ago |
Xuan Son Nguyen
|
458367a906
server : better security control for public deployments (#9776)
|
1 year ago |
Georgi Gerganov
|
f4d2b8846a
llama : add reranking support (#9510)
|
1 year ago |
Vinesh Janarthanan
|
8a308354f6
server : match OAI structured output response (#9527)
|
1 year ago |
Georgi Gerganov
|
6262d13e0b
common : reimplement logging (#9418)
|
1 year ago |
Mathijs Henquet
|
78203641fe
server : Add option to return token pieces in /tokenize endpoint (#9108)
|
1 year ago |
Xuan Son Nguyen
|
6e7d133a5f
server : refactor multitask handling (#9274)
|
1 year ago |
ardfork
|
978ba3d83d
Server: Don't ignore llama.cpp params (#8754)
|
1 year ago |
Georgi Gerganov
|
4e24cffd8c
server : handle content array in chat API (#8449)
|
1 year ago |
Xuan Son Nguyen
|
48e6b92cc3
Add chat template support for llama-cli (#8068)
|
1 year ago |