Georgi Gerganov | d9d54e498d | speculative : refactor and add a simpler example (#10362) | 1 year ago
sasha0552 | 42cadc74bd | server : fix slot selection by lru (#10126) | 1 year ago
sasha0552 | d865d1478c | server : fix smart selection of available slot (#10120) | 1 year ago
Georgi Gerganov | 8d8ff71536 | llama : remove Tail-Free sampling (#10071) | 1 year ago
Georgi Gerganov | 8125e6cbfc | server : don't overfill the batch during infill (#10018) | 1 year ago
Xuan Son Nguyen | 958367bf53 | server : refactor slot input data, move tokenizer to HTTP thread (#10023) | 1 year ago
VoidIsVoid | a89f75e1b7 | server : handle "logprobs" field with false value (#9871) | 1 year ago
Georgi Gerganov | c7181bd294 | server : reuse cached context chunks (#9866) | 1 year ago
Diego Devesa | 7eee341bee | common : use common_ prefix for common library functions (#9805) | 1 year ago
Xuan Son Nguyen | 458367a906 | server : better security control for public deployments (#9776) | 1 year ago
Georgi Gerganov | f4d2b8846a | llama : add reranking support (#9510) | 1 year ago
Vinesh Janarthanan | 8a308354f6 | server : match OAI structured output response (#9527) | 1 year ago
Georgi Gerganov | 6262d13e0b | common : reimplement logging (#9418) | 1 year ago
Mathijs Henquet | 78203641fe | server : Add option to return token pieces in /tokenize endpoint (#9108) | 1 year ago
Xuan Son Nguyen | 6e7d133a5f | server : refactor multitask handling (#9274) | 1 year ago
ardfork | 978ba3d83d | Server: Don't ignore llama.cpp params (#8754) | 1 year ago
Georgi Gerganov | 4e24cffd8c | server : handle content array in chat API (#8449) | 1 year ago
Xuan Son Nguyen | 48e6b92cc3 | Add chat template support for llama-cli (#8068) | 1 year ago
sasha0552 | 7a16ce7db2 | server : smart slot selection using Longest Common Prefix (#7728) | 1 year ago
Georgi Gerganov | 1442677f92 | common : refactor cli arg parsing (#7675) | 1 year ago
Benjamin Findley | e586ee4259 | change default temperature of OAI compat API from 0 to 1 (#7226) | 1 year ago
Johannes Gäßler | c12452c7ae | JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143) | 1 year ago
Xuan Son Nguyen | 1fd9c1741d | clean up json_value & server_log (#7142) | 1 year ago
Pedro Cuenca | b97bc3966e | llama : support Llama 3 HF conversion (#6745) | 1 year ago
Pierrick Hymbert | 75cd4c7729 | ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495) | 1 year ago
JH23X | 60cdf40cc3 | server : handle exception on wrong type in request (#6452) | 1 year ago
Xuan Son Nguyen | ad3a0505e3 | Server: clean up OAI params parsing function (#6284) | 1 year ago
Pierrick Hymbert | 1b26aebe4d | server: flush stdout after logging in both text and json layout (#6253) | 1 year ago
Olivier Chafik | 72114edf06 | json-schema-to-grammar : fix order of props + non-str const/enum (#6232) | 1 year ago
Olivier Chafik | 5b7b0ac8df | json-schema-to-grammar improvements (+ added to server) (#5978) | 1 year ago