Mathijs Henquet
|
78203641fe
server : Add option to return token pieces in /tokenize endpoint (#9108)
|
1 year ago |
Xuan Son Nguyen
|
bfe76d4a17
common : move arg parser code to `arg.cpp` (#9388)
|
1 year ago |
Xuan Son Nguyen
|
1b9ae5189c
common : refactor arg parser (#9308)
|
1 year ago |
Georgi Gerganov
|
df270ef745
llama : refactor sampling v2 (#9294)
|
1 year ago |
Xuan Son Nguyen
|
a77feb5d71
server : add some missing env variables (#9116)
|
1 year ago |
Xuan Son Nguyen
|
fc54ef0d1c
server : support reading arguments from environment variables (#9105)
|
1 year ago |
Xuan Son Nguyen
|
8b3befc0e2
server : refactor middleware and /health endpoint (#9056)
|
1 year ago |
Xuan Son Nguyen
|
1e6f6554aa
server : add lora hotswap endpoint (WIP) (#8857)
|
1 year ago |
Igor Okulist
|
afbbcf3c04
server : update llama-server embedding flag documentation (#8779)
|
1 year ago |
Ujjawal Panchal
|
4b0eff3df5
docs : Quantum -> Quantized (#8666)
|
1 year ago |
Jan Boon
|
628154492a
server : update doc to clarify n_keep when there is bos token (#8619)
|
1 year ago |
Xuan Son Nguyen
|
4db8f60fe7
fix ci (#8494)
|
1 year ago |
M-A
|
f17f39ff9c
server: update README.md with llama-server --help output [no ci] (#8472)
|
1 year ago |
Bjarke Viksøe
|
cb4d86c4d7
server: Retrieve prompt template in /props (#8337)
|
1 year ago |
Pieter Ouwerkerk
|
5a7447c569
readme : fix minor typos [no ci] (#8314)
|
1 year ago |
Sigbjørn Skjæret
|
38373cfbab
Add SPM infill support (#8016)
|
1 year ago |
Olivier Chafik
|
1c641e6aac
`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
|
1 year ago |
Johannes Gäßler
|
7027b27d76
server: update cache_prompt documentation [no ci] (#7745)
|
1 year ago |
Johannes Gäßler
|
1b01f06db0
server: add test for token probs (#7347)
|
1 year ago |
Johannes Gäßler
|
cb42c29427
server: correct --threads documentation [no ci] (#7362)
|
1 year ago |
Leon Knauer
|
9c4fdcbec8
[Server] Added --verbose option to README [no ci] (#7335)
|
1 year ago |
Ryuei
|
27f65d6267
docs: Fix typo and update description for --embeddings flag (#7026)
|
1 year ago |
Johan
|
911b3900dd
server : add_special option for tokenize endpoint (#7059)
|
1 year ago |
Johannes Gäßler
|
af0a5b6163
server: fix incorrectly reported token probabilities (#7125)
|
1 year ago |
Kyle Mistele
|
260b7c6529
server : update readme with undocumented options (#7013)
|
1 year ago |
Olivier Chafik
|
b8a7a5a90f
build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964)
|
1 year ago |
Olivier Chafik
|
ab9a3240a9
JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)
|
1 year ago |
Jan Boon
|
beea6e1b16
llama : save and restore kv cache for single seq id (#6341)
|
1 year ago |
Georgi Gerganov
|
4399f13fb9
server : remove obsolete --memory-f32 option
|
1 year ago |
Fattire
|
5fb1574c81
A few small fixes to server's README docs (#6428)
|
1 year ago |