Georgi Gerganov
|
bcc0eb4591
llama : per-layer KV cache + quantum K cache (#4309)
|
2 years ago |
Hongyu Ouyang
|
81bc9214a3
train : fix #4227 (double free in examples/train-text-from-scratch/train-text-from-scratch.cpp) (#4351)
|
2 years ago |
Georgi Gerganov
|
05cd6e5036
server : recognize cache_prompt parameter in OAI API (#4347)
|
2 years ago |
Georgi Gerganov
|
caa9249217
common : fix compile warning
|
2 years ago |
stduhpf
|
da5eaef1f3
speculative : support `--color` (#4343)
|
2 years ago |
Marcus Dunn
|
5f6e0c0dff
grammar : pre-computed pieces + reserve mem + less string copies (#4330)
|
2 years ago |
Kerfuffle
|
5aa365d88f
llama : allow overriding GGUF metadata when loading model (#4092)
|
2 years ago |
MaggotHATE
|
52c8bc3cf3
sampling : custom samplers order (#4285)
|
2 years ago |
kchro3
|
e4b76bbe31
swift : revert compiler checks for swift package (#4332)
|
2 years ago |
Daniel Bevenius
|
23b5e12eb5
simple : update error message for KV cache check (#4324)
|
2 years ago |
Miwa / Ensan
|
d208995c6d
swift : fix concatenation method to avoid invalid UTF8 stringfication (#4325)
|
2 years ago |
Miwa / Ensan
|
5c9f90cba1
swift : fix prompt tokenization logic (#4321)
|
2 years ago |
Ikko Eltociear Ashimine
|
4fa44e84ad
grammar-parser : fix typo (#4318)
|
2 years ago |
Georgi Gerganov
|
fbbc42827b
ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (#4308)
|
2 years ago |
Georgi Gerganov
|
adf3de4f69
ggml : fix soft max out-of-bounds access (#4307)
|
2 years ago |
Ed Lee
|
33e171d1e9
server : fix OpenAI API `stop` field to be optional (#4299)
|
2 years ago |
Rickard Edén
|
6949b50df5
py : add grammar to oai like api (#4294)
|
2 years ago |
Georgi Gerganov
|
d7b800b8bc
llama : pad KV cache size (#4280)
|
2 years ago |
Georgi Gerganov
|
5a7d3125e7
llama : avoid using "optional" keyword (#4283)
|
2 years ago |
Georgi Gerganov
|
d5a1cbde60
llama : support optional tensors (#4283)
|
2 years ago |
Miwa / Ensan
|
b220222a64
swift : fix token_to_piece implementation (#4278)
|
2 years ago |
Jared Van Bortel
|
511f52c334
build : enable libstdc++ assertions for debug builds (#4275)
|
2 years ago |
CausalLM
|
03562f3a86
llama : support attention bias on LLaMA architecture (#4283)
|
2 years ago |
Shijie
|
37c746d687
llama : add Qwen support (#4281)
|
2 years ago |
Georgi Gerganov
|
880f57973b
llama : fix integer overflow during quantization (#4284)
|
2 years ago |
Daniel Bevenius
|
8d6d9f033b
py : add requirements file for convert-hf-to-gguf.py (#4277)
|
2 years ago |
Georgi Gerganov
|
ef47ec18da
ggml : add ggml_soft_max_ext (#4256)
|
2 years ago |
Ziad Ben Hadj-Alouane
|
1d144112c0
server : add --log-disable to disable logging to file (#4260)
|
2 years ago |
Ziad Ben Hadj-Alouane
|
f43f09366d
server : add single-client multi-prompt support (#4232)
|
2 years ago |
WillCorticesAI
|
d2809a3ba2
make : fix Apple clang determination bug (#4272)
|
2 years ago |