Johannes Gäßler
|
7dc78764e2
compare-llama-bench: tweak output format (#4910)
|
2 years ago |
Ziad Ben Hadj-Alouane
|
356327feb3
server : fix deadlock that occurs in multi-prompt scenarios (#4905)
|
2 years ago |
makomk
|
ee8243adaa
server : fix crash with multimodal models without BOS token (#4904)
|
2 years ago |
Georgi Gerganov
|
15ebe59210
convert : update phi-2 to latest HF repo (#4903)
|
2 years ago |
Georgi Gerganov
|
de473f5f8e
sync : ggml
|
2 years ago |
Georgi Gerganov
|
f238461236
ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758)
|
2 years ago |
slaren
|
fa5c1fb44a
backend_sched : fix assignments
|
2 years ago |
Maximilian Winter
|
52ee4540c0
examples : add pydantic models to GBNF grammar generator (#4883)
|
2 years ago |
Johannes Gäßler
|
3fe81781e3
CUDA: faster q8_0 -> f16 dequantization (#4895)
|
2 years ago |
slaren
|
e7e4df031b
llama : ggml-backend integration (#4766)
|
2 years ago |
Georgi Gerganov
|
584d674be6
llama : remove redundant assert for StableLM (#4901)
|
2 years ago |
Daniel Bevenius
|
930f907d3e
export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894)
|
2 years ago |
Zay
|
e790eef21c
llama.swiftui : update models layout (#4826)
|
2 years ago |
Georgi Gerganov
|
5537d9d36b
gitignore : imatrix
|
2 years ago |
Johannes Gäßler
|
1b280c9fff
CUDA: fix softmax compile for old CUDA versions (#4862)
|
2 years ago |
Georgi Gerganov
|
3cabe80630
llama : fix typo "imp_embd" -> "inp_embd"
|
2 years ago |
howlger
|
4315a94366
common : streamline the formatting of help (#4890)
|
2 years ago |
Georgi Gerganov
|
2d00741e12
py : fix lint (#4889)
|
2 years ago |
Georgi Gerganov
|
f445c0e68c
llama : fix llm_build_k_shift to use correct n_rot (#4889)
|
2 years ago |
Kawrakow
|
326b418b59
Importance Matrix calculation (#4861)
|
2 years ago |
Georgi Gerganov
|
1d118386fe
server : fix infill when prompt is empty (#4833)
|
2 years ago |
Georgi Gerganov
|
7edefbd79c
main : better name for variable n_print (#4874)
|
2 years ago |
Georgi Gerganov
|
3ca63b4538
main : disable token count by default (#4874)
|
2 years ago |
Georgi Gerganov
|
b037787548
swift : track ggml release branch (#4867)
|
2 years ago |
Kawrakow
|
469e75d0a3
llama : restore intended k-quants mixes for MoE models (#4872)
|
2 years ago |
Kawrakow
|
49662cbed3
ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)
|
2 years ago |
Georgi Gerganov
|
3ba5b8ca8e
swift : pin ggml commit + remove ggml.h from spm-headers (#4878)
|
2 years ago |
Laura
|
4330bd83fe
server : implement credentialed CORS (#4514)
|
2 years ago |
Michael Coppola
|
27379455c3
server : support for multiple api keys (#4864)
|
2 years ago |
Behnam M
|
eab6795006
server : add `LOG_INFO` when model is successfully loaded (#4881)
|
2 years ago |