| Author | Commit | Message | Date |
|---|---|---|---|
| slaren | 67be2ce101 | cuda : fix data race in soft max (#5853) | 1 year ago |
| Georgi Gerganov | 231ae28f07 | readme : add API changes section | 1 year ago |
| Douglas Hanley | 475df1d6cf | llama : allow for user specified embedding pooling type (#5849) | 1 year ago |
| Nindaleth | 87c2e8b279 | gguf-dump : support i-quants (#5841) | 1 year ago |
| compilade | de9692a7d2 | llama : fix llama_copy_state_data with fragmented KV cache (#5840) | 1 year ago |
| Pierrick Hymbert | e6029348e8 | ci : schedule slow server tests only on Release or on demand (#5839) | 1 year ago |
| Pierrick Hymbert | 8ef969afce | server : init http requests thread pool with --parallel if set (#5836) | 1 year ago |
| Georgi Gerganov | fa974646e1 | flake.lock: Update (#5842) | 1 year ago |
| Pierrick Hymbert | 9731134296 | server: tests: passkey challenge / self-extend with context shift demo (#5832) | 1 year ago |
| Michael Podvitskiy | 4a6e2d6142 | llama : add abort_callback to interrupt computation (#5409) | 1 year ago |
| Georgi Gerganov | 494c870326 | ggml : fix IQ3_S AVX implementation (#5834) | 1 year ago |
| Jared Van Bortel | 4d4d2366fc | convert : automatically fall back to HfVocab if tokenizer.model doesn't exist (#5821) | 1 year ago |
| Jared Van Bortel | c7a0ad8ec9 | convert-hf : make model class definitions self-contained (#5825) | 1 year ago |
| Kawrakow | bbde6eb256 | ggml : IQ3_S improvements (#5829) | 1 year ago |
| Georgi Gerganov | ef2cd694c4 | scripts : add pod-llama.sh | 1 year ago |
| Xuan Son Nguyen | 6c32d8c7ad | llama : refactor internal quantization functions (#5830) | 1 year ago |
| compilade | 802da0091b | llama : fix segfault from unknown model arch name (#5820) | 1 year ago |
| Neo Zhang Jianyu | 715641391d | Support multiple GPUs (split mode) on SYCL backend (#5806) | 1 year ago |
| crasm | 9bf297a02b | workflows : remove nocleanup arg for check-requirements.sh (#5826) | 1 year ago |
| Tushar | cb5e8f7fc4 | build(nix): Introduce flake.formatter for `nix fmt` (#5687) | 1 year ago |
| nold | da3b9ba2b7 | convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792) | 1 year ago |
| Sourab Mangrulkar | c29af7e225 | llama : add StarCoder2 support (#5795) | 1 year ago |
| Georgi Gerganov | 38d16b1426 | server : remove api_like_OAI.py proxy script (#5808) | 1 year ago |
| ddpasa | c2224f003b | ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813) | 1 year ago |
| kunal-vaishnavi | e743386728 | gemma : fix bfloat16 -> float16 conversion issue (#5810) | 1 year ago |
| Miwa / Ensan | f49a535686 | common : fix flag `--logits-all` to `--all-logits` (#5805) | 1 year ago |
| Pierrick Hymbert | 3ab8b3a92e | llama : cleanup unused mmq flags (#5772) | 1 year ago |
| Douglas Hanley | 9600d59e01 | unicode : switch to multimap based nfd_map (#5799) | 1 year ago |
| Pierrick Hymbert | 5cb02b4a01 | server: allow to override threads server pool with --threads-http (#5794) | 1 year ago |
| Eve | 6ea0f010ff | ci : add Ubuntu 22 Vulkan CI run (#5789) | 1 year ago |