Xuan Son Nguyen
|
4ffcdce2ff
add alias for chat template (#5858)
|
1 year ago |
Georgi Gerganov
|
a0fc62661f
sync : ggml
|
1 year ago |
leejet
|
7d43c585dc
add some new ops, fix some operators and add batch operations to certain operators. (ggml/747)
|
1 year ago |
DAN™
|
82f3e668ad
common : use LLAMA_DEFAULT_SEED (#5855)
|
1 year ago |
DAN™
|
5a51cc1bb4
main : support special tokens as reverse/anti prompt (#5847)
|
1 year ago |
slaren
|
67be2ce101
cuda : fix data race in soft max (#5853)
|
1 year ago |
Georgi Gerganov
|
231ae28f07
readme : add API changes section
|
1 year ago |
Douglas Hanley
|
475df1d6cf
llama : allow for user specified embedding pooling type (#5849)
|
1 year ago |
Nindaleth
|
87c2e8b279
gguf-dump : support i-quants (#5841)
|
1 year ago |
compilade
|
de9692a7d2
llama : fix llama_copy_state_data with fragmented KV cache (#5840)
|
1 year ago |
Pierrick Hymbert
|
e6029348e8
ci : schedule slow server tests only on Release or on demand (#5839)
|
1 year ago |
Pierrick Hymbert
|
8ef969afce
server : init http requests thread pool with --parallel if set (#5836)
|
1 year ago |
Georgi Gerganov
|
fa974646e1
flake.lock: Update (#5842)
|
1 year ago |
Pierrick Hymbert
|
9731134296
server: tests: passkey challenge / self-extend with context shift demo (#5832)
|
1 year ago |
Michael Podvitskiy
|
4a6e2d6142
llama : add abort_callback to interrupt computation (#5409)
|
1 year ago |
Georgi Gerganov
|
494c870326
ggml : fix IQ3_S AVX implementation (#5834)
|
1 year ago |
Jared Van Bortel
|
4d4d2366fc
convert : automatically fall back to HfVocab if tokenizer.model doesn't exist (#5821)
|
1 year ago |
Jared Van Bortel
|
c7a0ad8ec9
convert-hf : make model class definitions self-contained (#5825)
|
1 year ago |
Kawrakow
|
bbde6eb256
ggml : IQ3_S improvements (#5829)
|
1 year ago |
Georgi Gerganov
|
ef2cd694c4
scripts : add pod-llama.sh
|
1 year ago |
Xuan Son Nguyen
|
6c32d8c7ad
llama : refactor internal quantization functions (#5830)
|
1 year ago |
compilade
|
802da0091b
llama : fix segfault from unknown model arch name (#5820)
|
1 year ago |
Neo Zhang Jianyu
|
715641391d
Support multiple GPUs (split mode) on SYCL backend (#5806)
|
1 year ago |
crasm
|
9bf297a02b
workflows : remove nocleanup arg for check-requirements.sh (#5826)
|
1 year ago |
Tushar
|
cb5e8f7fc4
build(nix): Introduce flake.formatter for `nix fmt` (#5687)
|
1 year ago |
nold
|
da3b9ba2b7
convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792)
|
1 year ago |
Sourab Mangrulkar
|
c29af7e225
llama : add StarCoder2 support (#5795)
|
1 year ago |
Georgi Gerganov
|
38d16b1426
server : remove api_like_OAI.py proxy script (#5808)
|
1 year ago |
ddpasa
|
c2224f003b
ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813)
|
1 year ago |
kunal-vaishnavi
|
e743386728
gemma : fix bfloat16 -> float16 conversion issue (#5810)
|
1 year ago |