agray3
|
13dca2a54a
Vectorize load instructions in dmmv f16 CUDA kernel (#9816)
|
1 year ago |
Georgi Gerganov
|
d4c19c0f5c
server : accept extra_context for the infill endpoint (#9874)
|
1 year ago |
Georgi Gerganov
|
c7181bd294
server : reuse cached context chunks (#9866)
|
1 year ago |
Georgi Gerganov
|
92be9f1216
flake.lock: Update (#9870)
|
1 year ago |
Georgi Gerganov
|
edc265661c
server : add option to time limit the generation phase (#9865)
|
1 year ago |
Georgi Gerganov
|
1bde94dd02
server : remove self-extend features (#9860)
|
1 year ago |
Georgi Gerganov
|
95c76e8e92
server : remove legacy system_prompt feature (#9857)
|
1 year ago |
Georgi Gerganov
|
11ac9800af
llama : improve infill support and special token detection (#9798)
|
1 year ago |
R0CKSTAR
|
943d20b411
musa : update doc (#9856)
|
1 year ago |
Diego Devesa
|
96776405a1
ggml : move more prints to the ggml log system (#9839)
|
1 year ago |
Diego Devesa
|
7eee341bee
common : use common_ prefix for common library functions (#9805)
|
1 year ago |
Diego Devesa
|
0e9f760eb1
rpc : add backend registry / device interfaces (#9812)
|
1 year ago |
R0CKSTAR
|
cf8e0a3bb9
musa: add docker image support (#9685)
|
1 year ago |
Diego Devesa
|
c7499c557c
examples : do not use common library in simple example (#9803)
|
1 year ago |
Diego Devesa
|
c81f3bbb05
cmake : do not build common library by default when standalone (#9804)
|
1 year ago |
Georgi Gerganov
|
e7022064ab
perplexity : fix integer overflow (#9783)
|
1 year ago |
Georgi Gerganov
|
3dc48fe75a
examples : remove llama.vim
|
1 year ago |
Diego Devesa
|
dca1d4b58a
ggml : fix BLAS with unsupported types (#9775)
|
1 year ago |
Xuan Son Nguyen
|
458367a906
server : better security control for public deployments (#9776)
|
1 year ago |
standby24x7
|
fa42aa6d89
scripts : fix spelling typo in messages and comments (#9782)
|
1 year ago |
Diego Devesa
|
6374743747
ggml : add backend registry / device interfaces to BLAS backend (#9752)
|
1 year ago |
Andrew Minh Nguyen
|
f1af42fa8c
Update building for Android (#9672)
|
1 year ago |
Georgi Gerganov
|
6279dac039
flake.lock: Update (#9753)
|
1 year ago |
Georgi Gerganov
|
d5ac8cf2f2
ggml : add metal backend registry / device (#9713)
|
1 year ago |
Paul Tsochantaris
|
96b6912103
metal : single allocation of encode_async block (#9747)
|
1 year ago |
Georgi Gerganov
|
d5cb86844f
contrib : simplify + minor edits [no ci]
|
1 year ago |
Georgi Gerganov
|
f4b2dcdf49
readme : fix typo [no ci]
|
1 year ago |
Georgi Gerganov
|
b6d6c5289f
sync : llama.cpp
|
1 year ago |
SRHMorris
|
b0915d5b51
vulkan : retry allocation with fallback flags (whisper/2451)
|
1 year ago |
Georgi Gerganov
|
8c475b97b8
rerank : use [SEP] token instead of [BOS] (#9737)
|
1 year ago |