Neuman Vong
|
4b7b38bef5
vulkan: Set limit for task concurrency (#5427)
|
1 year ago |
Daniel Bevenius
|
e00d2a62dd
llava : add requirements.txt and update README.md (#5428)
|
1 year ago |
Riley Stewart
|
7c777fcd5d
server : fix prompt caching for repeated prompts (#5420)
|
1 year ago |
Paul Tsochantaris
|
e5ca3937c6
llama : do not cap thread count when MoE on CPU (#5419)
|
1 year ago |
Marko Tasic
|
e4124c2477
readme : add JavaScript/Wasm repo (#5415)
|
1 year ago |
Michael Podvitskiy
|
b2f87cb64d
ggml : fix `error C2078: too many initializers` for MSVC ARM64 (#5404)
|
1 year ago |
0cc4m
|
44fbe34360
Fix Vulkan crash on APUs with very little device memory (#5424)
|
1 year ago |
Johannes Gäßler
|
8e6a9d2de0
CUDA: more warps for mmvq on NVIDIA (#5394)
|
1 year ago |
slaren
|
41f308f58e
llama : do not print "offloading layers" message in CPU-only builds (#5416)
|
1 year ago |
Abhilash Majumder
|
6e99f2a04f
Fix f16_sycl cpy call from Arc (#5411)
|
1 year ago |
Daniel Bevenius
|
ff4ff05c5f
llava : add missing .py, and fix paths in README.md (#5414)
|
1 year ago |
Johannes Gäßler
|
b7b74cef36
fix trailing whitespace (#5407)
|
1 year ago |
runfuture
|
4aa43fab56
llama : fix MiniCPM (#5392)
|
1 year ago |
Daniel Bevenius
|
a6e514a85f
llava: fix typo/formatting in README.md (#5405)
|
1 year ago |
Johannes Gäßler
|
26d4efd11e
sampling: fix top_k <= 0 (#5388)
|
1 year ago |
Georgi Gerganov
|
8504d2d0da
tests : .gitignore obj files
|
1 year ago |
Michael Podvitskiy
|
c4fbb6717c
CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393)
|
1 year ago |
Ebey Abraham
|
8c933b70c2
fix typo in readme (#5399)
|
1 year ago |
Kamil Tomšík
|
b906596bb7
Add Ava in the list of llama.cpp UIs (#4362)
|
1 year ago |
Johannes Gäßler
|
aa7ab99be2
CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386)
|
1 year ago |
Neo Zhang Jianyu
|
10afa6f1d1
[SYCL] update install make by w64devkit (#5297)
|
1 year ago |
Xiao-Yong Jin
|
0ef46da632
llava-cli : always tokenize special tokens (#5382)
|
1 year ago |
0cc4m
|
ee1628bdfe
Basic Vulkan Multi-GPU implementation (#5321)
|
1 year ago |
Eve
|
ed0bf32290
readme : modernize (#5379)
|
1 year ago |
Ben Williams
|
9a697d842b
readme : update ui list (#5354)
|
1 year ago |
runfuture
|
316c7faf77
llama : add MiniCPM support (#5346)
|
1 year ago |
Justin Parker
|
f3e2b4fa3f
server : update `/props` with "total_slots" value (#5373)
|
1 year ago |
Sang-Kil Park
|
f68664ac24
convert : fix TypeError on GPT-2 vocab.json (#5288)
|
1 year ago |
Alexey Parfenov
|
213d1439fa
server : remove model.json endpoint (#5371)
|
1 year ago |
Johannes Gäßler
|
17c97fb062
CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370)
|
1 year ago |