R0CKSTAR
|
fac63a3d78
musa: refine compute capability (#12493)
|
10 months ago |
Jeff Bolz
|
eddfb43850
vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505)
|
10 months ago |
stduhpf
|
4375415b4a
Vulkan: RTE rounding for cpy to quant (#12480)
|
10 months ago |
Eve
|
30c42ef5cb
vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (#12472)
|
10 months ago |
Georgi Gerganov
|
af04481e6b
model : do not repack if a GPU device is present (#12498)
|
10 months ago |
Sigbjørn Skjæret
|
960e726077
chore : cleanup llama_model_loader::TENSOR_ usage (#12492)
|
10 months ago |
marcoStocchi
|
ea1518e839
llama-tts : avoid crashes related to bad model file paths (#12482)
|
10 months ago |
蕭澧邦
|
1aa87ee53d
[SYCL] Fix build on Windows when ccache enabled (#9954) (#9976)
|
10 months ago |
Svetlozar Georgiev
|
9ffcc9e374
sycl: cleanup oneDNN related code (#12097)
|
10 months ago |
Woof Dog
|
e04643063b
webui : Prevent rerendering on textarea input (#12299)
|
10 months ago |
Sigbjørn Skjæret
|
dbb3a4739e
llama : make Qwen2MoE QKV bias optional (#12477)
|
10 months ago |
Srihari-mcw
|
3d82dbcbce
ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (#12332)
|
10 months ago |
Bartowski
|
732b5fbf5e
convert : avoid calls to tokenizer.added_tokens_decoder (#12473)
|
10 months ago |
fairydreaming
|
568013d0cd
context : clear sets containing encoder output sequence ids before storing new values (#12470)
|
10 months ago |
Gaurav Garg
|
517b5ddbf0
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (#12183)
|
10 months ago |
Jeff Bolz
|
a9b59288e2
vulkan: optimize iq1 coopmat2 dequant functions (#12427)
|
10 months ago |
Guus Waals
|
0fd8487b14
Fix visionOS build and add CI (#12415)
|
10 months ago |
Sigbjørn Skjæret
|
108e53c2f1
llama : add support for GPT2, Bloom and CodeShell tied word embeddings (#12456)
|
10 months ago |
Sigbjørn Skjæret
|
a686171ea7
convert : Support chat_template.json (#12460)
|
10 months ago |
Jeff Bolz
|
c446b2edd2
vulkan: Submit once enough matmul work has been recorded (#12406)
|
10 months ago |
lhez
|
d84635b1b0
opencl: improve profiling (#12442)
|
10 months ago |
Georgi Gerganov
|
75422e8bc4
graph : normalize Q, K, V shapes + sync cross attention (#12449)
|
10 months ago |
R0CKSTAR
|
bb115d2bf7
musa: override warp_size of musa device to 32 (#12445)
|
10 months ago |
Xuan-Son Nguyen
|
29fff308c7
llama : support converting Mistral Small text-only (#12450)
|
10 months ago |
Georgi Gerganov
|
c6af2161b2
speculative : fix seg fault in certain cases (#12454)
|
10 months ago |
Xuan-Son Nguyen
|
99aa304fb9
llama : add support for EXAONE tied word embeddings (#12451)
|
10 months ago |
Georgi Gerganov
|
8551c44d84
context : always use non-causal attention for encoder graphs (#12447)
|
10 months ago |
Łukasz Ślusarczyk
|
35cae5ba05
SYCL: using graphs is configurable by environment variable and compile option (#12371)
|
10 months ago |
Georgi Gerganov
|
810e0af3f5
server : fix warmup draft cache type (#12446)
|
10 months ago |
Prajwal B Mehendarkar
|
eba92d64c3
cmake : fix PowerPC build (#12241)
|
10 months ago |