Xie Yanbo | 3246fe84d7 | Fix minicpm example directory (#9111) | 1 year ago
compilade | 78eb487bb0 | llama : fix qs.n_attention_wv for DeepSeek-V2 (#9156) | 1 year ago
Xuan Son Nguyen | a77feb5d71 | server : add some missing env variables (#9116) | 1 year ago
CausalLM | 2e59d61c1b | llama : fix ChatGLM4 wrong shape (#9194) | 1 year ago
Carsten Kragelund Jørgensen | 75e1dbbaab | llama : fix llama3.1 rope_freqs not respecting custom head_dim (#9141) | 1 year ago
arch-btw | ad76569f8e | common : Update stb_image.h to latest version (#9161) | 1 year ago
slaren | 7d787ed96c | ggml : do not crash when quantizing q4_x_x with an imatrix (#9192) | 1 year ago
Georgi Gerganov | 06658ad7c3 | metal : separate scale and mask from QKT in FA kernel (#9189) | 1 year ago
Georgi Gerganov | fc18425b6a | ggml : add SSM Metal kernels (#8546) | 1 year ago
Georgi Gerganov | 879275ac98 | tests : fix compile warnings for unreachable code (#9185) | 1 year ago
Georgi Gerganov | 7a3df798fc | ci : add VULKAN support to ggml-ci (#9055) | 1 year ago
Georgi Gerganov | e5edb210cd | server : update deps (#9183) | 1 year ago
slaren | 0c41e03ceb | metal : gemma2 flash attention support (#9159) | 1 year ago
slaren | f12ceaca0c | ggml-ci : try to improve build time (#9160) | 1 year ago
Justine Tunney | 436787f170 | llama : fix time complexity of string replacement (#9163) | 1 year ago
Herman Semenov | 93bc3839f9 | common: fixed not working find argument --n-gpu-layers-draft (#9175) | 1 year ago
Johannes Gäßler | f91fc5639b | CUDA: fix Gemma 2 numerical issues for FA (#9166) | 1 year ago
Johannes Gäßler | e11bd856d5 | CPU/CUDA: Gemma 2 FlashAttention support (#8542) | 1 year ago
João Dinis Ferreira | 8f824ffe8e | quantize : fix typo in usage help of `quantize.cpp` (#9145) | 1 year ago
Xuan Son Nguyen | 3ba780e2a8 | lora : fix llama conversion script with ROPE_FREQS (#9117) | 1 year ago
piDack | a07c32ea54 | llama : use F32 precision in GLM4 attention and no FA (#9130) | 1 year ago
Akarshan Biswas | 11b84eb457 | [SYCL] Add a space to supress a cmake warning (#9133) | 1 year ago
luoyu-intel | 1731d4238f | [SYCL] Add oneDNN primitive support (#9091) | 1 year ago
compilade | a1631e53f6 | llama : simplify Mamba with advanced batch splits (#8526) | 1 year ago
Xuan Son Nguyen | fc54ef0d1c | server : support reading arguments from environment variables (#9105) | 1 year ago
Younes Belkada | b40eb84895 | llama : support for `falcon-mamba` architecture (#9074) | 1 year ago
fairydreaming | f63f603c87 | llava : zero-initialize clip_ctx structure fields with aggregate initialization (#8908) | 1 year ago
Daniel Bevenius | 8455340b87 | llama : std::move llm_bigram_bpe from work_queue (#9062) | 1 year ago
Changyeon Kim | 2f3c1466ff | llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984) | 1 year ago
Meng, Hengyu | 50addec9a5 | [SYCL] fallback mmvq (#9088) | 1 year ago