Reese Levine | 03ea04175d | ggml webgpu: minor set rows optimization (#16810) | 2 months ago
Georgi Gerganov | cdabeb2c27 | sync : ggml | 2 months ago
Georgi Gerganov | 852ce5180a | ggml : fix conv2d_dw SVE path (ggml/1380) | 2 months ago
mnehete32 | 9aa63374f2 | CUDA: update ops.md (#17005) | 2 months ago
lhez | 5e90233bdb | opencl: update doc (#17011) | 2 months ago
nullname | a5c07dcd7b | refactor: replace sprintf with snprintf for safer string handling in dump functions (#16913) | 2 months ago
Jeff Bolz | ad51c0a720 | vulkan: remove the need for the dryrun (#16826) | 2 months ago
Georgi Gerganov | 66d8eccd42 | server : do context shift only while generating (#17000) | 2 months ago
Georgi Gerganov | afd353246d | readme : update hot topics (#17002) | 2 months ago
Acly | cc98f8d349 | ggml-cpu : bicubic interpolation (#16891) | 2 months ago
Sigbjørn Skjæret | d945834366 | ci : apply model label to models (#16994) | 2 months ago
Sigbjørn Skjæret | b164259bba | chore : fix models indent after refactor (#16992) | 2 months ago
Noah | 1f5accb8d0 | Fix garbled output with REPACK at high thread counts (#16956) | 2 months ago
Aman Gupta | 2759ccdb4a | CUDA: avoid mul + bias fusion when doing fusion (#16935) | 2 months ago
lhez | c5023daf60 | opencl: support imrope (#16914) | 2 months ago
Aleksander Grygier | e7da30b584 | fix: Viewing multiple PDF attachments (#16974) | 2 months ago
Daniel Bevenius | ed8aa63320 | model-conversion : pass config to from_pretrained (#16963) | 2 months ago
Georgi Gerganov | 48bd26501b | server : add props.model_alias (#16943) | 2 months ago
theo77186 | 622cd010ff | ggml: CUDA: add head size 72 for flash-attn (#16962) | 2 months ago
Xuan-Son Nguyen | 070ff4d535 | mtmd: add --image-min/max-tokens (#16921) | 2 months ago
Xuan-Son Nguyen | bf7b0c9725 | mtmd: pad mask for qwen2.5vl (#16954) | 2 months ago
Jinyang He | fcfce040e8 | ggml : LoongArch fixes (#16958) | 2 months ago
Olivier Chafik | ee3a5a10ad | sync: minja (glm 4.6 & minmax m2 templates) (#16949) | 2 months ago
shani-f | 7e994168b1 | SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× faster)Feature/sycl repeat back opt (#16869) | 2 months ago
Sascha Rogmann | bcfa87622a | feat(webui): improve LaTeX rendering with currency detection (#16508) | 2 months ago
Shagun Bera | a2054e3a8f | test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (#16936) | 2 months ago
Sigbjørn Skjæret | dd52868050 | ci : disable failing riscv cross build (#16952) | 2 months ago
Zhiyong Wang | 6b9a52422b | model: add Janus Pro for image understanding (#16906) | 2 months ago
Georgi Gerganov | 2f966b8ed8 | clip : use FA (#16837) | 2 months ago
Georgi Gerganov | cd5e3b5754 | server : support unified cache across slots (#16736) | 2 months ago