matteo | 8cf6b42d46 server : send partial stop string when <EOG> is reached (#15007) | 2 months ago
Matthew Michel | 9de9672adb sycl: use async memory allocation to fix crashes during graph recording (#16644) | 2 months ago
Max Krasnyansky | 63d2fc46e1 Add experimental ggml-hexagon backend for the Hexagon NPU (#16547) | 2 months ago
Diego Devesa | a2e0088d92 Revert "ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_v…" (#16723) | 2 months ago
Pascal | 9b9201f65a webui: introduce OpenAI-compatible model selector in JSON payload (#16562) | 2 months ago
sirus20x6 | 19a5a3edfd ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_vec_set_f32 for faster fills (#16522) | 2 months ago
Acly | d8eaa26e4d tests : fix test-thread-safety when compiling with multiple backends (#16699) | 2 months ago
Aman Gupta | 9285325ce0 CUDA: fix bug in topk-moe softmax (#16711) | 2 months ago
Aman Gupta | 03792ad936 CUDA: topk-moe: add optional parameter for gpt-oss (#16649) | 2 months ago
Johannes Gäßler | 51d1a8c997 CUDA: better error for FA kernel with 0 occupancy (#16643) | 2 months ago
Aman Gupta | 4926419c4d ggml: add ggml_can_fuse_subgraph (#16662) | 2 months ago
lhez | 6ea37f5739 opencl: fix warnings and clean up profiling (#16688) | 2 months ago
Jeff Bolz | fb349848f3 vulkan: Handle FA with all -inf mask values (#16447) | 2 months ago
YehuditE | 6de8ed7519 sycl : add PAD_REFLECT_D1 operator support (#16145) | 2 months ago
Sigbjørn Skjæret | 84bf3c6778 model : add BailingMoeV2 support (#16063) | 2 months ago
Aleksander Grygier | c9c1972e2c Handle legacy 'context' attachments (#16687) | 2 months ago
Diego Devesa | b617cfd289 ggml-alloc : fix leak when reusing a tensor with a larger size (#16679) | 2 months ago
Aleksander Grygier | 79068501fa Prevent premature submission on IME input (#16673) | 2 months ago
Aleksander Grygier | 0e4a0cf2fa Import/Export UX improvements (#16619) | 2 months ago
Aleksander Grygier | 13f2cfad41 Enable per-conversation loading states to allow having parallel conversations (#16327) | 2 months ago
takuya kodama | 06332e2867 llama-batch: fix build fails with `-Werror=missing-braces` (#16614) | 2 months ago
Ron Evans | 72d53e6c4d readme: update bindings (#16651) | 2 months ago
safranowith | 2330de7b84 SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16613) | 2 months ago
takuya kodama | 7062dd8460 llama-context: only warn on pooling_type when user specified (#16674) | 2 months ago
Giuseppe Scrivano | 0398752dd4 model : add Granite Hybrid types (#16635) | 2 months ago
Aaron Teo | 4f73d0a951 ci : fix binaries release failure for s390x (binaries may not work yet) (#16664) | 2 months ago
Sigbjørn Skjæret | cec5edbcae ci : avoid manual updates of docs/ops.md (#16663) | 3 months ago
Aaron Teo | fcb235b466 ci: include s390x release binaries (#16648) | 3 months ago
Aman Gupta | 55754bebd5 CODEOWNERS: update for ggml-cuda/mmf (#16660) | 3 months ago
Johannes Gäßler | ee09828cb0 HIP: fix GPU_TARGETS (#16642) | 3 months ago