Xuan-Son Nguyen | ee01d71e58 | server : (webui) fix a very small misalignment (#13387) | 8 months ago
Xuan-Son Nguyen | 8c83449cb7 | server : (webui) revamp the input area, plus many small UI improvements (#13365) | 8 months ago
Sigbjørn Skjæret | 1a844be132 | convert : support rope_scaling type and rope_type (#13349) | 8 months ago
welix | 0ccc121354 | mtmd : fix the calculation of n_tokens for smolvlm (#13381) | 8 months ago
Georgi Gerganov | 6562e5a4d6 | context : allow cache-less context for embeddings (#13108) | 8 months ago
Georgi Gerganov | 51fb96b1ff | context : remove logits_all flag (#13284) | 8 months ago
Diego Devesa | 70a6991edf | ci : move release workflow to a separate file (#13362) | 8 months ago
Diego Devesa | f061021206 | llama : print size and type of overridden tensors (#13364) | 8 months ago
Alberto Cabrera Pérez | 8733e0cf6e | sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343) | 8 months ago
Diego Devesa | 814f795e06 | docker : disable arm64 and intel images (#13356) | 8 months ago
Georgi Gerganov | d879433824 | sync : ggml | 8 months ago
Daniel Bevenius | 13b0a04597 | whisper: remove MSVC warnings pragmas (whisper/3090) | 8 months ago
Jared Tweed | bba9d945c1 | cmake : removed stdc++fs (whisper/3097) | 8 months ago
Sigbjørn Skjæret | bc4e1128f7 | llama : deci : support ffn-free with attention (#13296) | 8 months ago
Ycros | 39e73ae0d6 | common : Add a warning when we can't match samplers from a string or char. (#13330) | 8 months ago
R0CKSTAR | 1f73301b63 | cuda : remove nrows_x in mul_mat_q_process_tile (#13325) | 8 months ago
Georgi Gerganov | 4773d7a02f | examples : remove infill (#13283) | 8 months ago
piDack | 6c7fd67b64 | llama : support tie embedding for chatglm models (#13328) | 8 months ago
Johannes Gäßler | 141a908a59 | CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135) | 8 months ago
Xuan-Son Nguyen | 32916a4907 | clip : refactor graph builder (#13321) | 8 months ago
DocShotgun | ffc727203a | sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345) | 8 months ago
oobabooga | 91a86a6f35 | sampling : don't consider -infinity values in top_n_sigma (#13344) | 8 months ago
Diego Devesa | f4ed10b69c | cmake : remove arm64 msvc presets (#13342) | 8 months ago
Akarshan Biswas | 1e333d5bba | SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254) | 8 months ago
Xuan-Son Nguyen | 2f54e348ad | llama : fix build_ffn without gate (#13336) | 8 months ago
Johannes Gäßler | 2356fb1d53 | CUDA: fix bad asserts for partial offload (#13337) | 8 months ago
Sigbjørn Skjæret | 764b85627b | convert : qwen2/3moe : set yarn metadata if present (#13331) | 8 months ago
Johannes Gäßler | 15a28ec8c7 | CUDA: fix --split-mode row for MMQ (#13323) | 8 months ago
compilade | a7366faa5b | gguf-py : avoid requiring pyside6 for other scripts (#13036) | 8 months ago
Johannes Gäßler | 9070365020 | CUDA: fix logic for clearing padding with -ngl 0 (#13320) | 8 months ago