Sigbjørn Skjæret | b25346221d | llama : return mistral-v7-tekken as default template only (#14390) | 6 months ago
Georgi Gerganov | e8215dbb96 | metal : add special-case mat-vec mul for ne00 == 4 (#14385) | 6 months ago
Georgi Gerganov | 5783ae4359 | metal : batch rows copy in a single threadgroup (#14384) | 6 months ago
Aaron Teo | bf5bcd0b85 | docs: update s390x documentation + add faq (#14389) | 6 months ago
R0CKSTAR | 716301d1b0 | musa: enable fp16 mma (all) and cublas on qy2 (#13842) | 6 months ago
Aaron Teo | 60ef23d6c1 | ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) | 6 months ago
Sigbjørn Skjæret | b193d53069 | ggml : do not output unprintable characters on GGUF load failure (#14381) | 6 months ago
Anton Mitkov | 2bf9d539dd | sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973) | 6 months ago
lhez | 73e53dc834 | opencl: ref count `ggml_backend_opencl_context` and refactor profiling (#14254) | 7 months ago
Georgi Gerganov | 62af464227 | batch : fix check for empty sequences in memory (#14364) | 7 months ago
Mathieu Baudier | c148cf1946 | cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (#14362) | 7 months ago
Nigel Bosch | 1b809cee22 | server : move no API key doc to /health (#14352) | 7 months ago
Sigbjørn Skjæret | abf241045d | main : honor --verbose-prompt on interactive prompts (#14350) | 7 months ago
Bartowski | 901e20bbe5 | jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349) | 7 months ago
uvos | 0142961a2e | CUDA/HIP: optimize mmv paths taken for HIP devices (#14324) | 7 months ago
bandoti | ce82bd0117 | ci: add workflow for relocatable cmake package (#14346) | 7 months ago
Jeff Bolz | bf2a99e3cb | vulkan: update windows SDK in release.yml (#14344) | 7 months ago
Molly Sophia | 72c6bc3f3d | llama : better rwkv chat template and add missing `inputs.use_jinja` setting (#14336) | 7 months ago
Johannes Gäßler | defe2158dd | CUDA: mul_mat_v support for batch sizes > 1 (#14262) | 7 months ago
Georgi Gerganov | 7b50d589a8 | kv-cells : fix tracking of seq_pos (#14339) | 7 months ago
Jeff Bolz | 3a9457df96 | vulkan: update windows SDK in CI (#14334) | 7 months ago
Ed Addario | fa4a9f2a1c | quantize : handle user-defined pruning of whole layers (blocks) (#13037) | 7 months ago
Sigbjørn Skjæret | 238005c2dc | gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) | 7 months ago
Ruikai Peng | 66aba7aca9 | run : avoid double tokenization (#14327) | 7 months ago
Georgi Gerganov | f1f5e82df6 | examples : fix is_first logic for tokenization (#14329) | 7 months ago
uvos | af3373f1ad | HIP: enable vec fattn on RDNA4 (#14323) | 7 months ago
yuiseki | 5d5c066de8 | mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326) | 7 months ago
Sigbjørn Skjæret | 40bfa04c95 | common : use std::string_view now that we target c++17 (#14319) | 7 months ago
Aman Gupta | aa064b2eb7 | CUDA: add mean operation (#14313) | 7 months ago
Sigbjørn Skjæret | aa0ef5c578 | gguf-py : fix Qwen3-Embedding eos token (#14314) | 7 months ago