Georgi Gerganov | e128a1bf5b | tests : fix test-quantize-fns to init the CPU backend (#12306) | 10 months ago
marcoStocchi | 6ef79a67ca | common : refactor '-o' option (#12278) | 10 months ago
Olivier Chafik | 4e39a3c332 | `server`: extract <think> tags from qwq outputs (#12297) | 10 months ago
Olivier Chafik | be421fc429 | `tool-call`: ensure there's always a non-empty tool call id (#12292) | 10 months ago
Olivier Chafik | 87c2630546 | allow missing content in message if tool_calls provided (#12293) | 10 months ago
Olivier Chafik | 2b3a25c212 | `sampler`: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291) | 10 months ago
tc-mb | 8352cdc87b | llava : fix bug in minicpm-v code (#11513) | 10 months ago
Georgi Gerganov | 1e2f78a004 | server : add speculative decoding presets for FIM (#12287) | 10 months ago
Georgi Gerganov | 0fd7ca7a21 | authors : update (#12271) | 10 months ago
Jason C.H | 6fefc05a7a | ggml-backend : make path_str compatible with C++20 (#12269) | 10 months ago
Georgi Gerganov | 7ab364390f | server : infill gen ends on new line (#12254) | 10 months ago
Daniel Bevenius | 7c7f3b7f43 | ggml : skip intermediate .air file when compiling .metallib (#12247) | 10 months ago
Georgi Gerganov | 102ac1891d | sync : ggml | 10 months ago
vmobilis | d6ae2fa061 | ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) | 10 months ago
Rémy O | 68d0027f3d | ggml-cpu: faster AVX2 variant for IQ1_M (#12216) | 10 months ago
Georgi Gerganov | ea002810a2 | ci : fix save-load test invocations (#12245) | 10 months ago
Sigbjørn Skjæret | 8fad3c7a7c | server : Log original chat template parsing error (#12233) | 10 months ago
Olivier Chafik | 7cf64f6bee | sync: minja - support QwQ-32B (#12235) | 10 months ago
BB-fat | 5e2d57b2b2 | metal : simplify kernel arguments using a struct (#3229) (#12194) | 10 months ago
David Huang | f1648e91cf | HIP: fix rocWMMA build flags under Windows (#12230) | 10 months ago
Daniel Bevenius | d6c95b0740 | metal : fix default.metallib build (#12224) | 10 months ago
lhez | d76a86d967 | opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (#12217) | 10 months ago
xiaofei | 776f9e59cc | cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (#12094) | 10 months ago
Lucas Moura Belo | 3d652bfddf | readme : update bindings (#12229) | 10 months ago
Johannes Gäßler | 5220a16d18 | CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (#12222) | 10 months ago
David Huang | 3ffbbd5ce1 | HIP: rocWMMA documentation and enabling in workflow builds (#12179) | 10 months ago
Olivier Chafik | 42994048a3 | update function-calling.md w/ template override for functionary-small-v3.2 (#12214) | 10 months ago
Aaron Teo | e9b2f84f14 | llava: add big-endian conversion for image encoder (#12218) | 10 months ago
uvos | e721c05c93 | HIP/CUDA: set the paramerter value in maintain_cuda_graph instead of replaceing it. (#12209) | 10 months ago
Han Yin | 57b6abf85a | android : fix KV cache log message condition (#12212) | 10 months ago