uvos | 396856b400 | CUDA/HIP: add support for selectable warp size to mmv (#11519) | 11 months ago
uvos | 4d0598e144 | HIP: add GGML_CUDA_CC_IS_* for amd families as increasing cc architectures for amd gpus are not supersets of each other (#11601) | 11 months ago
Olivier Chafik | 90f9b88afb | nit: more informative crash when grammar sampler fails (#11593) | 11 months ago
Johannes Gäßler | 864a0b67a6 | CUDA: use mma PTX instructions for FlashAttention (#11583) | 11 months ago
Eric Curtin | 84ec8a58f7 | Name colors (#11573) | 11 months ago
Olivier Chafik | bfcce4d693 | `tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585) | 11 months ago
Olivier Chafik | 69804487e0 | Fix exotic ci env that lacks ostringstream::str (#11581) | 11 months ago
Michał Moskal | ff227703d6 | sampling : support for llguidance grammars (#10224) | 11 months ago
piDack | 0cec062a63 | llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) | 11 months ago
Olivier Chafik | 53debe6f3c | ci: use sccache on windows HIP jobs (#11553) | 11 months ago
Olivier Chafik | cfd74c86db | `sync`: minja (https://github.com/google/minja/commit/418a2364b56dc9be4ed9a1a2b0fb16fb53a7a22e) (#11574) | 11 months ago
Eric Curtin | ecef206ccb | Implement s3:// protocol (#11511) | 11 months ago
Olivier Chafik | 5bbc7362cb | ci: simplify cmake build commands (#11548) | 11 months ago
Olivier Chafik | aa6fb13213 | `ci`: use sccache on windows instead of ccache (#11545) | 11 months ago
Olivier Chafik | a83f528688 | `tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539) | 11 months ago
Olivier Chafik | b1bcd309fc | fix stop regression (#11543) | 11 months ago
Olivier Chafik | 5783575c9d | Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) | 11 months ago
Olivier Chafik | 4a2b196d03 | server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531) | 11 months ago
Steve Grubb | 1bd3047a93 | common: Add missing va_end (#11529) | 11 months ago
Daniel Bevenius | a2df2787b3 | server : update help metrics processing/deferred (#11512) | 11 months ago
Olivier Chafik | 553f1e46e9 | `ci`: ccache for all github workflows (#11516) | 11 months ago
Olivier Chafik | 8b576b6c55 | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 11 months ago
uvos | 27d135c970 | HIP: require at least HIP 5.5 | 11 months ago
uvos | 6af1ca48cb | HIP: Prepare reduction operators for wave 64 | 11 months ago
uvos | c300e68ef4 | CUDA/HIP: add warp_size to cuda_device_info | 11 months ago
Olivier Chafik | 3d804dec76 | sync: minja (#11499) | 11 months ago
mgroeber9110 | ffd0821c57 | vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) | 11 months ago
Daniel Bevenius | 4314e56c4f | server : use lambda instead of std::bind (#11507) | 11 months ago
Isaac McFadyen | 496e5bf46b | server : (docs) added response format for /apply-template [no ci] (#11503) | 11 months ago
Guspan Tanadi | 7919256c57 | readme : reference examples relative links (#11505) | 11 months ago