o7si | 3ce7a65c2f | server: fix: /metrics endpoint returning JSON-escaped Prometheus format (#17386) | 1 month ago
Diego Devesa | e072b2052e | ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276) | 1 month ago
R0CKSTAR | c6f7a423c8 | [MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551) | 1 month ago
Aman Gupta | 2e7ef98f18 | ggml-cuda: add stricter checking for fusion (#17568) | 1 month ago
Fredrik Hultin | ddf9f94389 | server : add Anthropic Messages API support (#17570) | 1 month ago
Piotr Wilkin (ilintar) | ff55414c42 | model : Qwen3 Next (#16095) | 1 month ago
Johannes Gäßler | 73955f7d2a | CUDA: no FP16 arithmetic for vector FA kernel (#17558) | 1 month ago
Jeff Bolz | 35cf8887e1 | vulkan: Implement GGML_OP_TRI (#17503) | 1 month ago
Radoslav Gerganov | 15d2b46b4d | rpc : cache and reuse compute graphs (#15405) | 1 month ago
yulo | 6bca76ff5e | HIP: enable mul_mat_f for RDNA4 (#17437) | 1 month ago
Piotr Wilkin (ilintar) | cd0e3a7a3b | SOLVE_TRI CUDA kernel for small matrices (#17457) | 1 month ago
Neo Zhang Jianyu | efaaccdd69 | refactor pad_reflect_1d to make the UT case pass (#17204) | 1 month ago
Jeff Bolz | 4abef75f2c | vulkan: Implement SOLVE_TRI (#17486) | 1 month ago
Georgi Gerganov | c386114922 | arch : add description about LLM_TENSOR_INFOS (#17550) | 1 month ago
Georgi Gerganov | 6783b11fb0 | models : fix LFM2 tensors (#17548) | 1 month ago
matt23654 | 909072abcf | cuda : fix UMA detection on discrete GPUs. (#17537) | 1 month ago
Alberto Cabrera Pérez | cd8370b408 | ggml-cpu: aarm64: q4_K repack gemm and gemv implementations (dotprod only) (#17494) | 1 month ago
Eric Curtin | d21a76ac38 | devops: Add build-essential to Ubuntu 26.04 image (#17531) | 1 month ago
Aleksei Nikiforov | 4fcd87cf7c | gguf-py : skip endian-conversion of MXFP4 data (#17523) | 1 month ago
Acly | b78db3bd50 | vulkan : move contiguous checks to device_supports_op (#17490) | 1 month ago
Jeff Bolz | 142df17c9c | vulkan: use a fixed 1KB buffer for the add_rms_fusion opt (#17514) | 1 month ago
Xuan-Son Nguyen | e509411cf1 | server: enable jinja by default, update docs (#17524) | 1 month ago
lhez | 7cba58bbea | opencl: add sqr, sqrt, mean and ssm_conv (#17476) | 1 month ago
Alberto Cabrera Pérez | 5449367b21 | Fix chunks being too small with small matrix sizes (#17526) | 1 month ago
Han Qingzhe | 1d594c295c | clip: (minicpmv) fix resampler kq_scale (#17516) | 1 month ago
Jeff Bolz | eec1e33a9e | vulkan: allow graph_optimize for prompt processing workloads (#17475) | 1 month ago
Jeff Bolz | 879d673759 | vulkan: Implement top-k (#17418) | 1 month ago
xctan | 6ab4e50d9c | ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448) | 1 month ago
Adrien Gallouët | 2336cc4784 | cmake : use EXCLUDE_FROM_ALL to avoid patch-boringssl.cmake (#17520) | 1 month ago
Adrien Gallouët | e6923caaec | ggml : fix ARM feature verification (#17519) | 1 month ago