Gilad S. | fa0465954f | ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` (#17581) | 1 month ago
ddh0 | 5a6241feb0 | common: update env var name (#17588) | 1 month ago
Aman Gupta | c7af376c29 | CUDA: add stream-based concurrency (#16991) | 1 month ago
Mahekk Shaikh | 00425e2ed1 | cuda : add error checking for cudaMemcpyAsync in argsort (#17599) | 1 month ago
Acly | 385c3da5e6 | vulkan : fix FA mask load with bounds check (coopmat2) (#17606) | 1 month ago
Xuan-Son Nguyen | ab49f094d2 | server: move server-context to its own cpp|h (#17595) | 1 month ago
Haiyue Wang | 8c32d9d96d | server: explicitly set the function name in lambda (#17538) | 1 month ago
Igor Smirnov | 0874693b44 | common : fix json schema with '\' in literals (#17307) | 1 month ago
Neo Zhang | 7d2add51d8 | sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566) | 1 month ago
ixgbe | f698a79c63 | ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567) | 1 month ago
Ruben Ortlam | 47a268ea50 | Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900) | 1 month ago
Jeff Bolz | 59d8d4e963 | vulkan: improve topk perf for large k, fix overflow in unit tests (#17582) | 1 month ago
Aleksei Nikiforov | d82b7a7c1d | gguf-py : fix passing non-native endian tensors (editor-gui and new-metadata) (#17553) | 1 month ago
DAN™ | 03914c7ef8 | common : move all common_chat_parse_* to chat-parser.cpp. (#17481) | 1 month ago
o7si | 3ce7a65c2f | server: fix: /metrics endpoint returning JSON-escaped Prometheus format (#17386) | 1 month ago
Diego Devesa | e072b2052e | ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276) | 1 month ago
R0CKSTAR | c6f7a423c8 | [MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551) | 1 month ago
Aman Gupta | 2e7ef98f18 | ggml-cuda: add stricter checking for fusion (#17568) | 1 month ago
Fredrik Hultin | ddf9f94389 | server : add Anthropic Messages API support (#17570) | 1 month ago
Piotr Wilkin (ilintar) | ff55414c42 | model : Qwen3 Next (#16095) | 1 month ago
Johannes Gäßler | 73955f7d2a | CUDA: no FP16 arithmetic for vector FA kernel (#17558) | 1 month ago
Jeff Bolz | 35cf8887e1 | vulkan: Implement GGML_OP_TRI (#17503) | 1 month ago
Radoslav Gerganov | 15d2b46b4d | rpc : cache and reuse compute graphs (#15405) | 1 month ago
yulo | 6bca76ff5e | HIP: enable mul_mat_f for RDNA4 (#17437) | 1 month ago
Piotr Wilkin (ilintar) | cd0e3a7a3b | SOLVE_TRI CUDA kernel for small matrices (#17457) | 1 month ago
Neo Zhang Jianyu | efaaccdd69 | refactor pad_reflect_1d to make the UT case pass (#17204) | 1 month ago
Jeff Bolz | 4abef75f2c | vulkan: Implement SOLVE_TRI (#17486) | 1 month ago
Georgi Gerganov | c386114922 | arch : add description about LLM_TENSOR_INFOS (#17550) | 1 month ago
Georgi Gerganov | 6783b11fb0 | models : fix LFM2 tensors (#17548) | 1 month ago
matt23654 | 909072abcf | cuda : fix UMA detection on discrete GPUs. (#17537) | 1 month ago