Aman Gupta | 6d758839ff | Add LLaDA-7b-MoE diffusion model (#16003) | 4 months ago
Jake Karnes | 3d4053f77f | CUDA: fix im2col_3d to respect non-contiguous inputs (views) (#15956) | 4 months ago
Diego Devesa | dc381aa9a6 | docker : enable rocWMMA in ROCm images, add gfx1151 (#15997) | 4 months ago
Diego Devesa | 10d197409b | releases : switch to rocWMMA develop branch, add gfx1151 (#15992) | 4 months ago
yael-works | b907255f4b | SYCL: Add COUNT_EQUAL operator support (#15991) | 4 months ago
Nikolay Popov | 28c39da7c6 | llama-run: Fix model download on Windows (#15988) | 4 months ago
Aman Gupta | 106220562a | CUDA: some micro-optimizations in mmf.cuh for mul_mat_id (#15926) | 4 months ago
ddh0 | a68f31edd7 | fix KLD percentile output (#15999) | 4 months ago
Sigbjørn Skjæret | b8e09f08b9 | model : add grok-2 support (#15539) | 4 months ago
Sigbjørn Skjæret | 6c019cb04e | server : only attempt to enable thinking if using jinja (#15967) | 4 months ago
Georgi Gerganov | 9dcd200d57 | metal : remove memory pools (#15966) | 4 months ago
Adam | 0fa154e350 | rocm.Dockerfile: added gfx1200,gfx1201 architectures to support AMD Radeon RX 9000 series (#15994) | 4 months ago
Ruben Ortlam | 261e6a20ff | Vulkan: Clean up mul_mm shader (#15987) | 4 months ago
lcy | a0e13dcbe5 | build: fix the build failures of Windows HIP release job (#15984) | 4 months ago
Georgi Gerganov | a14bd35014 | metal : fix kernel requirements (#15983) | 4 months ago
Radoslav Gerganov | 918b26f197 | rpc : fix regression when --device is used (#15981) | 4 months ago
Diego Devesa | 9ecb884346 | releases : update ROCM, add gfx1200, gfx1201, gfx1151 (#15972) | 4 months ago
Radoslav Gerganov | d1c6f11f47 | doc : update documentation for --tensor-split (#15980) | 4 months ago
Aaron Teo | 6380d6a3e7 | ggml-zdnn: rm user mapped buffers (#15965) | 4 months ago
Jeff Bolz | aa0c461efe | vulkan: fix failing dequant shaders (#15862) | 4 months ago
Jeff Bolz | b9c9c9f789 | vulkan: initialize vulkan-hpp to allow using extension function pointers (#15705) | 4 months ago
Diego Devesa | 50f4281a6f | llama : allow using iGPUs with --device (#15951) | 4 months ago
Georgi Gerganov | 55758b00ca | metal : refactor kernel loading (#15964) | 4 months ago
Georgi Gerganov | f161463a54 | metal : allow ops to run concurrently (#15929) | 4 months ago
Georgi Gerganov | 84d7b2fca1 | metal : fix memory leaks (#15962) | 4 months ago
Aaron Teo | 40be51152d | ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free (#15839) | 4 months ago
Eric Curtin | 4bf5549269 | Add docker protocol support for llama-server model loading (#15790) | 4 months ago
Haiyue Wang | f4e664f838 | context : remove redundant explicit casting to the same type (#15948) | 4 months ago
Georgi Gerganov | f088b6a84f | server : adjust prompt similarity thold + add logs (#15913) | 4 months ago
Ruben Ortlam | 304ac5693d | Vulkan iGPU device selection overhaul and PCI ID API support (#15947) | 4 months ago