slaren
|
0d56246f4b
ggml : group all experts in a single ggml_mul_mat_id (#6505)
|
1 year ago |
Georgi Gerganov
|
666867b799
ggml : fix llamafile sgemm wdata offsets (#6710)
|
1 year ago |
Justine Tunney
|
8cc91dc63c
ggml : add llamafile sgemm (#6414)
|
1 year ago |
slaren
|
fbbc030ba9
metal : unify mul_mv_id kernels (#6556)
|
1 year ago |
jiez
|
91c736015b
llama : add gguf_remove_key + remove split meta during quantize (#6591)
|
1 year ago |
Carolinabanana
|
5dc9dd7152
llama : add Command R Plus support (#6491)
|
1 year ago |
slaren
|
08a0c02060
ggml : mul_mat_id use the same tensor for all the experts (#6387)
|
1 year ago |
0cc4m
|
ba0c7c70ab
Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
|
1 year ago |
slaren
|
e5b89a441a
ggml : fix bounds checking of zero size views (#6347)
|
1 year ago |
compilade
|
557410b8f0
llama : greatly reduce output buffer memory usage (#6122)
|
1 year ago |
Kawrakow
|
55c1b2a3bb
IQ1_M: 1.75 bpw quantization (#6302)
|
1 year ago |
slaren
|
280345968d
cuda : rename build flag to LLAMA_CUDA (#6299)
|
1 year ago |
Rick G
|
a32b77c4b2
Fix heap corruption from wmode out-of-bound writes on windows (#6272)
|
1 year ago |
Meng, Hengyu
|
ddf6568510
[SYCL] offload op (#6217)
|
1 year ago |
Jared Van Bortel
|
94d1b3b411
use _wfopen instead of fopen on Windows (#6248)
|
1 year ago |
slaren
|
2bf8d0f7c4
backend : offload large batches to GPU (#6083)
|
1 year ago |
AmirAli Mirian
|
c47cf414ef
ggml : add AVX512F SIMD (#6088)
|
1 year ago |
Ondřej Čertík
|
7ce2c77f88
gguf : add support for I64 and F64 arrays (#6062)
|
1 year ago |
slaren
|
f30ea47a87
llama : add pipeline parallelism support (#6017)
|
1 year ago |
Michael Podvitskiy
|
3202361c5b
ggml, ci : Windows ARM runner and build fixes (#5979)
|
1 year ago |
Georgi Gerganov
|
5b09797321
ggml : remove old quantization functions (#5942)
|
1 year ago |
compilade
|
c2101a2e90
llama : support Mamba Selective State Space Models (#5328)
|
1 year ago |
Jared Van Bortel
|
e04e04f8fa
ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906)
|
1 year ago |
Georgi Gerganov
|
a1c6d96ed8
ggml : fix unknown status (#0)
|
1 year ago |
Michael Podvitskiy
|
9fa2627347
ggml : introduce ggml_status (ggml/750)
|
1 year ago |
leejet
|
7d43c585dc
add some new ops, fix some operators and add batch operations to certain operators. (ggml/747)
|
1 year ago |
slaren
|
2774b0c974
add google magika inference example (ggml/748)
|
1 year ago |
UEXTM.com
|
5f70671856
Introduce backend GUIDs (ggml/743)
|
1 year ago |
Kawrakow
|
7c4263d426
ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (#5760)
|
1 year ago |
Kawrakow
|
0becb22ac0
IQ4_XS: a 4.25 bpw quantization (#5747)
|
1 year ago |