Author | Commit | Message | Date
------ | ------ | ------- | ----
slaren | 63351143b2 | quantize : improve type name parsing (#9570) | 1 year ago
compilade | 9bc6db28d0 | ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) | 1 year ago
João Dinis Ferreira | 8f824ffe8e | quantize : fix typo in usage help of `quantize.cpp` (#9145) | 1 year ago
Daniel Bevenius | 725e3d9437 | quantize : update usage comment in quantize.cpp (#8889) | 1 year ago
Georgi Gerganov | 0efec57787 | llama : valign + remove unused ftype (#8502) | 1 year ago
Dibakar Gope | 0f1a39f343 | ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) | 1 year ago
ddh0 | 5b48cd53a8 | Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (#8058) | 1 year ago
Georgi Gerganov | 6ff13987ad | common : normalize naming style (#7462) | 1 year ago
Fred Douglas | 1ea2a0036e | quantize : fix --keep-split check (#7374) | 1 year ago
Justine Tunney | 3855416027 | ggml : introduce bfloat16 support (#6412) | 1 year ago
Pierrick Hymbert | 0c4d489e29 | quantize: add imatrix and dataset metadata in GGUF (#6658) | 1 year ago
jiez | 1966eb2615 | quantize : add '--keep-split' to quantize model into shards (#6688) | 1 year ago
slaren | 08a0c02060 | ggml : mul_mat_id use the same tensor for all the experts (#6387) | 1 year ago
Kawrakow | 55c1b2a3bb | IQ1_M: 1.75 bpw quantization (#6302) | 1 year ago
Kawrakow | d25b1c31b0 | quantize : be able to override metadata by key (#6321) | 1 year ago
Kawrakow | 1d0331c12a | quantize: options for output and token embedding tensors qtype (#6239) | 1 year ago
Kawrakow | 0becb22ac0 | IQ4_XS: a 4.25 bpw quantization (#5747) | 1 year ago
Kawrakow | a33e6a0d2a | Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (#5721) | 1 year ago
Kawrakow | 4c4cb30736 | IQ3_S: a much better alternative to Q3_K (#5676) | 1 year ago
Kawrakow | a14679cc30 | IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590) | 1 year ago
Kawrakow | bd2d4e393b | 1.5 bit quantization (#5453) | 1 year ago
bmwl | f486f6e1e5 | ggml : add numa options (#5377) | 1 year ago
Michael Klimenko | 52bb63c708 | refactor : switch to emplace_back to avoid extra object (#5291) | 1 year ago
Kawrakow | f4d7e54974 | SOTA 3-bit quants (#5196) | 1 year ago
Vladimir Malyutin | 7359016c7c | quantize : fix typo (#5211) | 1 year ago
Kawrakow | 66d575c45c | llama : add Q3_K_XS (#5060) | 2 years ago
Kawrakow | 467a882fd2 | Add ability to use importance matrix for all k-quants (#4930) | 2 years ago
Kawrakow | 147b17ac94 | 2-bit quantizations (#4897) | 2 years ago
Kawrakow | 469e75d0a3 | llama : restore intended k-quants mixes for MoE models (#4872) | 2 years ago
cebtenzzre | b12fa0d1c1 | build : link against build info instead of compiling against it (#3879) | 2 years ago