ddh0
|
5b48cd53a8
Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (#8058)
|
1 year ago |
Georgi Gerganov
|
6ff13987ad
common : normalize naming style (#7462)
|
1 year ago |
Fred Douglas
|
1ea2a0036e
quantize : fix --keep-split check (#7374)
|
1 year ago |
Justine Tunney
|
3855416027
ggml : introduce bfloat16 support (#6412)
|
1 year ago |
Pierrick Hymbert
|
0c4d489e29
quantize: add imatrix and dataset metadata in GGUF (#6658)
|
1 year ago |
jiez
|
1966eb2615
quantize : add '--keep-split' to quantize model into shards (#6688)
|
1 year ago |
slaren
|
08a0c02060
ggml : mul_mat_id use the same tensor for all the experts (#6387)
|
1 year ago |
Kawrakow
|
55c1b2a3bb
IQ1_M: 1.75 bpw quantization (#6302)
|
1 year ago |
Kawrakow
|
d25b1c31b0
quantize : be able to override metadata by key (#6321)
|
1 year ago |
Kawrakow
|
1d0331c12a
quantize: options for output and token embedding tensors qtype (#6239)
|
1 year ago |
Kawrakow
|
0becb22ac0
IQ4_XS: a 4.25 bpw quantization (#5747)
|
1 year ago |
Kawrakow
|
a33e6a0d2a
Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (#5721)
|
1 year ago |
Kawrakow
|
4c4cb30736
IQ3_S: a much better alternative to Q3_K (#5676)
|
1 year ago |
Kawrakow
|
a14679cc30
IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590)
|
1 year ago |
Kawrakow
|
bd2d4e393b
1.5 bit quantization (#5453)
|
1 year ago |
bmwl
|
f486f6e1e5
ggml : add numa options (#5377)
|
1 year ago |
Michael Klimenko
|
52bb63c708
refactor : switch to emplace_back to avoid extra object (#5291)
|
1 year ago |
Kawrakow
|
f4d7e54974
SOTA 3-bit quants (#5196)
|
1 year ago |
Vladimir Malyutin
|
7359016c7c
quantize : fix typo (#5211)
|
1 year ago |
Kawrakow
|
66d575c45c
llama : add Q3_K_XS (#5060)
|
2 years ago |
Kawrakow
|
467a882fd2
Add ability to use importance matrix for all k-quants (#4930)
|
2 years ago |
Kawrakow
|
147b17ac94
2-bit quantizations (#4897)
|
2 years ago |
Kawrakow
|
469e75d0a3
llama : restore intended k-quants mixes for MoE models (#4872)
|
2 years ago |
cebtenzzre
|
b12fa0d1c1
build : link against build info instead of compiling against it (#3879)
|
2 years ago |
Georgi Gerganov
|
d69d777c02
ggml : quantization refactoring (#3833)
|
2 years ago |
Cebtenzzre
|
bc39553c90
build : enable more non-default compiler warnings (#3200)
|
2 years ago |
Cebtenzzre
|
8781013ef6
make : restore build-info.h dependency for several targets (#3205)
|
2 years ago |
Cebtenzzre
|
e6616cf0db
examples : add compiler version and target to build info (#2998)
|
2 years ago |
Cebtenzzre
|
3aefaab9e5
check C++ code with -Wmissing-declarations (#3184)
|
2 years ago |
Cebtenzzre
|
00d62adb79
fix some warnings from gcc and clang-tidy (#3038)
|
2 years ago |