
Add quantize script for batch quantization (#92)

* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Pavol Rusnak 2 years ago
parent
commit
d1f224712d
2 changed files with 18 additions and 31 deletions
  1. README.md (+3 / -31)
  2. quantize.sh (+15 / -0)

README.md (+3 / -31)

@@ -145,44 +145,16 @@ python3 -m pip install torch numpy sentencepiece
 python3 convert-pth-to-ggml.py models/7B/ 1
 
 # quantize the model to 4-bits
-./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
+./quantize.sh 7B
 
 # run the inference
 ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
 ```
 
-For the bigger models, there are a few extra quantization steps. For example, for LLaMA-13B, converting to FP16 format
-will create 2 ggml files, instead of one:
-
-```bash
-ggml-model-f16.bin
-ggml-model-f16.bin.1
-```
-
-You need to quantize each of them separately like this:
-
-```bash
-./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin 2
-./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
-```
-
-Everything else is the same. Simply run:
-
-```bash
-./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
-```
-
-The number of files generated for each model is as follows:
-
-```
-7B  -> 1 file
-13B -> 2 files
-30B -> 4 files
-65B -> 8 files
-```
-
 When running the larger models, make sure you have enough disk space to store all the intermediate files.
 
+TODO: add model disk/mem requirements
+
 ### Interactive mode
 
 If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
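For a multi-file model such as 13B, the new script simply repeats the per-file step that the removed README section spelled out. Per that removed text, `./quantize.sh 13B` is equivalent to running:

```bash
# Manual equivalent for LLaMA-13B, whose f16 conversion produces two ggml files:
./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
```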

quantize.sh (+15 / -0)

@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+
+if ! [[ "$1" =~ ^[0-9]{1,2}B$ ]]; then
+    echo
+    echo "Usage: quantize.sh 7B|13B|30B|65B [--remove-f16]"
+    echo
+    exit 1
+fi
+
+for i in `ls models/$1/ggml-model-f16.bin*`; do
+    ./quantize "$i" "${i/f16/q4_0}" 2
+    if [[ "$2" == "--remove-f16" ]]; then
+        rm "$i"
+    fi
+done
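A note on the loop as committed: iterating over `` `ls ...` `` output relies on shell word-splitting and breaks if a path ever contains spaces. A minimal sketch of a glob-based alternative (my suggestion, not part of this commit):

```bash
#!/usr/bin/env bash
# Sketch only: same behavior as the committed loop, but using a bash glob
# instead of parsing `ls` output. Hypothetical, not part of this commit.
shopt -s nullglob                        # loop runs zero times if no f16 files match
for i in models/"$1"/ggml-model-f16.bin*; do
    ./quantize "$i" "${i/f16/q4_0}" 2    # 2 selects the q4_0 quantization type
    [[ "$2" == "--remove-f16" ]] && rm "$i"
done
```

Usage is unchanged either way, e.g. `./quantize.sh 13B --remove-f16` quantizes both 13B parts and deletes the f16 originals.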