# Quantizing the CLIP Visual Projector

This tool quantizes the CLIP visual projector model. Quantization reduces the precision of the model's weights, which can significantly decrease model size and improve inference speed, often with minimal impact on output quality.

## Usage

To quantize a CLIP visual projector model, use the following command:

```sh
./bin/llama-llava-clip-quantize-cli /path/to/ggml-model-f32.gguf /path/to/ggml-model-quantized.gguf <type>
```

After quantization, the visual projector can be used freely with the existing llava-based CLI tools (LLaVA, Qwen2VL, etc.).
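
For instance, with the `llama-mtmd-cli` tool (the paths below are placeholders; substitute your own text model and the projector produced above):

```sh
# Hypothetical smoke test: load the quantized projector alongside the text model
# and ask the model to describe the test image shipped in this directory.
./bin/llama-mtmd-cli \
    -m /path/to/text-model.gguf \
    --mmproj /path/to/ggml-model-quantized.gguf \
    --image test-1.jpeg \
    -p "Describe this image."
```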

## Arguments

- `/path/to/ggml-model-f32.gguf`: the path to the input model file in FP32 or FP16 format.
- `/path/to/ggml-model-quantized.gguf`: the path where the quantized model will be saved.
- `<type>`: the quantization type to apply, given as an integer corresponding to one of the quantization types defined in `enum ggml_type` (see the snippet after this list for one way to look these values up).
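
If you want to double-check which integers map to which types, they come from `enum ggml_type` in the ggml headers; one way to look them up (the header path is an assumption and may differ between checkouts):

```sh
# List the GGML_TYPE_* quantization enumerators from the ggml header;
# the numeric values are written explicitly in the enum definition.
grep -n "GGML_TYPE_Q" ggml/include/ggml.h | head
```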

## Quantization Types

The following quantization types are supported, based on the `enum ggml_type` definition:

- `2` - `q4_0`: 4-bit quantization; each block of weights shares a single scale.
- `3` - `q4_1`: 4-bit quantization; each block stores a scale plus a minimum (offset).
- `6` - `q5_0`: 5-bit quantization; each block shares a single scale.
- `7` - `q5_1`: 5-bit quantization; each block stores a scale plus a minimum (offset).
- `8` - `q8_0`: 8-bit quantization; each block shares a single scale.
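
As a rough sketch of what these labels mean (this assumes the common ggml block layout of 32 weights per block; verify exact byte counts against the ggml sources): the `*_0` types store one scale `d` per block of quantized integers `q`, while the `*_1` types additionally store a per-block minimum `m`:

$$
x \approx d \cdot q \quad (\texttt{q4\_0},\ \texttt{q5\_0},\ \texttt{q8\_0})
\qquad
x \approx d \cdot q + m \quad (\texttt{q4\_1},\ \texttt{q5\_1})
$$

Under that layout, a `q4_0` block spends 32 × 4 bits on integers plus 16 bits on an FP16 scale, i.e. 144 bits for 32 weights, or about 4.5 bits per weight versus 16 bits per weight for FP16, roughly a 3.5x size reduction for the quantized tensors.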

## Example

To quantize a model using the `q4_0` quantization type, you would run:

```sh
./bin/llama-llava-clip-quantize-cli /path/to/ggml-model-f32.gguf /path/to/ggml-model-quantized.gguf 2
```

This command generates a quantized model at `/path/to/ggml-model-quantized.gguf` using the `q4_0` quantization method.
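
A quick, purely illustrative way to confirm the size reduction (actual sizes depend on the projector):

```sh
# Compare the on-disk sizes of the original and quantized projector files.
ls -lh /path/to/ggml-model-f32.gguf /path/to/ggml-model-quantized.gguf
```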

## Notes

- Quantization can cause a loss in model accuracy, depending on the chosen quantization type. It is recommended to evaluate the quantized model's performance on your specific task to ensure it meets your requirements.
- The quantized model will typically be smaller and faster to run, making it better suited to deployment in resource-constrained environments.