Xuan-Son Nguyen edb18b6e8f clip : fix pixtral on some GPU backends (#13097) 8 months ago
android 243453533e llava : update documentations (#13055) 9 months ago
CMakeLists.txt 84a9bf2fc2 mtmd : merge llava, gemma3 and minicpmv CLI into single `llama-mtmd-cli` (#13012) 9 months ago
README-quantize.md 1ec208083c llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644) 11 months ago
README.md ecda2ec4b3 mtmd : Support Pixtral 12B (#13065) 8 months ago
clip-impl.h 13be08daf9 clip : remove boi/eoi embeddings for GLM-edge model (#13081) 8 months ago
clip-quantize-cli.cpp 1ec208083c llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644) 11 months ago
clip.cpp edb18b6e8f clip : fix pixtral on some GPU backends (#13097) 8 months ago
clip.h 6602304814 llava: fix errors in clip.h on certain compilers (#13030) 9 months ago
convert_image_encoder_to_gguf.py e9b2f84f14 llava: add big-endian conversion for image encoder (#12218) 10 months ago
deprecation-warning.cpp 84a9bf2fc2 mtmd : merge llava, gemma3 and minicpmv CLI into single `llama-mtmd-cli` (#13012) 9 months ago
glmedge-convert-image-encoder-to-gguf.py 0cec062a63 llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) 11 months ago
glmedge-surgery.py 0cec062a63 llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) 11 months ago
llava.cpp 0c50923944 clip : use smart pointer (⚠️ breaking change) (#12869) 9 months ago
llava.h 3071c0a5f2 llava : support MiniCPM-V-2.5 (#7599) 1 year ago
llava_surgery.py e235b267a2 py : switch to snake_case (#8305) 1 year ago
llava_surgery_v2.py 7a2c913e66 llava : Add Granite Vision Support (#11794) 10 months ago
minicpmv-convert-image-encoder-to-gguf.py 8352cdc87b llava : fix bug in minicpm-v code (#11513) 10 months ago
minicpmv-surgery.py 3e3357fd77 llava : support Minicpm-omni (#11289) 1 year ago
mtmd-cli.cpp 7c727fbe39 arg : add --no-mmproj-offload (#13093) 8 months ago
mtmd.cpp 13be08daf9 clip : remove boi/eoi embeddings for GLM-edge model (#13081) 8 months ago
mtmd.h b9154ecff9 mtmd : add methods to access `mtmd_image_tokens` (#12906) 9 months ago
qwen2_vl_surgery.py 4ddd199f6f llava : Allow locally downloaded models for QwenVL (#10833) 1 year ago
qwen2vl-cli.cpp 0364178ca2 clip : refactor clip_init, add tests (#12757) 9 months ago
requirements.txt d3ae0ee8d7 py : fix requirements check '==' -> '~=' (#8982) 1 year ago
test-1.jpeg 0364178ca2 clip : refactor clip_init, add tests (#12757) 9 months ago
tests.sh ecda2ec4b3 mtmd : Support Pixtral 12B (#13065) 8 months ago

README-quantize.md

Quantizing CLIP Visual Projector

This tool quantizes the CLIP visual projector model. Quantization lowers the precision of the model's weights, which can substantially reduce the model size and improve inference speed, often with little loss in accuracy.

Usage

To quantize a CLIP visual projector model, use the following command:

./bin/llama-llava-clip-quantize-cli /path/to/ggml-model-f32.gguf /path/to/ggml-model-quantized.gguf <type>

After quantization, the visual projector can be used with the existing LLaVA CLI tools (LLaVA, Qwen2VL, etc.).

Arguments

  • /path/to/ggml-model-f32.gguf: The path to the input model file in FP32 or FP16 format.
  • /path/to/ggml-model-quantized.gguf: The path where the quantized model will be saved.
  • <type>: The quantization type to apply. This should be an integer corresponding to one of the quantization types defined in the enum ggml_type.
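The three positional arguments above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_quantize_command` and `SUPPORTED_TYPES` are hypothetical names, the binary path is the one used in this README, and the accepted integers are those listed under "Quantization Types".

```python
# Hypothetical helper (not shipped with llama.cpp): builds the argument
# list for the quantization tool described in this README.
QUANTIZE_BIN = "./bin/llama-llava-clip-quantize-cli"  # path as written in this README

# Integer type ids listed under "Quantization Types" in this README.
SUPPORTED_TYPES = {2, 3, 6, 7, 8}


def build_quantize_command(input_path, output_path, type_id):
    """Return the argv list: binary, input file, output file, type id."""
    if type_id not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported quantization type id: {type_id}")
    return [QUANTIZE_BIN, input_path, output_path, str(type_id)]


if __name__ == "__main__":
    print(build_quantize_command("ggml-model-f32.gguf", "ggml-model-q4_0.gguf", 2))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library; rejecting unknown ids up front avoids launching the tool with a type this README does not list.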

Quantization Types

The following quantization types are supported, based on the enum ggml_type definition:

  • 2 - q4_0: 4-bit quantization; each block of weights stores one scale value.
  • 3 - q4_1: 4-bit quantization; each block stores a scale value and a minimum (offset).
  • 6 - q5_0: 5-bit quantization; each block stores one scale value.
  • 7 - q5_1: 5-bit quantization; each block stores a scale value and a minimum (offset).
  • 8 - q8_0: 8-bit quantization; each block stores one scale value.
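To get a rough feel for how much each type shrinks a tensor, the storage cost per weight can be estimated from the block layout. The sketch below assumes ggml's usual layout, which is not stated in this README: 32 weights per block, an FP16 scale (plus an FP16 minimum for the `_1` variants), and 4 extra bytes of high bits for the 5-bit types. The names `BLOCK_BYTES` and `bits_per_weight` are illustrative only.

```python
# Assumed layout (not stated in this README): 32 weights per block.
BLOCK_SIZE = 32

# Assumed bytes per block: FP16 scale (2), optional FP16 minimum (2),
# optional high-bit field for 5-bit types (4), then the packed values.
BLOCK_BYTES = {
    "q4_0": 2 + 16,
    "q4_1": 2 + 2 + 16,
    "q5_0": 2 + 4 + 16,
    "q5_1": 2 + 2 + 4 + 16,
    "q8_0": 2 + 32,
}


def bits_per_weight(type_name):
    """Average storage cost of one weight, including per-block metadata."""
    return BLOCK_BYTES[type_name] * 8 / BLOCK_SIZE


if __name__ == "__main__":
    for name in BLOCK_BYTES:
        bpw = bits_per_weight(name)
        print(f"{name}: {bpw} bits/weight, about {32 / bpw:.1f}x smaller than FP32")
```

These figures apply only to tensors that are actually quantized; any tensors left at their original precision make the real file somewhat larger than this estimate suggests.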

Example

To quantize a model using the q4_0 quantization type, you would run:

./bin/llama-llava-clip-quantize-cli /path/to/ggml-model-f32.gguf /path/to/ggml-model-quantized.gguf 2

This command writes a quantized model to /path/to/ggml-model-quantized.gguf using the q4_0 quantization type.
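After running the command, a quick sanity check is to compare the two file sizes. This is a small sketch using only the Python standard library; `size_ratio` is a hypothetical helper name, and the paths are whatever you passed to the tool.

```python
import os


def size_ratio(original_path, quantized_path):
    """Return how many times smaller the quantized file is than the original."""
    original = os.path.getsize(original_path)
    quantized = os.path.getsize(quantized_path)
    if quantized == 0:
        raise ValueError("quantized file is empty; the conversion likely failed")
    return original / quantized

# Example call, using the placeholder paths from this README:
# size_ratio("/path/to/ggml-model-f32.gguf", "/path/to/ggml-model-quantized.gguf")
```

A ratio close to 1 would suggest that few tensors were quantized, which is worth investigating before deploying the file.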

Notes

  • Quantization can lead to a loss in model accuracy, depending on the chosen quantization type. It is recommended to evaluate the quantized model's performance on your specific task to ensure it meets your requirements.
  • The quantized model will typically be smaller in size and faster to run, making it more suitable for deployment in resource-constrained environments.
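The trade-off between bit width and accuracy can be illustrated with a toy round trip. The sketch below is a simplified block-wise symmetric quantizer written for illustration; it is not ggml's actual algorithm, and the block size of 32 and the function names are assumptions. It quantizes random weights to 4 and 8 bits, dequantizes them, and compares the root-mean-square error.

```python
import random


def quantize_block_symmetric(block, bits):
    """Quantize and dequantize one block using a single shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits, 127 for 8 bits
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)
    scale = amax / qmax
    # Round to the nearest integer level, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in block]


def rms_error(weights, bits, block_size=32):
    """RMS difference between the weights and their quantized round trip."""
    total = 0.0
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        restored = quantize_block_symmetric(block, bits)
        total += sum((a - b) ** 2 for a, b in zip(block, restored))
    return (total / len(weights)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(1024)]
err4 = rms_error(weights, 4)
err8 = rms_error(weights, 8)
print(f"4-bit RMS error: {err4:.4f}, 8-bit RMS error: {err8:.4f}")
```

The 8-bit error comes out much smaller than the 4-bit error here, which mirrors the note above: lower bit widths save more space but lose more precision. Error on synthetic weights does not predict task accuracy, so evaluating the quantized projector on real inputs remains necessary.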