Quantizing CLIP Visual Projector

This tool quantizes the CLIP visual projector model. Quantization reduces the precision of the model's weights, which can significantly reduce the model size and improve inference speed, often with little loss of accuracy.
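To make "reducing the precision of the weights" concrete, here is a minimal Python sketch of symmetric quantization applied to one small block of weights. It is a simplified illustration, not the code this tool runs: the function names, the 4-value block, and the rounding scheme are all assumptions chosen for readability.

```python
def quantize_block(values, bits=4):
    """Quantize one block of floats to signed integer codes sharing one scale.

    Simplified illustration only; not the tool's actual algorithm.
    """
    qmax = 2 ** (bits - 1) - 1               # largest code, e.g. 7 for 4 bits
    amax = max(abs(v) for v in values)       # largest magnitude in the block
    scale = amax / qmax if amax else 1.0     # avoid dividing by zero
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the shared scale and integer codes."""
    return [scale * c for c in codes]


block = [0.7, -0.3, 0.12, 0.0]               # hypothetical weights
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)

for original, approx in zip(block, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision given up in exchange for storing small integer codes instead of full floats.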

Usage

To quantize a CLIP visual projector model, use the following command:

./bin/llama-llava-clip-quantize-cli /path/to/ggml-model-f32.gguf /path/to/ggml-model-quantized.gguf <type>

After quantization, the visual projector can be used with the existing LLAVA cli (LLAVA, Qwen2VL, etc).

Arguments

/path/to/ggml-model-f32.gguf: The path to the input model file in FP32 or FP16 format.
/path/to/ggml-model-quantized.gguf: The path where the quantized model will be saved.
<type>: The quantization type to apply. This should be an integer corresponding to one of the quantization types defined in the enum ggml_type.

Quantization Types

The following quantization types are supported, based on the enum ggml_type definition:

2 - q4_0: 4-bit quantization in blocks of 32 weights, with one scale value per block.
3 - q4_1: 4-bit quantization in blocks of 32 weights, with a scale value and a minimum value per block.
6 - q5_0: 5-bit quantization in blocks of 32 weights, with one scale value per block.
7 - q5_1: 5-bit quantization in blocks of 32 weights, with a scale value and a minimum value per block.
8 - q8_0: 8-bit quantization in blocks of 32 weights, with one scale value per block.
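Because the tool takes the type as a bare integer, a small wrapper can help avoid passing the wrong number. The sketch below maps the type names listed above to their integer IDs and assembles the argument list; the helper name `build_quantize_command` and the default binary path are illustrative assumptions, and the script only prints the command rather than running it.

```python
import shlex

# Integer IDs for the supported types, copied from the list above.
GGML_TYPE_IDS = {"q4_0": 2, "q4_1": 3, "q5_0": 6, "q5_1": 7, "q8_0": 8}


def build_quantize_command(src, dst, type_name,
                           binary="./bin/llama-llava-clip-quantize-cli"):
    """Return the argument list for one quantization run (hypothetical helper)."""
    if type_name not in GGML_TYPE_IDS:
        supported = ", ".join(sorted(GGML_TYPE_IDS))
        raise ValueError(f"unsupported type {type_name!r}; expected one of: {supported}")
    return [binary, src, dst, str(GGML_TYPE_IDS[type_name])]


cmd = build_quantize_command(
    "/path/to/ggml-model-f32.gguf",
    "/path/to/ggml-model-quantized.gguf",
    "q4_0",
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` if you want to execute the tool from a script.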

Example

To quantize a model using the q4_0 quantization type, you would run:

./bin/llama-llava-clip-quantize-cli /path/to/ggml-model-f32.gguf /path/to/ggml-model-quantized.gguf 2

This command will generate a quantized model at /path/to/ggml-model-quantized.gguf using the q4_0 quantization method.

Notes

Quantization can lead to a loss in model accuracy, depending on the chosen quantization type. It is recommended to evaluate the quantized model's performance on your specific task to ensure it meets your requirements.
The quantized model will typically be smaller in size and faster to run, making it more suitable for deployment in resource-constrained environments.
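As a rough guide to how much smaller the output may be, the following sketch estimates the storage needed for a given number of quantized weights. The per-block byte counts are an assumption based on the commonly described ggml block layouts (32 weights per block, with fp16 scale and minimum values); they are not stated in this README. Tensors that the tool leaves unquantized are ignored, so real files will be somewhat larger than this estimate.

```python
import math

BLOCK_SIZE = 32  # weights per block (assumed ggml layout)

# Assumed bytes per block: packed codes plus fp16 scale (and fp16 minimum for *_1).
BLOCK_BYTES = {"q4_0": 18, "q4_1": 20, "q5_0": 22, "q5_1": 24, "q8_0": 34}


def bits_per_weight(type_name):
    """Effective storage cost per weight, including per-block metadata."""
    return BLOCK_BYTES[type_name] * 8 / BLOCK_SIZE


def estimate_bytes(n_weights, type_name):
    """Estimated bytes for n_weights values stored as whole blocks."""
    return math.ceil(n_weights / BLOCK_SIZE) * BLOCK_BYTES[type_name]


n_weights = 1_000_000  # hypothetical weight count
for name in BLOCK_BYTES:
    print(f"{name}: {bits_per_weight(name):.1f} bits/weight, "
          f"{estimate_bytes(n_weights, name)} bytes "
          f"(FP32 would be {n_weights * 4} bytes)")
```

Under these assumptions, q4_0 costs 4.5 bits per weight rather than exactly 4, because each block also carries its scale value.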