---
base_model:
- {base_model}
---
# {model_name} GGUF

The recommended way to run this model is with `llama-server`:

```sh
llama-server -hf {namespace}/{model_name}-GGUF --embeddings
```
The endpoint can then be accessed at http://localhost:8080/embedding, for
example using `curl`:

```console
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings"}}' \
    --silent
```
Alternatively, the `llama-embedding` command-line tool can be used:

```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --verbose-prompt -p "Hello embeddings"
```
#### embd_normalize

When a model uses pooling, or the pooling method is specified using
`--pooling`, normalization of the returned embeddings can be controlled with
the `embd_normalize` parameter. The default value is `2`, which normalizes the
embeddings using the Euclidean (L2) norm. The options are:

* `-1`: no normalization
* `0`: max absolute
* `1`: taxicab (L1)
* `2`: Euclidean (L2), the default
* `>2`: p-norm
This can be passed in the request body to `llama-server`, for example:

```sh
--data '{{"input": "Hello embeddings", "embd_normalize": -1}}' \
```
For `llama-embedding`, pass `--embd-normalize <value>`, for example:

```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --embd-normalize -1 -p "Hello embeddings"
```
|