
---
base_model:
- {base_model}
---

# {model_name} GGUF

Recommended way to run this model:

```sh
llama-server -hf {namespace}/{model_name}-GGUF --embeddings
```

Then the endpoint can be accessed at http://localhost:8080/embedding, for
example using `curl`:

```console
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings"}}' \
    --silent
```
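The server responds with JSON containing the embedding vector. For a quick
look at the output, the response can be piped through `jq` (a sketch assuming
`jq` is installed; the exact response layout may vary between `llama-server`
versions):

```sh
# Pretty-print the embedding response (assumes jq is available)
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings"}}' \
    --silent | jq .
```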
Alternatively, the `llama-embedding` command line tool can be used:

```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --verbose-prompt -p "Hello embeddings"
```

#### embd_normalize

When a model uses pooling, or the pooling method is specified using `--pooling`,
the normalization can be controlled by the `embd_normalize` parameter.

The default value is `2`, which means that the embeddings are normalized using
the Euclidean norm (L2). Other options are:
* `-1` No normalization
* `0` Max absolute
* `1` Taxicab
* `2` Euclidean/L2
* `>2` P-Norm
This can be passed in the request body to `llama-server`, for example:

```sh
--data '{{"input": "Hello embeddings", "embd_normalize": -1}}' \
```
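Combined with the earlier `curl` request, a complete call that disables
normalization looks like:

```sh
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings", "embd_normalize": -1}}' \
    --silent
```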
For `llama-embedding`, the same option can be set by passing
`--embd-normalize <value>`, for example:

```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --embd-normalize -1 -p "Hello embeddings"
```