embeddings.feature

@llama.cpp
@embeddings
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
    And   a model file bert-bge-small.gguf
    And   a model alias bert-bge-small
    And   42 as server seed
    And   2 slots
    # the bert-bge-small model has a context size of 512
    # since the generated prompts are as long as the batch size, the batch size must be set to 512
    # ref: https://huggingface.co/BAAI/bge-small-en-v1.5/blob/5c38ec7c405ec4b44b94cc5a9bb96e735b38267a/config.json#L20
    And   512 as batch size
    And   512 as ubatch size
    And   2048 KV cache size
    And   embeddings extraction
    Then  the server is starting
    Then  the server is healthy

  Scenario: Embedding
    When embeddings are computed for:
    """
    What is the capital of Bulgaria ?
    """
    Then embeddings are generated

  Scenario: OAI Embeddings compatibility
    Given a model bert-bge-small
    When an OAI compatible embeddings computation request for:
    """
    What is the capital of Spain ?
    """
    Then embeddings are generated
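# As a rough illustration of what the OAI-compatible scenario above exercises,
# a request of this shape could be sent by hand; this is a sketch that assumes
# the server from the Background is running on localhost:8080 and that the
# OpenAI-style embeddings route is served at /v1/embeddings.

```shell
# Hedged sketch: assumes a running llama.cpp server on localhost:8080
# exposing the OpenAI-compatible embeddings route.
curl -s http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "bert-bge-small",
        "input": "What is the capital of Spain ?"
      }'
```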

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model bert-bge-small
    Given a prompt:
    """
    In which country Paris is located ?
    """
    And a prompt:
    """
    Is Madrid the capital of Spain ?
    """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated

  Scenario: Multi users embeddings
    Given a prompt:
    """
    Write a very long story about AI.
    """
    And a prompt:
    """
    Write another very long music lyrics.
    """
    And a prompt:
    """
    Write a very long poem.
    """
    And a prompt:
    """
    Write a very long joke.
    """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
    """
    In which country Paris is located ?
    """
    And a prompt:
    """
    Is Madrid the capital of Spain ?
    """
    And a prompt:
    """
    What is the biggest US city ?
    """
    And a prompt:
    """
    What is the capital of Bulgaria ?
    """
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: All embeddings should be the same
    Given 10 fixed prompts
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then all embeddings are the same