embeddings.feature

@llama.cpp
@embeddings
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
    And   a model file bert-bge-small.gguf
    And   a model alias bert-bge-small
    And   42 as server seed
    And   2 slots
    # the bert-bge-small model has a context size of 512
    # since the generated prompts are as big as the batch size, we need to set the batch size to <= 512
    # ref: https://huggingface.co/BAAI/bge-small-en-v1.5/blob/5c38ec7c405ec4b44b94cc5a9bb96e735b38267a/config.json#L20
    And   128 as batch size
    And   128 as ubatch size
    And   512 KV cache size
    And   enable embeddings endpoint
    Then  the server is starting
    Then  the server is healthy
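
  # A roughly equivalent manual launch of the server under test (a sketch only;
  # the binary name and exact flag spellings vary across llama.cpp versions and
  # are assumptions, not taken from this feature file):
  #
  #   ./llama-server -m bert-bge-small.gguf -a bert-bge-small \
  #       --host localhost --port 8080 --embeddings \
  #       -c 512 -b 128 -ub 128 -np 2 --seed 42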

  Scenario: Embedding
    When embeddings are computed for:
      """
      What is the capital of Bulgaria ?
      """
    Then embeddings are generated
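
  # A hand-rolled request against the native endpoint exercised above (a sketch;
  # the /embedding path and the "content"/"embedding" field names are assumptions
  # about the llama.cpp server API, not taken from this feature file):
  #
  #   import requests
  #   resp = requests.post("http://localhost:8080/embedding",
  #                        json={"content": "What is the capital of Bulgaria ?"})
  #   resp.raise_for_status()
  #   vector = resp.json()["embedding"]  # one float per embedding dimension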

  Scenario: Embedding (error: prompt too long)
    When embeddings are computed for:
      """
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
      Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
      Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
      Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
      Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
      Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
      Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      """
    And embeddings request with 500 api error
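
  # The lorem-ipsum prompt above tokenizes to more than the 128-token batch size
  # set in the Background (an embedding prompt has to fit in a single batch), so
  # the server is expected to reject the request with an HTTP 500 error.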

  Scenario: OAI Embeddings compatibility
    Given a model bert-bge-small
    When an OAI compatible embeddings computation request for:
      """
      What is the capital of Spain ?
      """
    Then embeddings are generated
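
  # The OAI-compatible scenario above targets the OpenAI-style endpoint; a manual
  # request is assumed to look roughly like this (sketch, field names follow the
  # OpenAI embeddings API shape, not this feature file):
  #
  #   import requests
  #   resp = requests.post("http://localhost:8080/v1/embeddings",
  #                        json={"model": "bert-bge-small",
  #                              "input": "What is the capital of Spain ?"})
  #   vector = resp.json()["data"][0]["embedding"]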

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model bert-bge-small
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated
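
  # With multiple inputs, the OpenAI-style "input" field carries a JSON array and
  # one embedding comes back per element (same assumptions as the sketch above):
  #
  #   import requests
  #   resp = requests.post("http://localhost:8080/v1/embeddings",
  #                        json={"model": "bert-bge-small",
  #                              "input": ["In which country Paris is located ?",
  #                                        "Is Madrid the capital of Spain ?"]})
  #   vectors = [item["embedding"] for item in resp.json()["data"]]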

  Scenario: Multi users embeddings
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated
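
  # The concurrent step above submits the four prompts in parallel; with only
  # 2 slots configured, the server reports busy until the backlog drains. A
  # minimal client-side sketch of the same kind of load (not the test harness
  # this suite actually uses; endpoint and field names assumed as above):
  #
  #   from concurrent.futures import ThreadPoolExecutor
  #   import requests
  #
  #   prompts = ["Write a very long story about AI.",
  #              "Write another very long music lyrics.",
  #              "Write a very long poem.",
  #              "Write a very long joke."]
  #
  #   def embed(prompt):
  #       resp = requests.post("http://localhost:8080/embedding",
  #                            json={"content": prompt})
  #       resp.raise_for_status()
  #       return resp.json()["embedding"]
  #
  #   with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
  #       embeddings = list(pool.map(embed, prompts))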

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    And a prompt:
      """
      What is the biggest US city ?
      """
    And a prompt:
      """
      What is the capital of Bulgaria ?
      """
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: All embeddings should be the same
    Given 10 fixed prompts
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then all embeddings are the same
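
  # The 10 fixed prompts are identical, so every concurrent request should come
  # back with the same vector. A client-side spot check might look like this
  # (sketch; endpoint, field names and the example prompt are assumed):
  #
  #   import requests
  #
  #   def embed(prompt):
  #       resp = requests.post("http://localhost:8080/embedding",
  #                            json={"content": prompt})
  #       return resp.json()["embedding"]
  #
  #   first = embed("What is the capital of Bulgaria ?")
  #   assert all(embed("What is the capital of Bulgaria ?") == first for _ in range(9))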