# embeddings.feature

@llama.cpp
@embeddings
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
    And   a model file ggml-model-f16.gguf
    And   a model alias bert-bge-small
    And   42 as server seed
    And   2 slots
    And   1024 as batch size
    And   1024 as ubatch size
    And   2048 KV cache size
    And   embeddings extraction
    Then  the server is starting
    Then  the server is healthy
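
  # The Background above boots a single llama.cpp server shared by all scenarios
  # in this feature, with embedding extraction enabled. A roughly equivalent
  # manual launch is sketched below; this is an assumption for illustration only,
  # flag names vary between llama.cpp versions, and the test harness downloads
  # the GGUF from the model url itself:
  #   ./server -m ggml-model-f16.gguf --alias bert-bge-small --host localhost --port 8080 \
  #            --seed 42 -np 2 -b 1024 -ub 1024 -c 2048 --embedding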

  Scenario: Embedding
    When embeddings are computed for:
    """
    What is the capital of Bulgaria ?
    """
    Then embeddings are generated
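
  # "embeddings are computed for" presumably hits the server's native embedding
  # endpoint. Assumed request/response shape (endpoint path and field names may
  # differ between server versions):
  #   POST http://localhost:8080/embedding
  #   {"content": "What is the capital of Bulgaria ?"}
  #   -> {"embedding": [0.0123, -0.0456, ...]}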

  Scenario: OAI Embeddings compatibility
    Given a model bert-bge-small
    When an OAI compatible embeddings computation request for:
    """
    What is the capital of Spain ?
    """
    Then embeddings are generated
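
  # The OAI-compatible step presumably targets the OpenAI-style embeddings route,
  # with "model" matching the alias from the Background. Assumed shape:
  #   POST http://localhost:8080/v1/embeddings
  #   {"model": "bert-bge-small", "input": "What is the capital of Spain ?"}
  #   -> {"object": "list", "data": [{"object": "embedding", "index": 0, "embedding": [...]}]}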

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model bert-bge-small
    Given a prompt:
    """
    In which country Paris is located ?
    """
    And a prompt:
    """
    Is Madrid the capital of Spain ?
    """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated
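
  # With several prompts the OpenAI-style "input" is presumably sent as an array,
  # yielding one entry per prompt in the returned "data" list:
  #   {"model": "bert-bge-small",
  #    "input": ["In which country Paris is located ?", "Is Madrid the capital of Spain ?"]}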

  Scenario: Multi users embeddings
    Given a prompt:
    """
    Write a very long story about AI.
    """
    And a prompt:
    """
    Write another very long music lyrics.
    """
    And a prompt:
    """
    Write a very long poem.
    """
    And a prompt:
    """
    Write a very long joke.
    """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated
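
  # The four prompts are submitted concurrently against the 2 slots configured in
  # the Background, so some requests queue while both slots are processing. The
  # busy/idle assertions presumably poll the server's health endpoint, which in
  # the versions this feature targets reports how many slots are idle vs. busy.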

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
    """
    In which country Paris is located ?
    """
    And a prompt:
    """
    Is Madrid the capital of Spain ?
    """
    And a prompt:
    """
    What is the biggest US city ?
    """
    And a prompt:
    """
    What is the capital of Bulgaria ?
    """
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: All embeddings should be the same
    Given 10 fixed prompts
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then all embeddings are the same
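
  # Determinism check: with a fixed server seed and (presumably) 10 copies of the
  # same prompt sent concurrently, every returned embedding vector should be
  # identical regardless of which slot or batch position processed the prompt.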