# embeddings.feature

@llama.cpp
@embeddings
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model file bert-bge-small/ggml-model-f16.gguf from HF repo ggml-org/models
    And   a model alias bert-bge-small
    And   42 as server seed
    And   2 slots
    And   1024 as batch size
    And   1024 as ubatch size
    And   2048 KV cache size
    And   embeddings extraction
    Then  the server is starting
    Then  the server is healthy
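
  # The Background above is a sketch of the server configuration assembled by the test
  # harness; it roughly corresponds to launching the server with flags such as
  # --embedding, --alias bert-bge-small, --seed 42, -np 2, -b 1024, -ub 1024 and -c 2048
  # (flag spelling is an assumption; the exact command line is not stated in this file).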

  Scenario: Embedding
    When embeddings are computed for:
      """
      What is the capital of Bulgaria ?
      """
    Then embeddings are generated
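
  # Sketch of the underlying request (an assumption; the HTTP details live in the test
  # harness, not in this file): the native endpoint is expected to look roughly like
  #   POST /embedding  {"content": "What is the capital of Bulgaria ?"}
  # with the response carrying a non-empty float vector as the embedding.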

  Scenario: OAI Embeddings compatibility
    Given a model bert-bge-small
    When an OAI compatible embeddings computation request for:
      """
      What is the capital of Spain ?
      """
    Then embeddings are generated
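
  # For the OAI-compatible path the request presumably follows the OpenAI embeddings
  # API shape (endpoint name assumed, not spelled out in this file), e.g.:
  #   POST /v1/embeddings
  #   {"model": "bert-bge-small", "input": "What is the capital of Spain ?"}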

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model bert-bge-small
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated
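
  # With multiple inputs the OAI-style body would presumably carry a list instead of a
  # single string (again a sketch, not stated in this file):
  #   {"model": "bert-bge-small",
  #    "input": ["In which country Paris is located ?", "Is Madrid the capital of Spain ?"]}
  # and one embedding is expected back per input.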

  Scenario: Multi users embeddings
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated
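
  # Four prompts are issued concurrently against the 2 slots configured in the
  # Background, so the server is expected to report busy while requests are in flight
  # (or queued) and idle once every embedding has been returned.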

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    And a prompt:
      """
      What is the biggest US city ?
      """
    And a prompt:
      """
      What is the capital of Bulgaria ?
      """
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: All embeddings should be the same
    Given 10 fixed prompts
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then all embeddings are the same
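
  # Determinism check: the 10 fixed prompts are presumably identical, so even when the
  # concurrent requests are batched across the 2 slots, every returned embedding is
  # expected to be the same vector.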