server.feature

@llama.cpp
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And a model file stories260K.gguf
    And a model alias tinyllama-2
    And 42 as server seed
    # KV Cache corresponds to the total amount of tokens
    # that can be stored across all independent sequences: #4130
    # see --ctx-size and #5568
    And 32 KV cache size
    And 1 slots
    And embeddings extraction
    And 32 server max tokens to predict
    Then the server is starting
    Then the server is healthy
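
The Background maps onto a single server invocation. A rough sketch of launching it from Python, assuming the usual llama.cpp server flags (`--model`, `--alias`, `--seed`, `--ctx-size`, `--parallel`, `--embedding`, `--n-predict`); exact flag names can differ between versions, so check `./server --help`:

```python
import subprocess

# Start the server roughly as the Background describes.
# Flag names are assumptions about the llama.cpp server CLI.
server = subprocess.Popen([
    "./server",
    "--host", "localhost", "--port", "8080",
    "--model", "stories260K.gguf",
    "--alias", "tinyllama-2",
    "--seed", "42",
    "--ctx-size", "32",   # total KV cache, shared across all slots
    "--parallel", "1",    # 1 slot
    "--embedding",        # enable embeddings extraction
    "--n-predict", "32",  # server-wide cap on tokens to predict
])
```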

  Scenario: Health
    Then the server is ready
    And all slots are idle
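
The Health scenario just polls the server's status route. A minimal sketch, assuming the `/health` endpoint exposed by the llama.cpp server:

```python
import requests

# Ready check: the scenario passes once the server reports it is healthy
# and all slots are idle (response fields assumed, not guaranteed).
r = requests.get("http://localhost:8080/health")
r.raise_for_status()
print(r.json())
```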

  Scenario Outline: Completion
    Given a prompt <prompt>
    And <n_predict> max tokens to predict
    And a completion request with no api error
    Then <n_predicted> tokens are predicted matching <re_content>

    Examples: Prompts
      | prompt                           | n_predict | re_content                   | n_predicted |
      | I believe the meaning of life is | 8         | read                         | 8           |
      | Write a joke about AI            | 64        | (park<or>friends<or>scared)+ | 32          |
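
Each example row is a plain completion request. A sketch of the first row, assuming the server's `/completion` route with `prompt` and `n_predict` fields:

```python
import requests

# Completion request for the first example row; the generated text is then
# matched against <re_content> and the token count against <n_predicted>.
resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "I believe the meaning of life is", "n_predict": 8},
)
body = resp.json()
print(body["content"])               # generated text
print(body.get("tokens_predicted"))  # field name assumed; number of predicted tokens
```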

  Scenario Outline: OAI Compatibility
    Given a model <model>
    And a system prompt <system_prompt>
    And a user prompt <user_prompt>
    And <max_tokens> max tokens to predict
    And streaming is <enable_streaming>
    Given an OAI compatible chat completions request with no api error
    Then <n_predicted> tokens are predicted matching <re_content>

    Examples: Prompts
      | model        | system_prompt               | user_prompt                          | max_tokens | re_content                 | n_predicted | enable_streaming |
      | llama-2      | Book                        | What is the best book                | 8          | (Mom<or>what)+             | 8           | disabled         |
      | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 64         | (thanks<or>happy<or>bird)+ | 32          | enabled          |
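
The OAI compatibility rows hit the OpenAI-style chat endpoint instead. A sketch of the first row, assuming the server mirrors the OpenAI request shape at `/v1/chat/completions`:

```python
import requests

# Non-streaming chat completion matching the first example row.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "llama-2",
        "messages": [
            {"role": "system", "content": "Book"},
            {"role": "user", "content": "What is the best book"},
        ],
        "max_tokens": 8,
        "stream": False,  # the second row sets this to True
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```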

  Scenario: Embedding
    When embeddings are computed for:
      """
      What is the capital of Bulgaria ?
      """
    Then embeddings are generated
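
A sketch of the native embedding request, assuming the server's `/embedding` route with a `content` field:

```python
import requests

# Embedding for a single string via the server's own (non-OAI) API.
resp = requests.post(
    "http://localhost:8080/embedding",
    json={"content": "What is the capital of Bulgaria ?"},
)
embedding = resp.json()["embedding"]  # field name assumed
print(len(embedding))                 # one float per embedding dimension
```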

  Scenario: OAI Embeddings compatibility
    Given a model tinyllama-2
    When an OAI compatible embeddings computation request for:
      """
      What is the capital of Spain ?
      """
    Then embeddings are generated

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model tinyllama-2
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated
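
Both OAI embeddings scenarios go through the OpenAI-compatible endpoint, where `input` is either a single string or a list of strings. A sketch, assuming `/v1/embeddings`:

```python
import requests

# Batched embeddings request covering the multiple-inputs scenario;
# pass a single string as "input" for the single-prompt case.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={
        "model": "tinyllama-2",
        "input": [
            "In which country Paris is located ?",
            "Is Madrid the capital of Spain ?",
        ],
    },
)
for item in resp.json()["data"]:
    print(item["index"], len(item["embedding"]))
```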

  Scenario: Tokenize / Detokenize
    When tokenizing:
      """
      What is the capital of France ?
      """
    Then tokens can be detokenize
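
The last scenario round-trips text through tokenization. A sketch, assuming the server's `/tokenize` and `/detokenize` routes with `content` and `tokens` fields:

```python
import requests

# Tokenize, then detokenize; the recovered text should match the original
# up to whitespace handling (route and field names assumed).
text = "What is the capital of France ?"
tokens = requests.post("http://localhost:8080/tokenize",
                       json={"content": text}).json()["tokens"]
back = requests.post("http://localhost:8080/detokenize",
                     json={"tokens": tokens}).json()["content"]
print(tokens)
print(back)
```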