# parallel.feature

@llama.cpp
Feature: Parallel

  Background: Server startup
    Given a server listening on localhost:8080
    And a model file stories260K.gguf
    And a model alias tinyllama-2
    And 42 as server seed
    And 64 KV cache size
    And 2 slots
    And embeddings extraction
    And continuous batching
    Then the server is starting
    Then the server is healthy
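
  # The Background above maps to server CLI flags. A minimal launch sketch
  # (Python; flag names taken from llama.cpp's server and may vary between
  # versions; the seed flag in particular is an assumption here):
  #
  #   import subprocess
  #
  #   server = subprocess.Popen([
  #       "./server",
  #       "--model", "stories260K.gguf",            # model file
  #       "--alias", "tinyllama-2",                 # model alias reported by the API
  #       "--seed", "42",                           # server seed (assumed flag)
  #       "--ctx-size", "64",                       # 64 KV cache size
  #       "--parallel", "2",                        # 2 slots
  #       "--embedding",                            # embeddings extraction
  #       "--cont-batching",                        # continuous batching
  #       "--host", "localhost", "--port", "8080",
  #   ])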

  Scenario Outline: Multi users completion
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And <n_predict> max tokens to predict
    Given concurrent completion requests
    Then the server is busy
    Then the server is idle
    And all slots are idle
    Then all prompts are predicted with <n_predict> tokens

    Examples:
      | n_predict |
      | 128       |
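
  # "Concurrent completion requests" posts each prompt to the server's
  # /completion endpoint at the same time, one request per slot. A minimal
  # sketch of what such a harness might do (endpoint and payload fields are
  # from llama.cpp's server API; the real test harness may differ):
  #
  #   import concurrent.futures
  #   import requests
  #
  #   def complete(prompt, n_predict=128):
  #       r = requests.post("http://localhost:8080/completion",
  #                         json={"prompt": prompt, "n_predict": n_predict})
  #       return r.json()["content"]
  #
  #   prompts = ["Write a very long story about AI.",
  #              "Write another very long music lyrics."]
  #   with concurrent.futures.ThreadPoolExecutor() as pool:
  #       results = list(pool.map(complete, prompts))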

  Scenario Outline: Multi users OAI completions compatibility
    Given a system prompt You are a writer.
    And a model tinyllama-2
    Given a prompt:
      """
      Write a very long book.
      """
    And a prompt:
      """
      Write another poem.
      """
    And <n_predict> max tokens to predict
    And streaming is <streaming>
    Given concurrent OAI completions requests
    Then the server is busy
    Then the server is idle
    Then all prompts are predicted with <n_predict> tokens

    Examples:
      | streaming | n_predict |
      | disabled  | 128       |
      | enabled   | 64        |
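
  # "Concurrent OAI completions requests" target the OpenAI-compatible
  # /v1/chat/completions endpoint instead of /completion; the system prompt
  # becomes the first chat message and <n_predict> maps to max_tokens.
  # A non-streaming sketch (with "stream": true the response arrives as
  # server-sent-event chunks instead):
  #
  #   import requests
  #
  #   def oai_complete(prompt, n_predict, streaming=False):
  #       r = requests.post("http://localhost:8080/v1/chat/completions",
  #                         json={"model": "tinyllama-2",
  #                               "messages": [
  #                                   {"role": "system",
  #                                    "content": "You are a writer."},
  #                                   {"role": "user", "content": prompt}],
  #                               "max_tokens": n_predict,
  #                               "stream": streaming})
  #       return r.json()["choices"][0]["message"]["content"]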

  Scenario: Multi users with total number of tokens to predict exceeding the KV Cache size #3969
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    And 128 max tokens to predict
    Given concurrent completion requests
    Then the server is busy
    Then the server is idle
    Then all prompts are predicted
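
  # Why this exercises issue #3969: four prompts at up to 128 predicted
  # tokens each ask for up to 4 x 128 = 512 tokens, while the KV cache holds
  # only 64 cells shared across 2 slots (32 per slot). Generation cannot run
  # to 128 tokens for every request, which is why this scenario only checks
  # that all prompts produce a completion, not that each reaches 128 tokens.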

  Scenario: Multi users embeddings
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated
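
  # "Concurrent embedding requests" use the server's native /embedding
  # endpoint, enabled by the "embeddings extraction" step in the Background.
  # A minimal sketch:
  #
  #   import concurrent.futures
  #   import requests
  #
  #   def embed(prompt):
  #       r = requests.post("http://localhost:8080/embedding",
  #                         json={"content": prompt})
  #       return r.json()["embedding"]   # list of floats
  #
  #   prompts = ["Write a very long story about AI.",
  #              "Write a very long joke."]
  #   with concurrent.futures.ThreadPoolExecutor() as pool:
  #       vectors = list(pool.map(embed, prompts))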

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
      """
      In which country is Paris located?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain?
      """
    And a prompt:
      """
      What is the biggest US city?
      """
    And a prompt:
      """
      What is the capital of Bulgaria?
      """
    And a model tinyllama-2
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated
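
  # The OAI-compatible variant posts to /v1/embeddings with an OpenAI-style
  # body; the "And a model tinyllama-2" step maps to the "model" field.
  # Sketch for a single request:
  #
  #   import requests
  #
  #   r = requests.post("http://localhost:8080/v1/embeddings",
  #                     json={"model": "tinyllama-2",
  #                           "input": "What is the capital of Bulgaria?"})
  #   vector = r.json()["data"][0]["embedding"]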