# parallel.feature

@llama.cpp
Feature: Parallel

  Background: Server startup
    Given a server listening on localhost:8080
    And a model file stories260K.gguf
    And a model alias tinyllama-2
    And 42 as server seed
    And 64 KV cache size
    And 2 slots
    And continuous batching
    Then the server is starting
    Then the server is healthy
  Scenario Outline: Multi users completion
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And <n_predict> max tokens to predict
    Given concurrent completion requests
    Then the server is busy
    Then the server is idle
    And all slots are idle
    Then all prompts are predicted with <n_predict> tokens

    Examples:
      | n_predict |
      | 128       |
  Scenario Outline: Multi users OAI completions compatibility
    Given a system prompt You are a writer.
    And a model tinyllama-2
    Given a prompt:
      """
      Write a very long book.
      """
    And a prompt:
      """
      Write another poem.
      """
    And <n_predict> max tokens to predict
    And streaming is <streaming>
    Given concurrent OAI completions requests
    Then the server is busy
    Then the server is idle
    Then all prompts are predicted with <n_predict> tokens

    Examples:
      | streaming | n_predict |
      | disabled  | 128       |
      | enabled   | 64        |
  Scenario: Multi users with total number of tokens to predict exceeding the KV cache size #3969
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    And 128 max tokens to predict
    Given concurrent completion requests
    Then the server is busy
    Then the server is idle
    Then all prompts are predicted
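The "concurrent completion requests" step could be exercised by hand with a sketch like the one below. The `/completion` endpoint and the `prompt`/`n_predict` payload keys are assumptions based on the llama.cpp example server's HTTP API; the base URL and the two-worker pool come from the Background steps ("a server listening on localhost:8080", "2 slots").

```python
"""Hedged sketch: firing the feature file's completion prompts in parallel.

Assumptions (not stated in the feature file itself): the server exposes a
/completion endpoint that accepts a JSON body with "prompt" and "n_predict"
keys, as the llama.cpp example server does.
"""
import concurrent.futures
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # from "a server listening on localhost:8080"


def build_payload(prompt: str, n_predict: int) -> str:
    # Mirror the "<n_predict> max tokens to predict" step.
    return json.dumps({"prompt": prompt, "n_predict": n_predict})


def complete(prompt: str, n_predict: int) -> dict:
    # POST a single completion request and return the parsed JSON response.
    req = urllib.request.Request(
        f"{BASE_URL}/completion",
        data=build_payload(prompt, n_predict).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    prompts = [
        "Write a very long story about AI.",
        "Write another very long music lyrics.",
    ]
    # Two workers match the "2 slots" configured in the Background,
    # so both requests occupy a slot at the same time.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        for result in pool.map(lambda p: complete(p, 128), prompts):
            print(result.get("content", "")[:60])
```

This is only an illustration of the traffic pattern the scenarios describe; the actual steps are implemented by the test harness's step definitions, not by this script.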