results.feature

@llama.cpp
@results
Feature: Results

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model file tinyllamas/split/stories15M-00001-of-00003.gguf from HF repo ggml-org/models
    And   a model file test-model-00001-of-00003.gguf
    And   128 as batch size
    And   256 KV cache size
    And   128 max tokens to predict

  Scenario Outline: Multi users completion
    Given <n_slots> slots
    And   continuous batching
    Then  the server is starting
    Then  the server is healthy

    Given 42 as seed
    And a prompt:
      """
      Write a very long story about AI.
      """

    Given 42 as seed
    And a prompt:
      """
      Write a very long story about AI.
      """

    Given 42 as seed
    And a prompt:
      """
      Write a very long story about AI.
      """

    Given 42 as seed
    And a prompt:
      """
      Write a very long story about AI.
      """

    Given 42 as seed
    And a prompt:
      """
      Write a very long story about AI.
      """

    Given concurrent completion requests
    Then the server is busy
    Then the server is idle
    And  all slots are idle
    Then all predictions are equal

    Examples:
      | n_slots |
      | 1       |
      | 2       |
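
What this scenario exercises can be approximated outside behave with a small Python script: send the same prompt with a fixed seed as concurrent completion requests against a running server and check that every response matches. This is a minimal sketch, not the project's actual step definitions; the /completion endpoint and the prompt/seed/n_predict/content fields follow the llama.cpp server API, while the helper name, request count, and timeout are assumptions for illustration.

    # Sketch only: concurrent completions with a fixed seed, then check
    # that all returned texts are identical ("all predictions are equal").
    import concurrent.futures
    import requests

    BASE_URL = "http://localhost:8080"          # matches the Background step
    PROMPT = "Write a very long story about AI."
    N_REQUESTS = 5                              # one per prompt in the scenario

    def complete(_):
        # POST to the llama.cpp server completion endpoint with a fixed seed.
        resp = requests.post(
            f"{BASE_URL}/completion",
            json={"prompt": PROMPT, "seed": 42, "n_predict": 128},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["content"]

    # Fire the requests concurrently so several slots are busy at once.
    with concurrent.futures.ThreadPoolExecutor(max_workers=N_REQUESTS) as pool:
        predictions = list(pool.map(complete, range(N_REQUESTS)))

    # With the same seed and prompt, the outputs should not differ.
    assert all(p == predictions[0] for p in predictions), "predictions differ"
    print("all predictions are equal")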