wrong_usages.feature

# run with: ./tests.sh --no-skipped --tags wrong_usage
@wrong_usage
Feature: Wrong usage of llama.cpp server

  #3969 The user must always set --n-predict option
  # to cap the number of tokens any completion request can generate
  # or pass n_predict/max_tokens in the request.
  Scenario: Infinite loop
    Given a server listening on localhost:8080
    And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
    And 42 as server seed
    And 2048 KV cache size
    # Uncomment below to fix the issue
    #And 64 server max tokens to predict
    Then the server is starting
    Then the server is healthy

    Given a prompt:
      """
      Go to: infinite loop
      """
    # Uncomment below to fix the issue
    #And 128 max tokens to predict

    Given concurrent completion requests
    Then the server is idle
    Then all prompts are predicted
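
The comments in the scenario describe two fixes for #3969: start the server with --n-predict so every completion is capped globally (the commented-out "And 64 server max tokens to predict" step), or have each client pass n_predict/max_tokens in its request (the commented-out "And 128 max tokens to predict" step). Below is a minimal sketch of the second option, assuming the server from the scenario is running on localhost:8080 and that its /completion endpoint accepts an n_predict field in the JSON body; it is an illustration, not part of the test suite.

    # Sketch: cap generation per request instead of relying on a server-wide limit.
    import json
    import urllib.request

    payload = {
        "prompt": "Go to: infinite loop",
        # Without this cap (and without --n-predict on the server), the request
        # can keep generating tokens until the KV cache is exhausted.
        "n_predict": 128,
    }

    req = urllib.request.Request(
        "http://localhost:8080/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())

    print(result.get("content"))

Uncommenting either of the two steps above makes the scenario finish instead of looping, which is why they are labeled "Uncomment below to fix the issue".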