# wrong_usages.feature

# run with: ./tests.sh --no-skipped --tags wrong_usage
@wrong_usage
Feature: Wrong usage of llama.cpp server

  #3969 The user must always set the --n-predict option
  # to cap the number of tokens any completion request can generate
  # or pass n_predict/max_tokens in the request.
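  # A minimal sketch of the server-side fix (illustrative, not part of this
  # scenario): start the server with a token cap, e.g.
  #   ./llama-server -m stories260K.gguf --host localhost --port 8080 --n-predict 64
  # so that a request carrying no limit of its own cannot generate tokens
  # indefinitely. The binary name differs between llama.cpp versions
  # (./server in older builds), and the model path here is only an example.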
  Scenario: Infinite loop
    Given a server listening on localhost:8080
    And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
    # Uncomment below to fix the issue
    #And 64 server max tokens to predict
    Then the server is starting
    Given a prompt:
    """
    Go to: infinite loop
    """
    # Uncomment below to fix the issue
    #And 128 max tokens to predict
    Given concurrent completion requests
    Then the server is idle
    Then all prompts are predicted
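    # Roughly what the completion step sends, shown as one illustrative request
    # (the /completion endpoint and n_predict field follow the llama.cpp server
    # HTTP API; max_tokens is the equivalent field on the OpenAI-compatible
    # endpoint):
    #   curl http://localhost:8080/completion \
    #     -H "Content-Type: application/json" \
    #     -d '{"prompt": "Go to: infinite loop", "n_predict": 128}'
    # Without n_predict here and without --n-predict on the server, the request
    # has no cap and can keep generating, which is the wrong usage this
    # scenario demonstrates.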