lora.feature

@llama.cpp
@lora
Feature: llama.cpp server
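
  # The Background downloads a small MoE story model plus a Shakespeare-tuned
  # LoRA adapter, then starts the server with a fixed seed and 0.0 temperature
  # so each scenario's completion is deterministic.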
  Background: Server startup
    Given a server listening on localhost:8080
    And a model url https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf
    And a model file stories15M_MOE-F16.gguf
    And a model alias stories15M_MOE
    And a lora adapter file from https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/moe_shakespeare15M.gguf
    And 42 as server seed
    And 1024 as batch size
    And 1024 as ubatch size
    And 2048 KV cache size
    And 64 max tokens to predict
    And 0.0 temperature
    Then the server is starting
    Then the server is healthy
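
  # With the adapter switched off (scale 0) the base stories15M_MOE model
  # answers in its plain children's-story register. The step presumably maps
  # to the server's POST /lora-adapters endpoint; an assumed request shape:
  #   curl http://localhost:8080/lora-adapters -d '[{"id": 0, "scale": 0.0}]'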
  Scenario: Completion LoRA disabled
    Given switch off lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And a completion request with no api error
    Then 64 tokens are predicted matching little|girl|three|years|old
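
  # With the adapter switched back on (scale 1.0 by default), the same prompt,
  # a line from Shakespeare's Sonnet 3, should steer the completion toward
  # Shakespearean vocabulary instead of the story-style continuation above.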
  Scenario: Completion LoRA enabled
    Given switch on lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And a completion request with no api error
    Then 64 tokens are predicted matching eye|love|glass|sun
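
  # To run only this feature with the behave-based server test harness
  # (assuming it lives under examples/server/tests in the llama.cpp repo):
  #   behave --tags lora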