@llama.cpp
@lora
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And a model url https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf
    And a model file stories15M_MOE-F16.gguf
    And a model alias stories15M_MOE
    And a lora adapter file from https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/moe_shakespeare15M.gguf
    And 42 as server seed
    And 1024 as batch size
    And 1024 as ubatch size
    And 2048 KV cache size
    And 64 max tokens to predict
    And 0.0 temperature
    Then the server is starting
    Then the server is healthy

  Scenario: Completion LoRA disabled
    Given switch off lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And a completion request with no api error
    Then 64 tokens are predicted matching little|girl|three|years|old

  Scenario: Completion LoRA enabled
    Given switch on lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And a completion request with no api error
    Then 64 tokens are predicted matching eye|love|glass|sun
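For reference, the "switch off/on lora adapter 0" steps toggle the adapter at runtime rather than restarting the server, and the two scenarios then check that the same prompt yields base-model "stories" text with the adapter off and Shakespeare-flavoured text with it on. Below is a minimal Python sketch of the equivalent HTTP calls; it assumes the server exposes POST /lora-adapters (taking a list of {id, scale} objects, with "off" meaning scale 0.0 and "on" meaning scale 1.0) alongside POST /completion, so treat the endpoint shapes as assumptions rather than a transcript of the actual step definitions.

# Sketch only: mirrors the two scenarios above against a running llama.cpp server.
# Assumed endpoints: POST /lora-adapters (list of {"id", "scale"}) and POST /completion.
import requests

BASE_URL = "http://localhost:8080"

def set_lora_scale(adapter_id: int, scale: float) -> None:
    # "switch off lora adapter 0" ~ scale 0.0, "switch on" ~ scale 1.0 (assumption)
    resp = requests.post(f"{BASE_URL}/lora-adapters",
                         json=[{"id": adapter_id, "scale": scale}])
    resp.raise_for_status()

def complete(prompt: str) -> str:
    # Matches the Background defaults: 64 tokens to predict, temperature 0.0
    resp = requests.post(f"{BASE_URL}/completion",
                         json={"prompt": prompt, "n_predict": 64, "temperature": 0.0})
    resp.raise_for_status()
    return resp.json()["content"]

set_lora_scale(0, 0.0)  # LoRA disabled: expect base "stories"-style continuation
print(complete("Look in thy glass"))

set_lora_scale(0, 1.0)  # LoRA enabled: expect Shakespeare-flavoured continuation
print(complete("Look in thy glass"))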