# run with: ./tests.sh --no-skipped --tags passkey
@passkey
@slow
Feature: Passkey / Self-extend with context shift

  Background: Server startup
    Given a server listening on localhost:8080

  # Generates a long text of junk and inserts a secret passkey number inside it.
  # Then we query the LLM for the secret passkey.
  # see #3856 and #4810
  Scenario Outline: Passkey
    Given a model file <hf_file> from HF repo <hf_repo>
    And <n_batch> as batch size
    And <n_junk> as number of junk
    And <n_predicted> server max tokens to predict
    And 42 as seed
    And 0.0 temperature
    And <n_ctx> KV cache size
    And 1 slots
    And <n_ga> group attention factor to extend context size through self-extend
    And <n_ga_w> group attention width to extend context size through self-extend
    # Can be overridden with N_GPU_LAYERS
    And <ngl> GPU offloaded layers
    Then the server is starting
    # Higher timeout because the model may need to be downloaded from the internet
    Then the server is healthy with timeout 120 seconds
    Given available models
    Then model 0 is trained on <n_ctx_train> tokens context
    Given a prefix prompt:
    """
    here is an important info hidden inside a lot of irrelevant text. Find it and memorize them. I will quiz you about the important information there.
    """
    And a passkey prompt template:
    """
    The pass key is <passkey> Remember it. <passkey> is the pass key.
    """
    And a junk suffix prompt:
    """
    The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again.
    """
    And a suffix prompt:
    """
    What is the pass key? The pass key is
    """
    Given a "<passkey>" passkey challenge prompt with the passkey inserted every <i_pos> junk
    And a completion request with no api error
    Then <n_predicted> tokens are predicted matching <re_content>

    Examples:
      | hf_repo                         | hf_file                     | n_ctx_train | ngl | n_ctx | n_batch | n_ga | n_ga_w | n_junk | i_pos | passkey | n_predicted | re_content      |
      | TheBloke/phi-2-GGUF             | phi-2.Q4_K_M.gguf           | 2048        | 5   | 8192  | 512     | 4    | 512    | 250    | 50    | 42      | 1           | 42              |
      | TheBloke/phi-2-GGUF             | phi-2.Q4_K_M.gguf           | 2048        | 5   | 8192  | 512     | 2    | 512    | 250    | 50    | 42      | 1           | \b((?!42)\w)+\b |
     #| TheBloke/Llama-2-7B-GGUF        | llama-2-7b.Q2_K.gguf        | 4096        | 3   | 16384 | 512     | 4    | 512    | 500    | 300   | 1234    | 5           | 1234            |
     #| TheBloke/Mixtral-8x7B-v0.1-GGUF | mixtral-8x7b-v0.1.Q2_K.gguf | 32768       | 2   | 16384 | 512     | 4    | 512    | 500    | 100   | 0987    | 5           | 0987            |