@llama.cpp
@embeddings
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
    And a model file bert-bge-small.gguf
    And a model alias bert-bge-small
    And 42 as server seed
    And 2 slots
    # the bert-bge-small model has a context size of 512
    # the generated prompts are as large as the batch size, so the batch size must be <= 512
    # ref: https://huggingface.co/BAAI/bge-small-en-v1.5/blob/5c38ec7c405ec4b44b94cc5a9bb96e735b38267a/config.json#L20
    And 128 as batch size
    And 128 as ubatch size
    And 512 KV cache size
    And enable embeddings endpoint
    Then the server is starting
    Then the server is healthy
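
  # A sketch of an equivalent standalone server invocation for this background
  # (flag names follow the llama.cpp server README and may differ across versions;
  # treat it as illustrative, not as what the test harness runs):
  #   ./server --host localhost --port 8080 -m bert-bge-small.gguf --alias bert-bge-small \
  #            --seed 42 --parallel 2 --batch-size 128 --ubatch-size 128 --ctx-size 512 --embedding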

  Scenario: Embedding
    When embeddings are computed for:
      """
      What is the capital of Bulgaria ?
      """
    Then embeddings are generated
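
  # This scenario exercises the server's native embeddings endpoint; a minimal
  # sketch of the request it issues (endpoint and body field per the server README):
  #   curl http://localhost:8080/embedding \
  #        -d '{"content": "What is the capital of Bulgaria ?"}'
  # The response carries the vector in an "embedding" field.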

  Scenario: Embedding (error: prompt too long)
    When embeddings are computed for:
      """
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
      Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
      Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
      Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
      Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
      Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
      Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      """
    And embeddings request with 500 api error
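
  # The concatenated lorem-ipsum prompt tokenizes to more tokens than the 128-token
  # batch (and the 512-token context) can hold, so instead of an embedding the server
  # is expected to answer with an HTTP 500 error, which the step above asserts.
  # A sketch of checking the status code by hand (illustrative only):
  #   curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/embedding \
  #        -d '{"content": "<the long prompt above>"}'    # prints: 500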

  Scenario: OAI Embeddings compatibility
    Given a model bert-bge-small
    When an OAI compatible embeddings computation request for:
      """
      What is the capital of Spain ?
      """
    Then embeddings are generated
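
  # The OAI-compatible scenario goes through the OpenAI-style endpoint instead of
  # the native one; a sketch of the request (shape follows the OpenAI embeddings API):
  #   curl http://localhost:8080/v1/embeddings -H 'Content-Type: application/json' \
  #        -d '{"model": "bert-bge-small", "input": "What is the capital of Spain ?"}'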

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model bert-bge-small
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated
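
  # With multiple inputs the OpenAI-style "input" field becomes a JSON array and
  # the response is expected to contain one embedding per element; a sketch:
  #   curl http://localhost:8080/v1/embeddings -H 'Content-Type: application/json' \
  #        -d '{"model": "bert-bge-small",
  #             "input": ["In which country Paris is located ?",
  #                       "Is Madrid the capital of Spain ?"]}'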

  Scenario: Multi users embeddings
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated
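
  # Four prompts against the 2 slots configured in the background: at most two
  # requests are processed at once, so the server reports busy while slots are
  # occupied and idle once they drain. A sketch of firing the requests concurrently
  # from a shell (illustrative only, not the harness the steps actually use):
  #   for p in "story about AI" "poem" "joke"; do
  #     curl -s http://localhost:8080/embedding \
  #          -d "{\"content\": \"Write a very long $p.\"}" &
  #   done; wait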

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    And a prompt:
      """
      What is the biggest US city ?
      """
    And a prompt:
      """
      What is the capital of Bulgaria ?
      """
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: All embeddings should be the same
    Given 10 fixed prompts
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then all embeddings are the same
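
  # With the fixed server seed (42) and identical prompts, repeated requests should
  # return identical vectors no matter which slot served them or how the requests
  # were batched together. A spot check of that property (hypothetical shell
  # one-liner, comparing two responses with jq):
  #   a=$(curl -s http://localhost:8080/embedding -d '{"content": "same prompt"}' | jq -c .embedding)
  #   b=$(curl -s http://localhost:8080/embedding -d '{"content": "same prompt"}' | jq -c .embedding)
  #   [ "$a" = "$b" ] && echo identical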