@@ -460,7 +460,7 @@ These words will not be included in the completion, so make sure to add them to
- Note: In streaming mode (`stream`), only `content`, `tokens` and `stop` will be returned until end of completion. Responses are sent using the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html) standard. Note: the browser's `EventSource` interface cannot be used due to its lack of `POST` request support.
- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has a nested array `top_logprobs`. It contains at **maximum** `n_probs` elements:
- ```json
+ ```
{
"content": "<the generated completion text>",
"tokens": [ generated token ids if requested ],
@@ -561,7 +561,7 @@ If `with_pieces` is `true`:
```
With input 'á' (utf8 hex: C3 A1) on tinyllama/stories260k
-```json
+```
{
"tokens": [
{"id": 198, "piece": [195]}, // hex C3
@@ -776,7 +776,7 @@ Same as the `/v1/embeddings` endpoint.
**Response format**
-```json
+```
[
{
"index": 0,