# llama.cpp/tools/completion

This example program lets you use various LLaMA language models easily and efficiently. It is specifically designed to work with the [llama.cpp](https://github.com/ggml-org/llama.cpp) project, which provides a plain C/C++ implementation with optional 4-bit quantization support for faster, lower-memory inference, optimized for desktop CPUs. The program can perform various inference tasks with LLaMA models, including generating text from user-provided prompts and chat-like interaction with reverse prompts.
Once downloaded, place your model in the models folder in llama.cpp.

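For example, on a Unix-like shell (paths are illustrative; adjust the filename to whichever GGUF model you downloaded):

```bash
# move the downloaded GGUF model into llama.cpp's models folder
mv gemma-1.1-7b-it.Q4_K_M.gguf models/
```
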
### Unix-based systems (Linux, macOS, etc.):

##### Input prompt (One-and-done)

```bash
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf -no-cnv --prompt "Once upon a time"
```

##### Conversation mode (Allow for continuous interaction with the model)

```bash
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --chat-template gemma
```

##### Conversation mode using built-in jinja chat template

```bash
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --jinja
```

##### One-and-done query using jinja with custom system prompt and a starting prompt

```bash
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --jinja --single-turn -sys "You are a helpful assistant" -p "Hello"
```

##### Infinite text from a starting prompt (you can use `Ctrl-C` to stop it):

```bash
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --ignore-eos -n -1
```

### Windows:

##### Input prompt (One-and-done)

```powershell
./llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf -no-cnv --prompt "Once upon a time"
```

##### Conversation mode (Allow for continuous interaction with the model)

```powershell
./llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --chat-template gemma
```

##### Conversation mode using built-in jinja chat template

```powershell
./llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --jinja
```

##### One-and-done query using jinja with custom system prompt and a starting prompt

```powershell
./llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --jinja --single-turn -sys "You are a helpful assistant" -p "Hello"
```

##### Infinite text from a starting prompt (you can use `Ctrl-C` to stop it):

```powershell
./llama-completion.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --ignore-eos -n -1
```

## Common Options

In this section, we cover the most commonly used options for running the `llama-completion` program with the LLaMA models:

- `-m FNAME, --model FNAME`: Specify the path to the LLaMA model file (e.g., `models/gemma-1.1-7b-it.Q4_K_M.gguf`; inferred from `--model-url` if set).
- `-mu MODEL_URL, --model-url MODEL_URL`: Specify a remote HTTP URL from which to download the model file (e.g., [https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true](https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true)); see the example below.

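For instance, either of the following invocations loads the same model: the first from a local file, the second by first downloading it from the URL above (a sketch; adjust paths for your setup):

```bash
# load a model from a local file
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf -p "Hello"

# download the model from the remote URL if it is not already present, then load it
./llama-completion -mu "https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true" -p "Hello"
```
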
## Input Prompts

The `llama-completion` program provides several ways to interact with the LLaMA models using input prompts:

- `--prompt PROMPT`: Provide a prompt directly as a command-line option.
- `--file FNAME`: Provide a file containing a prompt or multiple prompts (see the example below).

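For example, a prompt can be read from a file instead of being passed on the command line. A minimal sketch, assuming `prompt.txt` is a hypothetical plain-text file you create:

```bash
# create a hypothetical prompt file (any plain-text file works)
printf 'Write a short poem about the sea.\n' > prompt.txt

# run a one-shot completion using the file's contents as the prompt
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf -no-cnv --file prompt.txt
```
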
## Interaction

The `llama-completion` program offers a seamless way to interact with LLaMA models, allowing users to engage in real-time conversations or provide instructions for specific tasks. The interactive mode can be triggered using various options, including `--interactive` and `--interactive-first`.

In interactive mode, users can participate in text generation by injecting their input during the process. Users can press `Ctrl+C` at any time to interject and type their input, followed by pressing `Return` to submit it to the LLaMA model. To submit additional lines without finalizing input, users can end the current line with a backslash (`\`) and continue typing.

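For example, the following starts a session in interactive-first mode, waiting for your input right away and returning control to you whenever the model emits the reverse prompt `User:` (a sketch using options described in this README):

```bash
./llama-completion -m models/gemma-1.1-7b-it.Q4_K_M.gguf --interactive-first -r "User:"
```
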
### In-Prefix

The `--in-prefix` flag is used to add a prefix to your input; primarily, this is used to insert a space after the reverse prompt. Here's an example of how to use the `--in-prefix` flag in conjunction with the `--reverse-prompt` flag:

```sh
./llama-completion -r "User:" --in-prefix " "
```

### In-Suffix

The `--in-suffix` flag is used to add a suffix after your input. This is useful for adding an "Assistant:" prompt after the user's input. It's added after the new-line character (`\n`) that's automatically added to the end of the user's input. Here's an example of how to use the `--in-suffix` flag in conjunction with the `--reverse-prompt` flag:

```sh
./llama-completion -r "User:" --in-prefix " " --in-suffix "Assistant:"
```

When the `--in-prefix` or `--in-suffix` options are enabled, the chat template (`--chat-template`) is disabled.