@@ -7,9 +7,9 @@
## Images

We have three Docker images available for this project:

-1. `ghcr.io/ggml-org/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
-2. `ghcr.io/ggml-org/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
-3. `ghcr.io/ggml-org/llama.cpp:server`: This image only includes the server executable file. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
+1. `ghcr.io/ggml-org/llama.cpp:full`: This image includes both the `llama-cli` and `llama-completion` executables and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
+2. `ghcr.io/ggml-org/llama.cpp:light`: This image only includes the `llama-cli` and `llama-completion` executables. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
+3. `ghcr.io/ggml-org/llama.cpp:server`: This image only includes the `llama-server` executable. (platforms: `linux/amd64`, `linux/arm64`, `linux/s390x`)
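
The tags listed above are published for several platforms, and Docker normally picks the variant that matches the host. The following is a minimal sketch of composing a pull command that pins one of the listed platforms explicitly. The helper name `llama_image` is hypothetical and not part of llama.cpp; the command is printed rather than executed.

```bash
# Hypothetical helper (an assumption for illustration, not shipped with llama.cpp):
# map a variant name to the image reference used in this document.
llama_image() {
  case "$1" in
    full|light|server) printf 'ghcr.io/ggml-org/llama.cpp:%s\n' "$1" ;;
    *) echo "unknown variant: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) a pull command pinned to one of the listed platforms.
# `--platform` is a standard option of `docker pull`.
echo docker pull --platform linux/arm64 "$(llama_image light)"
```

Dropping the leading `echo` would run the pull for real, which requires Docker and network access.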

Additionally, there are the following images, similar to the above:

@@ -44,13 +44,15 @@ docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full --all-in-o
On completion, you are ready to play!

```bash
-docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf
+docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:full --run-legacy -m /models/32B/ggml-model-q8_0.gguf -no-cnv -p "Building a mobile app can be done in 15 steps:" -n 512
```
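
In the commands above, `-v /path/to/models:/models` bind-mounts a host directory at `/models` inside the container, so the `-m` argument has to use the container-side path. The following sketch shows that mapping. The helper name `container_model_path` is hypothetical, used only for illustration, and is not shipped with the image.

```bash
# Hypothetical helper: translate a host model path into the path the
# container sees when the host directory is mounted at /models.
#   $1: host directory passed to -v (left side of the colon)
#   $2: host path of a model file under that directory
container_model_path() {
  host_dir="${1%/}"   # drop one trailing slash, if present
  model="$2"
  case "$model" in
    "$host_dir"/*) printf '/models/%s\n' "${model#"$host_dir"/}" ;;
    *) echo "error: $model is not under $host_dir" >&2; return 1 ;;
  esac
}

container_model_path /path/to/models /path/to/models/7B/ggml-model-q4_0.gguf
# prints: /models/7B/ggml-model-q4_0.gguf
```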

or with a light image:

```bash
-docker run -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+docker run -v /path/to/models:/models --entrypoint /app/llama-cli ghcr.io/ggml-org/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf
+docker run -v /path/to/models:/models --entrypoint /app/llama-completion ghcr.io/ggml-org/llama.cpp:light -m /models/32B/ggml-model-q8_0.gguf -no-cnv -p "Building a mobile app can be done in 15 steps:" -n 512
```
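
The `--entrypoint` flag in the commands above selects which binary under `/app` the container starts. One way to see what an image runs when the flag is omitted is to read the entrypoint from the image configuration. This is a minimal sketch, assuming Docker is installed and the image has already been pulled; the function name `show_entrypoint` is hypothetical.

```bash
# Hypothetical helper: print the entrypoint configured in an image, as JSON.
# `docker inspect --format` with a Go template is standard Docker CLI usage.
show_entrypoint() {
  docker inspect --format '{{json .Config.Entrypoint}}' "$1"
}

# Example (requires Docker and a locally available image):
# show_entrypoint ghcr.io/ggml-org/llama.cpp:light
```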

or with a server image:

@@ -59,6 +61,8 @@ or with a server image:
docker run -v /path/to/models:/models -p 8080:8080 ghcr.io/ggml-org/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8080 --host 0.0.0.0 -n 512
```
+In the above examples, `--entrypoint /app/llama-cli` is specified for clarity, but you can safely omit it since it's the default entrypoint in the container.
+
## Docker With CUDA

Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU enabled cloud, `cuBLAS` should be accessible inside the container.

@@ -80,9 +84,9 @@ The defaults are:
The resulting images are essentially the same as the non-CUDA images:

-1. `local/llama.cpp:full-cuda`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
-2. `local/llama.cpp:light-cuda`: This image only includes the main executable file.
-3. `local/llama.cpp:server-cuda`: This image only includes the server executable file.
+1. `local/llama.cpp:full-cuda`: This image includes both the `llama-cli` and `llama-completion` executables and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
+2. `local/llama.cpp:light-cuda`: This image only includes the `llama-cli` and `llama-completion` executables.
+3. `local/llama.cpp:server-cuda`: This image only includes the `llama-server` executable.

## Usage

@@ -114,9 +118,9 @@ The defaults are:
The resulting images are essentially the same as the non-MUSA images:

-1. `local/llama.cpp:full-musa`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
-2. `local/llama.cpp:light-musa`: This image only includes the main executable file.
-3. `local/llama.cpp:server-musa`: This image only includes the server executable file.
+1. `local/llama.cpp:full-musa`: This image includes both the `llama-cli` and `llama-completion` executables and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
+2. `local/llama.cpp:light-musa`: This image only includes the `llama-cli` and `llama-completion` executables.
+3. `local/llama.cpp:server-musa`: This image only includes the `llama-server` executable.

## Usage