llama : move end-user examples to tools directory (#13249)

* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Diego Devesa, 8 months ago
commit 1d36b3670b
100 changed files with 211 additions and 175 deletions
  1. .editorconfig (+4 -4)
  2. .flake8 (+2 -1)
  3. .github/labeler.yml (+4 -2)
  4. .github/workflows/bench.yml.disabled (+15 -15)
  5. .github/workflows/build-linux-cross.yml (+3 -0)
  6. .github/workflows/build.yml (+5 -0)
  7. .github/workflows/server.yml (+12 -12)
  8. .gitignore (+6 -6)
  9. CMakeLists.txt (+5 -0)
  10. CODEOWNERS (+1 -1)
  11. Makefile (+46 -46)
  12. README.md (+10 -10)
  13. SECURITY.md (+1 -1)
  14. build-xcframework.sh (+2 -0)
  15. ci/run.sh (+4 -4)
  16. common/arg.cpp (+2 -2)
  17. common/common.h (+3 -3)
  18. docs/development/HOWTO-add-model.md (+4 -4)
  19. docs/multimodal/MobileVLM.md (+6 -6)
  20. docs/multimodal/glmedge.md (+2 -2)
  21. docs/multimodal/llava.md (+6 -6)
  22. docs/multimodal/minicpmo2.6.md (+2 -2)
  23. docs/multimodal/minicpmv2.5.md (+2 -2)
  24. docs/multimodal/minicpmv2.6.md (+2 -2)
  25. examples/CMakeLists.txt (+1 -22)
  26. examples/pydantic_models_to_grammar_examples.py (+1 -1)
  27. examples/server/public/index.html.gz (BIN)
  28. grammars/README.md (+6 -6)
  29. pyrightconfig.json (+1 -1)
  30. requirements/requirements-all.txt (+3 -3)
  31. scripts/fetch_server_test_models.py (+2 -2)
  32. scripts/tool_bench.py (+3 -3)
  33. scripts/xxd.cmake (+1 -1)
  34. tests/CMakeLists.txt (+1 -1)
  35. tests/run-json-schema-to-grammar.mjs (+1 -1)
  36. tools/CMakeLists.txt (+39 -0)
  37. tools/batched-bench/CMakeLists.txt (+0 -0)
  38. tools/batched-bench/README.md (+0 -0)
  39. tools/batched-bench/batched-bench.cpp (+0 -0)
  40. tools/cvector-generator/CMakeLists.txt (+0 -0)
  41. tools/cvector-generator/README.md (+0 -0)
  42. tools/cvector-generator/completions.txt (+0 -0)
  43. tools/cvector-generator/cvector-generator.cpp (+0 -0)
  44. tools/cvector-generator/mean.hpp (+0 -0)
  45. tools/cvector-generator/negative.txt (+0 -0)
  46. tools/cvector-generator/pca.hpp (+0 -0)
  47. tools/cvector-generator/positive.txt (+0 -0)
  48. tools/export-lora/CMakeLists.txt (+0 -0)
  49. tools/export-lora/README.md (+0 -0)
  50. tools/export-lora/export-lora.cpp (+0 -0)
  51. tools/gguf-split/CMakeLists.txt (+0 -0)
  52. tools/gguf-split/README.md (+0 -0)
  53. tools/gguf-split/gguf-split.cpp (+0 -0)
  54. tools/gguf-split/tests.sh (+0 -0)
  55. tools/imatrix/CMakeLists.txt (+0 -0)
  56. tools/imatrix/README.md (+1 -1)
  57. tools/imatrix/imatrix.cpp (+0 -0)
  58. tools/llama-bench/CMakeLists.txt (+0 -0)
  59. tools/llama-bench/README.md (+1 -1)
  60. tools/llama-bench/llama-bench.cpp (+0 -0)
  61. tools/llava/CMakeLists.txt (+0 -0)
  62. tools/llava/README-quantize.md (+0 -0)
  63. tools/llava/README.md (+0 -0)
  64. tools/llava/android/adb_run.sh (+0 -0)
  65. tools/llava/android/build_64.sh (+0 -0)
  66. tools/llava/clip-impl.h (+0 -0)
  67. tools/llava/clip-quantize-cli.cpp (+0 -0)
  68. tools/llava/clip.cpp (+0 -0)
  69. tools/llava/clip.h (+0 -0)
  70. tools/llava/convert_image_encoder_to_gguf.py (+0 -0)
  71. tools/llava/deprecation-warning.cpp (+0 -0)
  72. tools/llava/glmedge-convert-image-encoder-to-gguf.py (+0 -0)
  73. tools/llava/glmedge-surgery.py (+0 -0)
  74. tools/llava/llava.cpp (+0 -0)
  75. tools/llava/llava.h (+0 -0)
  76. tools/llava/llava_surgery.py (+0 -0)
  77. tools/llava/llava_surgery_v2.py (+0 -0)
  78. tools/llava/minicpmv-convert-image-encoder-to-gguf.py (+0 -0)
  79. tools/llava/minicpmv-surgery.py (+0 -0)
  80. tools/llava/mtmd-cli.cpp (+0 -0)
  81. tools/llava/mtmd.cpp (+0 -0)
  82. tools/llava/mtmd.h (+0 -0)
  83. tools/llava/qwen2vl-test.cpp (+0 -0)
  84. tools/llava/requirements.txt (+0 -0)
  85. tools/llava/test-1.jpeg (+0 -0)
  86. tools/llava/tests.sh (+0 -0)
  87. tools/main/CMakeLists.txt (+0 -0)
  88. tools/main/README.md (+1 -1)
  89. tools/main/main.cpp (+0 -0)
  90. tools/perplexity/CMakeLists.txt (+0 -0)
  91. tools/perplexity/README.md (+0 -0)
  92. tools/perplexity/perplexity.cpp (+0 -0)
  93. tools/quantize/CMakeLists.txt (+0 -0)
  94. tools/quantize/README.md (+0 -0)
  95. tools/quantize/quantize.cpp (+0 -0)
  96. tools/quantize/tests.sh (+0 -0)
  97. tools/rpc/CMakeLists.txt (+0 -0)
  98. tools/rpc/README.md (+0 -0)
  99. tools/rpc/rpc-server.cpp (+0 -0)
  100. tools/run/CMakeLists.txt (+0 -0)
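The list above boils down to renames from examples/* to tools/* plus a new LLAMA_BUILD_TOOLS CMake option wired through the build files. As a quick orientation, a minimal configure sketch using the new option; the build directory name and the choice of extra flags here are illustrative assumptions, not taken verbatim from the CI files below:

```sh
# Sketch: build llama.cpp with the relocated end-user tools enabled.
# LLAMA_BUILD_TOOLS is the option added by this commit; other details are assumptions.
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TOOLS=ON
cmake --build build --config Release
# Binaries such as llama-cli, llama-quantize and llama-server are now built
# from tools/ instead of examples/.
```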

+ 4 - 4
.editorconfig

@@ -21,15 +21,15 @@ indent_style = tab
 [prompts/*.txt]
 insert_final_newline = unset
 
-[examples/server/public/*]
+[tools/server/public/*]
 indent_size = 2
 
-[examples/server/public/deps_*]
+[tools/server/public/deps_*]
 trim_trailing_whitespace = unset
 indent_style = unset
 indent_size = unset
 
-[examples/server/deps_*]
+[tools/server/deps_*]
 trim_trailing_whitespace = unset
 indent_style = unset
 indent_size = unset
@@ -37,7 +37,7 @@ indent_size = unset
 [examples/llama.swiftui/llama.swiftui.xcodeproj/*]
 indent_style = tab
 
-[examples/cvector-generator/*.txt]
+[tools/cvector-generator/*.txt]
 trim_trailing_whitespace = unset
 insert_final_newline = unset
 

+ 2 - 1
.flake8

@@ -2,8 +2,9 @@
 max-line-length = 125
 ignore = E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704,W503
 exclude =
-    # Do not traverse examples
+    # Do not traverse examples and tools
     examples,
+    tools,
     # Do not include package initializers
     __init__.py,
     # No need to traverse our git directory

+ 4 - 2
.github/labeler.yml

@@ -45,7 +45,9 @@ build:
             - CMakePresets.json
 examples:
     - changed-files:
-        - any-glob-to-any-file: examples/**
+        - any-glob-to-any-file:
+            - examples/**
+            - tools/**
 devops:
     - changed-files:
         - any-glob-to-any-file:
@@ -70,7 +72,7 @@ android:
 server:
     - changed-files:
         - any-glob-to-any-file:
-            - examples/server/**
+            - tools/server/**
 ggml:
     - changed-files:
         - any-glob-to-any-file:

+ 15 - 15
.github/workflows/bench.yml.disabled

@@ -27,10 +27,10 @@ on:
   push:
     branches:
       - master
-    paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
+    paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'tools/server/*.h*', 'tools/server/*.cpp']
   pull_request_target:
     types: [opened, synchronize, reopened]
-    paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
+    paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'tools/server/*.h*', 'tools/server/*.cpp']
   schedule:
     -  cron: '04 2 * * *'
 
@@ -69,7 +69,7 @@ jobs:
       - name: Install python env
         id: pipenv
         run: |
-          cd examples/server/bench
+          cd tools/server/bench
           python3 -m venv venv
           source venv/bin/activate
           pip install -r requirements.txt
@@ -79,7 +79,7 @@ jobs:
         run: |
           wget --quiet https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
           tar xzf prometheus*.tar.gz --strip-components=1
-          ./prometheus --config.file=examples/server/bench/prometheus.yml &
+          ./prometheus --config.file=tools/server/bench/prometheus.yml &
           while ! nc -z localhost 9090; do
             sleep 0.1
           done
@@ -92,7 +92,7 @@ jobs:
       - name: Install k6 and xk6-sse
         id: k6_installation
         run: |
-          cd examples/server/bench
+          cd tools/server/bench
           go install go.k6.io/xk6/cmd/xk6@latest
           xk6 build master \
               --with github.com/phymbert/xk6-sse
@@ -116,7 +116,7 @@ jobs:
       - name: Download the dataset
         id: download_dataset
         run: |
-          cd examples/server/bench
+          cd tools/server/bench
          wget --quiet https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
 
       - name: Server bench
@@ -126,7 +126,7 @@ jobs:
         run: |
           set -eux
 
-          cd examples/server/bench
+          cd tools/server/bench
           source venv/bin/activate
           python bench.py \
               --runner-label ${{ env.RUNNER_LABEL }} \
@@ -157,9 +157,9 @@ jobs:
           name: bench-server-${{ github.job }}-${{ env.RUNNER_LABEL }}-${{ matrix.model }}-${{ matrix.ftype }}
           compression-level: 9
           path: |
-            examples/server/bench/*.jpg
-            examples/server/bench/*.json
-            examples/server/bench/*.log
+            tools/server/bench/*.jpg
+            tools/server/bench/*.json
+            tools/server/bench/*.log
 
       - name: Commit status
         uses: Sibz/github-status-action@v1
@@ -178,17 +178,17 @@ jobs:
         with:
           client_id: ${{secrets.IMGUR_CLIENT_ID}}
           path: |
-            examples/server/bench/prompt_tokens_seconds.jpg
-            examples/server/bench/predicted_tokens_seconds.jpg
-            examples/server/bench/kv_cache_usage_ratio.jpg
-            examples/server/bench/requests_processing.jpg
+            tools/server/bench/prompt_tokens_seconds.jpg
+            tools/server/bench/predicted_tokens_seconds.jpg
+            tools/server/bench/kv_cache_usage_ratio.jpg
+            tools/server/bench/requests_processing.jpg
 
       - name: Extract mermaid
         id: set_mermaid
         run: |
           set -eux
 
-          cd examples/server/bench
+          cd tools/server/bench
           PROMPT_TOKENS_SECONDS=$(cat prompt_tokens_seconds.mermaid)
           echo "PROMPT_TOKENS_SECONDS<<EOF" >> $GITHUB_ENV
           echo "$PROMPT_TOKENS_SECONDS" >> $GITHUB_ENV

+ 3 - 0
.github/workflows/build-linux-cross.yml

@@ -34,6 +34,7 @@ jobs:
           cmake -B build -DCMAKE_BUILD_TYPE=Release \
                          -DGGML_OPENMP=OFF \
                          -DLLAMA_BUILD_EXAMPLES=ON \
+                         -DLLAMA_BUILD_TOOLS=ON \
                          -DLLAMA_BUILD_TESTS=OFF \
                          -DCMAKE_SYSTEM_NAME=Linux \
                          -DCMAKE_SYSTEM_PROCESSOR=riscv64 \
@@ -80,6 +81,7 @@ jobs:
                          -DGGML_VULKAN=ON \
                          -DGGML_OPENMP=OFF \
                          -DLLAMA_BUILD_EXAMPLES=ON \
+                         -DLLAMA_BUILD_TOOLS=ON \
                          -DLLAMA_BUILD_TESTS=OFF \
                          -DCMAKE_SYSTEM_NAME=Linux \
                          -DCMAKE_SYSTEM_PROCESSOR=riscv64 \
@@ -125,6 +127,7 @@ jobs:
                          -DGGML_VULKAN=ON \
                          -DGGML_OPENMP=OFF \
                          -DLLAMA_BUILD_EXAMPLES=ON \
+                         -DLLAMA_BUILD_TOOLS=ON \
                          -DLLAMA_BUILD_TESTS=OFF \
                          -DCMAKE_SYSTEM_NAME=Linux \
                          -DCMAKE_SYSTEM_PROCESSOR=aarch64 \

+ 5 - 0
.github/workflows/build.yml

@@ -633,6 +633,7 @@ jobs:
            -DGGML_METAL_EMBED_LIBRARY=ON \
            -DLLAMA_BUILD_COMMON=OFF \
            -DLLAMA_BUILD_EXAMPLES=OFF \
+            -DLLAMA_BUILD_TOOLS=OFF \
            -DLLAMA_BUILD_TESTS=OFF \
            -DLLAMA_BUILD_SERVER=OFF \
            -DCMAKE_SYSTEM_NAME=iOS \
@@ -669,6 +670,7 @@ jobs:
            -DGGML_METAL_EMBED_LIBRARY=ON \
            -DLLAMA_BUILD_COMMON=OFF \
            -DLLAMA_BUILD_EXAMPLES=OFF \
+            -DLLAMA_BUILD_TOOLS=OFF \
            -DLLAMA_BUILD_TESTS=OFF \
            -DLLAMA_BUILD_SERVER=OFF \
            -DCMAKE_SYSTEM_NAME=tvOS \
@@ -699,6 +701,7 @@ jobs:
            -DGGML_METAL_EMBED_LIBRARY=ON \
            -DLLAMA_BUILD_COMMON=OFF \
            -DLLAMA_BUILD_EXAMPLES=OFF \
+            -DLLAMA_BUILD_TOOLS=OFF \
            -DLLAMA_BUILD_TESTS=OFF \
            -DLLAMA_BUILD_SERVER=OFF \
            -DCMAKE_SYSTEM_NAME=visionOS \
@@ -739,6 +742,7 @@ jobs:
            -DGGML_METAL_EMBED_LIBRARY=ON \
            -DLLAMA_CURL=OFF \
            -DLLAMA_BUILD_EXAMPLES=OFF \
+            -DLLAMA_BUILD_TOOLS=OFF \
            -DLLAMA_BUILD_TESTS=OFF \
            -DLLAMA_BUILD_SERVER=OFF \
            -DCMAKE_OSX_ARCHITECTURES="arm64;x86_64"
@@ -1417,6 +1421,7 @@ jobs:
            -DGGML_METAL_EMBED_LIBRARY=ON \
            -DLLAMA_CURL=OFF \
            -DLLAMA_BUILD_EXAMPLES=OFF \
+            -DLLAMA_BUILD_TOOLS=OFF \
            -DLLAMA_BUILD_TESTS=OFF \
            -DLLAMA_BUILD_SERVER=OFF \
            -DCMAKE_SYSTEM_NAME=iOS \

+ 12 - 12
.github/workflows/server.yml

@@ -15,10 +15,10 @@ on:
   push:
     branches:
       - master
-    paths: ['.github/workflows/server.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'examples/server/**.*']
+    paths: ['.github/workflows/server.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'tools/server/**.*']
   pull_request:
     types: [opened, synchronize, reopened]
-    paths: ['.github/workflows/server.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'examples/server/**.*']
+    paths: ['.github/workflows/server.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'tools/server/**.*']
 
 env:
   LLAMA_LOG_COLORS: 1
@@ -74,7 +74,7 @@ jobs:
       - name: Tests dependencies
         id: test_dependencies
         run: |
-          pip install -r examples/server/tests/requirements.txt
+          pip install -r tools/server/tests/requirements.txt
 
       # Setup nodejs (to be used for verifying bundled index.html)
       - uses: actions/setup-node@v4
@@ -84,14 +84,14 @@ jobs:
       - name: WebUI - Install dependencies
         id: webui_lint
         run: |
-          cd examples/server/webui
+          cd tools/server/webui
           npm ci
 
       - name: WebUI - Check code format
         id: webui_format
         run: |
           git config --global --add safe.directory $(realpath .)
-          cd examples/server/webui
+          cd tools/server/webui
           git status
 
           npm run format
@@ -108,7 +108,7 @@ jobs:
         id: verify_server_index_html
         run: |
           git config --global --add safe.directory $(realpath .)
-          cd examples/server/webui
+          cd tools/server/webui
           git status
 
           npm run build
@@ -161,21 +161,21 @@ jobs:
         env:
           GITHUB_ACTIONS: "true"
         run: |
-          cd examples/server/tests
+          cd tools/server/tests
           ./tests.sh
 
       - name: Tests (sanitizers)
         id: server_integration_tests_sanitizers
         if: ${{ matrix.sanitizer != '' }}
         run: |
-          cd examples/server/tests
+          cd tools/server/tests
           LLAMA_SANITIZE=1 ./tests.sh
 
       - name: Slow tests
         id: server_integration_tests_slow
         if: ${{ (github.event.schedule || github.event.inputs.slow_tests == 'true') && matrix.build_type == 'Release' }}
         run: |
-          cd examples/server/tests
+          cd tools/server/tests
           SLOW_TESTS=1 ./tests.sh
 
 
@@ -211,7 +211,7 @@ jobs:
       - name: Tests dependencies
         id: test_dependencies
         run: |
-          pip install -r examples/server/tests/requirements.txt
+          pip install -r tools/server/tests/requirements.txt
 
       - name: Copy Libcurl
         id: prepare_libcurl
@@ -224,7 +224,7 @@ jobs:
         id: server_integration_tests
         if: ${{ !matrix.disabled_on_pr || !github.event.pull_request }}
         run: |
-          cd examples/server/tests
+          cd tools/server/tests
           $env:PYTHONIOENCODING = ":replace"
           pytest -v -x -m "not slow"
 
@@ -232,6 +232,6 @@ jobs:
         id: server_integration_tests_slow
         if: ${{ (github.event.schedule || github.event.inputs.slow_tests == 'true') && matrix.build_type == 'Release' }}
         run: |
-          cd examples/server/tests
+          cd tools/server/tests
           $env:SLOW_TESTS = "1"
           pytest -v -x

+ 6 - 6
.gitignore

@@ -96,11 +96,11 @@ perf-*.txt
 # Examples
 
 examples/jeopardy/results.txt
-examples/server/*.css.hpp
-examples/server/*.html.hpp
-examples/server/*.js.hpp
-examples/server/*.mjs.hpp
-examples/server/*.gz.hpp
+tools/server/*.css.hpp
+tools/server/*.html.hpp
+tools/server/*.js.hpp
+tools/server/*.mjs.hpp
+tools/server/*.gz.hpp
 !build_64.sh
 !examples/*.bat
 !examples/*/*.kts
@@ -110,7 +110,7 @@ examples/server/*.gz.hpp
 
 # Server Web UI temporary files
 node_modules
-examples/server/webui/dist
+tools/server/webui/dist
 
 # Python
 

+ 5 - 0
CMakeLists.txt

@@ -77,6 +77,7 @@ option(LLAMA_BUILD_COMMON "llama: build common utils library" ${LLAMA_STANDALONE
 
 
 # extra artifacts
 option(LLAMA_BUILD_TESTS    "llama: build tests"          ${LLAMA_STANDALONE})
+option(LLAMA_BUILD_TOOLS    "llama: build tools"          ${LLAMA_STANDALONE})
 option(LLAMA_BUILD_EXAMPLES "llama: build examples"       ${LLAMA_STANDALONE})
 option(LLAMA_BUILD_SERVER   "llama: build server example" ${LLAMA_STANDALONE})
 
@@ -187,6 +188,10 @@ if (LLAMA_BUILD_COMMON AND LLAMA_BUILD_EXAMPLES)
     add_subdirectory(pocs)
 endif()
 
+if (LLAMA_BUILD_COMMON AND LLAMA_BUILD_TOOLS)
+    add_subdirectory(tools)
+endif()
+
 #
 # install
 #
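The hunk above defines the new option and guards the tools subdirectory behind it. For minimal or embedded builds, the workflow and build-xcframework.sh changes elsewhere in this commit simply turn it off; a hedged sketch of that configure pattern (the build directory name and the omission of platform-specific flags are assumptions):

```sh
# Sketch of the minimal-footprint configuration used by the Apple-platform
# CI jobs in this commit, with platform-specific flags left out for brevity.
cmake -B build-minimal \
      -DLLAMA_BUILD_COMMON=OFF \
      -DLLAMA_BUILD_EXAMPLES=OFF \
      -DLLAMA_BUILD_TOOLS=OFF \
      -DLLAMA_BUILD_TESTS=OFF \
      -DLLAMA_BUILD_SERVER=OFF
```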

+ 1 - 1
CODEOWNERS

@@ -2,7 +2,7 @@
 
 
 /ci/ @ggerganov
 /.devops/*.Dockerfile @ngxson
-/examples/server/ @ngxson
+/tools/server/ @ngxson
 /ggml/src/ggml-cuda/fattn* @JohannesGaessler
 /ggml/src/ggml-cuda/mmq.* @JohannesGaessler
 /ggml/src/ggml-cuda/mmv.* @JohannesGaessler

+ 46 - 46
Makefile

@@ -1156,10 +1156,10 @@ $(LIB_COMMON_S): $(OBJ_COMMON)
 
 
 # Clean generated server assets
 clean-server-assets:
-	find examples/server -type f -name "*.js.hpp"   -delete
-	find examples/server -type f -name "*.mjs.hpp"  -delete
-	find examples/server -type f -name "*.css.hpp"  -delete
-	find examples/server -type f -name "*.html.hpp" -delete
+	find tools/server -type f -name "*.js.hpp"   -delete
+	find tools/server -type f -name "*.mjs.hpp"  -delete
+	find tools/server -type f -name "*.css.hpp"  -delete
+	find tools/server -type f -name "*.html.hpp" -delete
 
 # Clean rule
 clean: clean-server-assets
@@ -1179,7 +1179,7 @@ clean: clean-server-assets
 # Helper function that replaces .c, .cpp, and .cu file endings with .o:
 GET_OBJ_FILE = $(patsubst %.c,%.o,$(patsubst %.cpp,%.o,$(patsubst %.cu,%.o,$(1))))
 
-llama-cli: examples/main/main.cpp \
+llama-cli: tools/main/main.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
@@ -1192,7 +1192,7 @@ llama-infill: examples/infill/infill.cpp \
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-run: examples/run/run.cpp \
+llama-run: tools/run/run.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
@@ -1207,7 +1207,7 @@ llama-simple-chat: examples/simple-chat/simple-chat.cpp \
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-tokenize: examples/tokenize/tokenize.cpp \
+llama-tokenize: tools/tokenize/tokenize.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
@@ -1217,27 +1217,27 @@ llama-batched: examples/batched/batched.cpp \
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-batched-bench: examples/batched-bench/batched-bench.cpp \
+llama-batched-bench: tools/batched-bench/batched-bench.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-quantize: examples/quantize/quantize.cpp \
+llama-quantize: tools/quantize/quantize.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-quantize-stats: examples/quantize-stats/quantize-stats.cpp \
+llama-quantize-stats: tools/quantize-stats/quantize-stats.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-perplexity: examples/perplexity/perplexity.cpp \
+llama-perplexity: tools/perplexity/perplexity.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-imatrix: examples/imatrix/imatrix.cpp \
+llama-imatrix: tools/imatrix/imatrix.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
@@ -1279,7 +1279,7 @@ llama-gguf-hash: examples/gguf-hash/gguf-hash.cpp examples/gguf-hash/deps/sha1/s
 	$(CXX) $(CXXFLAGS) -Iexamples/gguf-hash/deps -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-gguf-split: examples/gguf-split/gguf-split.cpp \
+llama-gguf-split: tools/gguf-split/gguf-split.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
@@ -1289,7 +1289,7 @@ llama-eval-callback: examples/eval-callback/eval-callback.cpp \
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-cvector-generator: examples/cvector-generator/cvector-generator.cpp \
+llama-cvector-generator: tools/cvector-generator/cvector-generator.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
@@ -1299,12 +1299,12 @@ llama-convert-llama2c-to-ggml: examples/convert-llama2c-to-ggml/convert-llama2c-
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-bench: examples/llama-bench/llama-bench.cpp \
+llama-bench: tools/llama-bench/llama-bench.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-export-lora: examples/export-lora/export-lora.cpp \
+llama-export-lora: tools/export-lora/export-lora.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
@@ -1360,17 +1360,17 @@ llama-gbnf-validator: examples/gbnf-validator/gbnf-validator.cpp \
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
 ifdef GGML_RPC
-rpc-server: examples/rpc/rpc-server.cpp \
+rpc-server: tools/rpc/rpc-server.cpp \
 	$(OBJ_GGML)
 	$(CXX) $(CXXFLAGS) $^ -o $@ $(LDFLAGS)
 endif # GGML_RPC
 
 llama-server: \
-	examples/server/server.cpp \
-	examples/server/utils.hpp \
-	examples/server/httplib.h \
-	examples/server/index.html.hpp \
-	examples/server/loading.html.hpp \
+	tools/server/server.cpp \
+	tools/server/utils.hpp \
+	tools/server/httplib.h \
+	tools/server/index.html.hpp \
+	tools/server/loading.html.hpp \
 	common/chat.cpp \
 	common/chat.h \
 	common/chat-template.hpp \
@@ -1378,10 +1378,10 @@ llama-server: \
 	common/minja.hpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
-	$(CXX) $(CXXFLAGS) $(filter-out %.h %.hpp $<,$^) -Iexamples/server $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS) $(LWINSOCK2)
+	$(CXX) $(CXXFLAGS) $(filter-out %.h %.hpp $<,$^) -Itools/server $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS) $(LWINSOCK2)
 
-# Portable equivalent of `cd examples/server/public && xxd -i $(notdir $<) ../$(notdir $<).hpp`:
-examples/server/%.hpp: examples/server/public/% FORCE Makefile
+# Portable equivalent of `cd tools/server/public && xxd -i $(notdir $<) ../$(notdir $<).hpp`:
+tools/server/%.hpp: tools/server/public/% FORCE Makefile
 	@( export NAME=$(subst .,_,$(subst -,_,$(notdir $<))) && \
 		echo "unsigned char $${NAME}[] = {" && \
 		cat $< | od -v -t x1 -An | sed -E 's/([0-9a-fA-F]+)/0x\1, /g' && \
@@ -1394,36 +1394,36 @@ llama-gen-docs: examples/gen-docs/gen-docs.cpp \
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-libllava.a: examples/llava/llava.cpp \
-	examples/llava/llava.h \
-	examples/llava/clip.cpp \
-	examples/llava/clip.h \
+libllava.a: tools/llava/llava.cpp \
+	tools/llava/llava.h \
+	tools/llava/clip.cpp \
+	tools/llava/clip.h \
 	common/stb_image.h \
 	common/base64.hpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) -static -fPIC -c $< -o $@ -Wno-cast-qual
 
-llama-llava-cli: examples/llava/llava-cli.cpp \
-	examples/llava/llava.cpp \
-	examples/llava/llava.h \
-	examples/llava/clip.cpp \
-	examples/llava/clip.h \
+llama-llava-cli: tools/llava/llava-cli.cpp \
+	tools/llava/llava.cpp \
+	tools/llava/llava.h \
+	tools/llava/clip.cpp \
+	tools/llava/clip.h \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) $< $(filter-out %.h $<,$^) -o $@ $(LDFLAGS) -Wno-cast-qual
 
-llama-minicpmv-cli: examples/llava/minicpmv-cli.cpp \
-	examples/llava/llava.cpp \
-	examples/llava/llava.h \
-	examples/llava/clip.cpp \
-	examples/llava/clip.h \
+llama-minicpmv-cli: tools/llava/minicpmv-cli.cpp \
+	tools/llava/llava.cpp \
+	tools/llava/llava.h \
+	tools/llava/clip.cpp \
+	tools/llava/clip.h \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) $< $(filter-out %.h $<,$^) -o $@ $(LDFLAGS) -Wno-cast-qual
 
-llama-qwen2vl-cli: examples/llava/qwen2vl-cli.cpp \
-	examples/llava/llava.cpp \
-	examples/llava/llava.h \
-	examples/llava/clip.cpp \
-	examples/llava/clip.h \
+llama-qwen2vl-cli: tools/llava/qwen2vl-cli.cpp \
+	tools/llava/llava.cpp \
+	tools/llava/llava.h \
+	tools/llava/clip.cpp \
+	tools/llava/clip.h \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) $< $(filter-out %.h $<,$^) -o $@ $(LDFLAGS) -Wno-cast-qual
 
@@ -1480,12 +1480,12 @@ tests/test-double-float: tests/test-double-float.cpp
 
 tests/test-json-schema-to-grammar: tests/test-json-schema-to-grammar.cpp \
 	$(OBJ_ALL)
-	$(CXX) $(CXXFLAGS) -Iexamples/server -c $< -o $(call GET_OBJ_FILE, $<)
+	$(CXX) $(CXXFLAGS) -Itools/server -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
 tests/test-chat: tests/test-chat.cpp \
 	$(OBJ_ALL)
-	$(CXX) $(CXXFLAGS) -Iexamples/server -c $< -o $(call GET_OBJ_FILE, $<)
+	$(CXX) $(CXXFLAGS) -Itools/server -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
 tests/test-opt: tests/test-opt.cpp \

+ 10 - 10
README.md

@@ -242,7 +242,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 | [Vulkan](docs/build.md#vulkan) | GPU |
 | [CANN](docs/build.md#cann) | Ascend NPU |
 | [OpenCL](docs/backend/OPENCL.md) | Adreno GPU |
-| [RPC](https://github.com/ggml-org/llama.cpp/tree/master/examples/rpc) | All |
+| [RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc) | All |
 
 ## Building the project
 
@@ -276,9 +276,9 @@ The Hugging Face platform provides a variety of online tools for converting, qua
 - Use the [GGUF-editor space](https://huggingface.co/spaces/CISCai/gguf-editor) to edit GGUF meta data in the browser (more info: https://github.com/ggml-org/llama.cpp/discussions/9268)
 - Use the [Inference Endpoints](https://ui.endpoints.huggingface.co/) to directly host `llama.cpp` in the cloud (more info: https://github.com/ggml-org/llama.cpp/discussions/9669)
 
-To learn more about model quantization, [read this documentation](examples/quantize/README.md)
+To learn more about model quantization, [read this documentation](tools/quantize/README.md)
 
-## [`llama-cli`](examples/main)
+## [`llama-cli`](tools/main)
 
 #### A CLI tool for accessing and experimenting with most of `llama.cpp`'s functionality.
 
@@ -341,7 +341,7 @@ To learn more about model quantization, [read this documentation](examples/quant
     </details>
 
 
-## [`llama-server`](examples/server)
+## [`llama-server`](tools/server)
 
 #### A lightweight, [OpenAI API](https://github.com/openai/openai-openapi) compatible, HTTP server for serving LLMs.
 
@@ -411,7 +411,7 @@ To learn more about model quantization, [read this documentation](examples/quant
     </details>
 
 
-## [`llama-perplexity`](examples/perplexity)
+## [`llama-perplexity`](tools/perplexity)
 
 #### A tool for measuring the perplexity [^1][^2] (and other quality metrics) of a model over a given text.
 
@@ -436,10 +436,10 @@ To learn more about model quantization, [read this documentation](examples/quant
 
     </details>
 
-[^1]: [examples/perplexity/README.md](./examples/perplexity/README.md)
+[^1]: [tools/perplexity/README.md](./tools/perplexity/README.md)
 [^2]: [https://huggingface.co/docs/transformers/perplexity](https://huggingface.co/docs/transformers/perplexity)
 
-## [`llama-bench`](examples/llama-bench)
+## [`llama-bench`](tools/llama-bench)
 
 #### Benchmark the performance of the inference for various parameters.
 
@@ -460,7 +460,7 @@ To learn more about model quantization, [read this documentation](examples/quant
 
     </details>
 
-## [`llama-run`](examples/run)
+## [`llama-run`](tools/run)
 
 #### A comprehensive example for running `llama.cpp` models. Useful for inferencing. Used with RamaLama [^3].
 
@@ -504,8 +504,8 @@ To learn more about model quantization, [read this documentation](examples/quant
 
 ## Other documentation
 
-- [main (cli)](examples/main/README.md)
-- [server](examples/server/README.md)
+- [main (cli)](tools/main/README.md)
+- [server](tools/server/README.md)
 - [GBNF grammars](grammars/README.md)
 
 #### Development documentation

+ 1 - 1
SECURITY.md

@@ -40,7 +40,7 @@ To protect sensitive data from potential leaks or unauthorized access, it is cru
 ### Untrusted environments or networks
 
 If you can't run your models in a secure and isolated environment or if it must be exposed to an untrusted network, make sure to take the following security precautions:
-* Do not use the RPC backend, [rpc-server](https://github.com/ggml-org/llama.cpp/tree/master/examples/rpc) and [llama-server](https://github.com/ggml-org/llama.cpp/tree/master/examples/server) functionality (see https://github.com/ggml-org/llama.cpp/pull/13061).
+* Do not use the RPC backend, [rpc-server](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc) and [llama-server](https://github.com/ggml-org/llama.cpp/tree/master/tools/server) functionality (see https://github.com/ggml-org/llama.cpp/pull/13061).
 * Confirm the hash of any downloaded artifact (e.g. pre-trained model weights) matches a known-good value.
 * Encrypt your data if sending it over the network.
 

+ 2 - 0
build-xcframework.sh

@@ -8,6 +8,7 @@ TVOS_MIN_OS_VERSION=16.4
 
 
 BUILD_SHARED_LIBS=OFF
 LLAMA_BUILD_EXAMPLES=OFF
+LLAMA_BUILD_TOOLS=OFF
 LLAMA_BUILD_TESTS=OFF
 LLAMA_BUILD_SERVER=OFF
 GGML_METAL=ON
@@ -31,6 +32,7 @@ COMMON_CMAKE_ARGS=(
     -DCMAKE_XCODE_ATTRIBUTE_DEVELOPMENT_TEAM=ggml
     -DBUILD_SHARED_LIBS=${BUILD_SHARED_LIBS}
     -DLLAMA_BUILD_EXAMPLES=${LLAMA_BUILD_EXAMPLES}
+    -DLLAMA_BUILD_TOOLS=${LLAMA_BUILD_TOOLS}
     -DLLAMA_BUILD_TESTS=${LLAMA_BUILD_TESTS}
     -DLLAMA_BUILD_SERVER=${LLAMA_BUILD_SERVER}
     -DGGML_METAL_EMBED_LIBRARY=${GGML_METAL_EMBED_LIBRARY}

+ 4 - 4
ci/run.sh

@@ -187,8 +187,8 @@ function gg_run_test_scripts_debug {
 
 
     set -e
 
-    (cd ./examples/gguf-split && time bash tests.sh "$SRC/build-ci-debug/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
-    (cd ./examples/quantize   && time bash tests.sh "$SRC/build-ci-debug/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
+    (cd ./tools/gguf-split && time bash tests.sh "$SRC/build-ci-debug/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
+    (cd ./tools/quantize   && time bash tests.sh "$SRC/build-ci-debug/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
 
     set +e
 }
@@ -211,8 +211,8 @@ function gg_run_test_scripts_release {
 
     set -e
 
-    (cd ./examples/gguf-split && time bash tests.sh "$SRC/build-ci-release/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
-    (cd ./examples/quantize   && time bash tests.sh "$SRC/build-ci-release/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
+    (cd ./tools/gguf-split && time bash tests.sh "$SRC/build-ci-release/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
+    (cd ./tools/quantize   && time bash tests.sh "$SRC/build-ci-release/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
 
     set +e
 }

+ 2 - 2
common/arg.cpp

@@ -2211,14 +2211,14 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
     ).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_NO_CONT_BATCHING"));
     add_opt(common_arg(
         {"--mmproj"}, "FILE",
-        "path to a multimodal projector file. see examples/llava/README.md",
+        "path to a multimodal projector file. see tools/llava/README.md",
         [](common_params & params, const std::string & value) {
             params.mmproj.path = value;
         }
     ).set_examples(mmproj_examples));
     add_opt(common_arg(
         {"--mmproj-url"}, "URL",
-        "URL to a multimodal projector file. see examples/llava/README.md",
+        "URL to a multimodal projector file. see tools/llava/README.md",
         [](common_params & params, const std::string & value) {
             params.mmproj.url = value;
         }

+ 3 - 3
common/common.h

@@ -340,7 +340,7 @@ struct common_params {
 
 
     common_conversation_mode conversation_mode = COMMON_CONVERSATION_MODE_AUTO;
 
-    // multimodal models (see examples/llava)
+    // multimodal models (see tools/llava)
     struct common_params_model mmproj;
     bool mmproj_use_gpu = true;     // use GPU for multimodal model
     bool no_mmproj = false;         // explicitly disable multimodal model
@@ -414,8 +414,8 @@ struct common_params {
     int n_pca_batch = 100;
     int n_pca_iterations = 1000;
     dimre_method cvector_dimre_method = DIMRE_METHOD_PCA;
-    std::string cvector_positive_file = "examples/cvector-generator/positive.txt";
-    std::string cvector_negative_file = "examples/cvector-generator/negative.txt";
+    std::string cvector_positive_file = "tools/cvector-generator/positive.txt";
+    std::string cvector_negative_file = "tools/cvector-generator/negative.txt";
 
     bool spm_infill = false; // suffix/prefix/middle pattern for infill
 

+ 4 - 4
docs/development/HOWTO-add-model.md

@@ -9,10 +9,10 @@ Adding a model requires few steps:
 After following these steps, you can open PR.
 
 Also, it is important to check that the examples and main ggml backends (CUDA, METAL, CPU) are working with the new architecture, especially:
-- [main](/examples/main/)
-- [imatrix](/examples/imatrix/)
-- [quantize](/examples/quantize/)
-- [server](/examples/server/)
+- [main](/tools/main/)
+- [imatrix](/tools/imatrix/)
+- [quantize](/tools/quantize/)
+- [server](/tools/server/)
 
 ### 1. Convert the model to GGUF
 

+ 6 - 6
docs/multimodal/MobileVLM.md

@@ -33,13 +33,13 @@ git clone https://huggingface.co/openai/clip-vit-large-patch14-336
 2. Use `llava_surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:
 
 ```sh
-python ./examples/llava/llava_surgery.py -m path/to/MobileVLM-1.7B
+python ./tools/llava/llava_surgery.py -m path/to/MobileVLM-1.7B
 ```
 
 3. Use `convert_image_encoder_to_gguf.py` with `--projector-type ldp` (for **V2** please use `--projector-type ldpv2`) to convert the LLaVA image encoder to GGUF:
 
 ```sh
-python ./examples/llava/convert_image_encoder_to_gguf.py \
+python ./tools/llava/convert_image_encoder_to_gguf.py \
     -m path/to/clip-vit-large-patch14-336 \
     --llava-projector path/to/MobileVLM-1.7B/llava.projector \
     --output-dir path/to/MobileVLM-1.7B \
@@ -47,7 +47,7 @@ python ./examples/llava/convert_image_encoder_to_gguf.py \
 ```
 
 ```sh
-python ./examples/llava/convert_image_encoder_to_gguf.py \
+python ./tools/llava/convert_image_encoder_to_gguf.py \
     -m path/to/clip-vit-large-patch14-336 \
     --llava-projector path/to/MobileVLM-1.7B_V2/llava.projector \
     --output-dir path/to/MobileVLM-1.7B_V2 \
@@ -69,10 +69,10 @@ Now both the LLaMA part and the image encoder is in the `MobileVLM-1.7B` directo
 
 ## Android compile and run
 ### compile
-refer to `examples/llava/android/build_64.sh`
+refer to `tools/llava/android/build_64.sh`
 ```sh
-mkdir examples/llava/android/build_64
-cd examples/llava/android/build_64
+mkdir tools/llava/android/build_64
+cd tools/llava/android/build_64
 ../build_64.sh
 ```
 ### run on Android

+ 2 - 2
docs/multimodal/glmedge.md

@@ -25,13 +25,13 @@ git clone https://huggingface.co/THUDM/glm-edge-v-5b or https://huggingface.co/T
 2. Use `glmedge-surgery.py` to split the GLMV-EDGE model to LLM and multimodel projector constituents:
 
 ```sh
-python ./examples/llava/glmedge-surgery.py -m ../model_path
+python ./tools/llava/glmedge-surgery.py -m ../model_path
 ```
 
 4. Use `glmedge-convert-image-encoder-to-gguf.py` to convert the GLMV-EDGE image encoder to GGUF:
 
 ```sh
-python ./examples/llava/glmedge-convert-image-encoder-to-gguf.py -m ../model_path --llava-projector ../model_path/glm.projector --output-dir ../model_path
+python ./tools/llava/glmedge-convert-image-encoder-to-gguf.py -m ../model_path --llava-projector ../model_path/glm.projector --output-dir ../model_path
 ```
 
 5. Use `examples/convert_hf_to_gguf.py` to convert the LLM part of GLMV-EDGE to GGUF:

+ 6 - 6
docs/multimodal/llava.md

@@ -37,19 +37,19 @@ git clone https://huggingface.co/openai/clip-vit-large-patch14-336
 2. Install the required Python packages:
 
 ```sh
-pip install -r examples/llava/requirements.txt
+pip install -r tools/llava/requirements.txt
 ```
 
 3. Use `llava_surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:
 
 ```sh
-python ./examples/llava/llava_surgery.py -m ../llava-v1.5-7b
+python ./tools/llava/llava_surgery.py -m ../llava-v1.5-7b
 ```
 
 4. Use `convert_image_encoder_to_gguf.py` to convert the LLaVA image encoder to GGUF:
 
 ```sh
-python ./examples/llava/convert_image_encoder_to_gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
+python ./tools/llava/convert_image_encoder_to_gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
 ```
 
 5. Use `examples/convert_legacy_llama.py` to convert the LLaMA part of LLaVA to GGUF:
@@ -69,12 +69,12 @@ git clone https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b
 2) Install the required Python packages:
 
 ```sh
-pip install -r examples/llava/requirements.txt
+pip install -r tools/llava/requirements.txt
 ```
 
 3) Use `llava_surgery_v2.py` which also supports llava-1.5 variants pytorch as well as safetensor models:
 ```console
-python examples/llava/llava_surgery_v2.py -C -m ../llava-v1.6-vicuna-7b/
+python tools/llava/llava_surgery_v2.py -C -m ../llava-v1.6-vicuna-7b/
 ```
 - you will find a llava.projector and a llava.clip file in your model directory
 
@@ -88,7 +88,7 @@ curl -s -q https://huggingface.co/cmp-nct/llava-1.6-gguf/raw/main/config_vit.jso
 
 5) Create the visual gguf model:
 ```console
-python ./examples/llava/convert_image_encoder_to_gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip-model-is-vision
+python ./tools/llava/convert_image_encoder_to_gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip-model-is-vision
 ```
 - This is similar to llava-1.5, the difference is that we tell the encoder that we are working with the pure vision model part of CLIP
 

+ 2 - 2
docs/multimodal/minicpmo2.6.md

@@ -29,8 +29,8 @@ cmake --build build --config Release
 Convert PyTorch model to gguf files (You can also download the converted [gguf](https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf) by us)

 ```bash
-python ./examples/llava/minicpmv-surgery.py -m ../MiniCPM-o-2_6
-python ./examples/llava/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-o-2_6 --minicpmv-projector ../MiniCPM-o-2_6/minicpmv.projector --output-dir ../MiniCPM-o-2_6/ --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5 --minicpmv_version 4
+python ./tools/llava/minicpmv-surgery.py -m ../MiniCPM-o-2_6
+python ./tools/llava/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-o-2_6 --minicpmv-projector ../MiniCPM-o-2_6/minicpmv.projector --output-dir ../MiniCPM-o-2_6/ --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5 --minicpmv_version 4
 python ./convert_hf_to_gguf.py ../MiniCPM-o-2_6/model

 # quantize int4 version

+ 2 - 2
docs/multimodal/minicpmv2.5.md

@@ -28,8 +28,8 @@ cmake --build build --config Release
 Convert PyTorch model to gguf files (You can also download the converted [gguf](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) by us)

 ```bash
-python ./examples/llava/minicpmv-surgery.py -m ../MiniCPM-Llama3-V-2_5
-python ./examples/llava/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-Llama3-V-2_5 --minicpmv-projector ../MiniCPM-Llama3-V-2_5/minicpmv.projector --output-dir ../MiniCPM-Llama3-V-2_5/ --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5 --minicpmv_version 2
+python ./tools/llava/minicpmv-surgery.py -m ../MiniCPM-Llama3-V-2_5
+python ./tools/llava/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-Llama3-V-2_5 --minicpmv-projector ../MiniCPM-Llama3-V-2_5/minicpmv.projector --output-dir ../MiniCPM-Llama3-V-2_5/ --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5 --minicpmv_version 2
 python ./convert_hf_to_gguf.py ../MiniCPM-Llama3-V-2_5/model

 # quantize int4 version

+ 2 - 2
docs/multimodal/minicpmv2.6.md

@@ -28,8 +28,8 @@ cmake --build build --config Release
 Convert PyTorch model to gguf files (You can also download the converted [gguf](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) by us)

 ```bash
-python ./examples/llava/minicpmv-surgery.py -m ../MiniCPM-V-2_6
-python ./examples/llava/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-V-2_6 --minicpmv-projector ../MiniCPM-V-2_6/minicpmv.projector --output-dir ../MiniCPM-V-2_6/ --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5 --minicpmv_version 3
+python ./tools/llava/minicpmv-surgery.py -m ../MiniCPM-V-2_6
+python ./tools/llava/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-V-2_6 --minicpmv-projector ../MiniCPM-V-2_6/minicpmv.projector --output-dir ../MiniCPM-V-2_6/ --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5 --minicpmv_version 3
 python ./convert_hf_to_gguf.py ../MiniCPM-V-2_6/model

 # quantize int4 version

+ 1 - 22
examples/CMakeLists.txt

@@ -12,51 +12,30 @@ llama_add_compile_flags()

 # examples

-include_directories(${CMAKE_CURRENT_SOURCE_DIR})
-
 if (EMSCRIPTEN)
 else()
-    add_subdirectory(batched-bench)
     add_subdirectory(batched)
     add_subdirectory(embedding)
     add_subdirectory(eval-callback)

     add_subdirectory(gguf-hash)
-    add_subdirectory(gguf-split)
     add_subdirectory(gguf)
     add_subdirectory(gritlm)
-    add_subdirectory(imatrix)
     add_subdirectory(infill)
-    add_subdirectory(llama-bench)
     add_subdirectory(lookahead)
     add_subdirectory(lookup)
-    add_subdirectory(main)
     add_subdirectory(parallel)
     add_subdirectory(passkey)
-    add_subdirectory(perplexity)
-    add_subdirectory(quantize)
     add_subdirectory(retrieval)
-    if (LLAMA_BUILD_SERVER)
-        add_subdirectory(server)
-    endif()
     add_subdirectory(save-load-state)
-    add_subdirectory(run)
     add_subdirectory(simple)
     add_subdirectory(simple-chat)
     add_subdirectory(speculative)
     add_subdirectory(speculative-simple)
-    add_subdirectory(tokenize)
-    add_subdirectory(tts)
     add_subdirectory(gen-docs)
     if (NOT GGML_BACKEND_DL)
-        # these examples use the backends directly and cannot be built with dynamic loading
         add_subdirectory(convert-llama2c-to-ggml)
-        add_subdirectory(cvector-generator)
-        add_subdirectory(export-lora)
-        add_subdirectory(llava)
-        if (GGML_RPC)
-            add_subdirectory(rpc)
-        endif()
+        # these examples use the backends directly and cannot be built with dynamic loading
         if (GGML_SYCL)
             add_subdirectory(sycl)
         endif()

+ 1 - 1
examples/pydantic_models_to_grammar_examples.py

@@ -23,7 +23,7 @@ def create_completion(host, prompt, gbnf_grammar):
     """Calls the /completion API on llama-server.

     See
-    https://github.com/ggml-org/llama.cpp/tree/HEAD/examples/server#api-endpoints
+    https://github.com/ggml-org/llama.cpp/tree/HEAD/tools/server#api-endpoints
     """
     print(f"  Request:\n    Grammar:\n{textwrap.indent(gbnf_grammar, '      ')}\n    Prompt:\n{textwrap.indent(prompt.rstrip(), '      ')}")
     headers = {"Content-Type": "application/json"}

BIN
examples/server/public/index.html.gz


+ 6 - 6
grammars/README.md

@@ -1,6 +1,6 @@
 # GBNF Guide

-GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `examples/main` and `examples/server`.
+GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `tools/main` and `tools/server`.

 ## Background

@@ -110,21 +110,21 @@ While semantically correct, the syntax `x? x? x?.... x?` (with N repetitions) ma

 You can use GBNF grammars:

-- In [llama-server](../examples/server)'s completion endpoints, passed as the `grammar` body field
-- In [llama-cli](../examples/main), passed as the `--grammar` & `--grammar-file` flags
+- In [llama-server](../tools/server)'s completion endpoints, passed as the `grammar` body field
+- In [llama-cli](../tools/main), passed as the `--grammar` & `--grammar-file` flags
 - With [test-gbnf-validator](../tests/test-gbnf-validator.cpp), to test them against strings.

 ## JSON Schemas → GBNF

 `llama.cpp` supports converting a subset of https://json-schema.org/ to GBNF grammars:

-- In [llama-server](../examples/server):
+- In [llama-server](../tools/server):
     - For any completion endpoints, passed as the `json_schema` body field
     - For the `/chat/completions` endpoint, passed inside the `response_format` body field (e.g. `{"type", "json_object", "schema": {"items": {}}}` or `{ type: "json_schema", json_schema: {"schema": ...} }`)
-- In [llama-cli](../examples/main), passed as the `--json` / `-j` flag
+- In [llama-cli](../tools/main), passed as the `--json` / `-j` flag
 - To convert to a grammar ahead of time:
     - in CLI, with [examples/json_schema_to_grammar.py](../examples/json_schema_to_grammar.py)
-    - in JavaScript with [json-schema-to-grammar.mjs](../examples/server/public_legacy/json-schema-to-grammar.mjs) (this is used by the [server](../examples/server)'s Web UI)
+    - in JavaScript with [json-schema-to-grammar.mjs](../tools/server/public_legacy/json-schema-to-grammar.mjs) (this is used by the [server](../tools/server)'s Web UI)

 Take a look at [tests](../tests/test-json-schema-to-grammar.cpp) to see which features are likely supported (you'll also find usage examples in https://github.com/ggml-org/llama.cpp/pull/5978, https://github.com/ggml-org/llama.cpp/pull/6659 & https://github.com/ggml-org/llama.cpp/pull/6555).
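
As a quick illustration of the two server-side options this hunk renames (but does not change in behavior), here is a minimal sketch, assuming a llama-server instance already listening on the default `http://localhost:8080` and using an illustrative prompt and schema of our own choosing rather than anything from this commit:

```sh
# constrain /completion output with an inline GBNF grammar
curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{
  "prompt": "Pick a color:",
  "grammar": "root ::= \"red\" | \"green\" | \"blue\""
}'

# constrain /v1/chat/completions output with a JSON schema via response_format
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "Pick a color."}],
  "response_format": {"type": "json_schema", "json_schema": {"schema": {"type": "object", "properties": {"color": {"type": "string"}}, "required": ["color"]}}}
}'
```

Hand-written constraints usually go through the `grammar` field; machine-generated ones typically start as a JSON schema and are converted for you by the server.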
 
 

+ 1 - 1
pyrightconfig.json

@@ -15,7 +15,7 @@
     },
     {
       // uses match expressions in steps.py
-      "root": "examples/server/tests",
+      "root": "tools/server/tests",
       "pythonVersion": "3.10",
     },
   ],

+ 3 - 3
requirements/requirements-all.txt

@@ -1,6 +1,6 @@
--r ../examples/llava/requirements.txt
--r ../examples/server/bench/requirements.txt
--r ../examples/server/tests/requirements.txt
+-r ../tools/llava/requirements.txt
+-r ../tools/server/bench/requirements.txt
+-r ../tools/server/tests/requirements.txt

 -r ./requirements-compare-llama-bench.txt
 -r ./requirements-pydantic.txt

+ 2 - 2
scripts/fetch_server_test_models.py

@@ -8,7 +8,7 @@
 
 
     Example:
         python scripts/fetch_server_test_models.py
-        ( cd examples/server/tests && ./tests.sh -v -x -m slow )
+        ( cd tools/server/tests && ./tests.sh -v -x -m slow )
 '''
 import ast
 import glob
@@ -66,7 +66,7 @@ if __name__ == '__main__':

     models = sorted(list(set([
         model
-        for test_file in glob.glob('examples/server/tests/unit/test_*.py')
+        for test_file in glob.glob('tools/server/tests/unit/test_*.py')
         for model in collect_hf_model_test_parameters(test_file)
     ])), key=lambda m: (m.hf_repo, m.hf_file))


+ 3 - 3
scripts/tool_bench.py

@@ -2,7 +2,7 @@
 '''
     Simplistic tool call benchmarks for llama-server and ollama.

-    Essentially runs the tests at server/examples/server/tests/unit/test_tool_call.py N times, at different temperatures and on different backends (current llama-server, baseline llama-server and ollama),
+    Essentially runs the tests at server/tools/server/tests/unit/test_tool_call.py N times, at different temperatures and on different backends (current llama-server, baseline llama-server and ollama),
     and plots the results of multiple runs (from same .jsonl file or multiple ones) as a success rate heatmap.

     Simple usage example:
@@ -51,8 +51,8 @@ import typer

 sys.path.insert(0, Path(__file__).parent.parent.as_posix())
 if True:
-    from examples.server.tests.utils import ServerProcess
-    from examples.server.tests.unit.test_tool_call import TIMEOUT_SERVER_START, do_test_calc_result, do_test_hello_world, do_test_weather
+    from tools.server.tests.utils import ServerProcess
+    from tools.server.tests.unit.test_tool_call import TIMEOUT_SERVER_START, do_test_calc_result, do_test_hello_world, do_test_weather


 @contextmanager

+ 1 - 1
scripts/xxd.cmake

@@ -1,5 +1,5 @@
 # CMake equivalent of `xxd -i ${INPUT} ${OUTPUT}`
-# Usage: cmake -DINPUT=examples/server/public/index.html -DOUTPUT=examples/server/index.html.hpp -P scripts/xxd.cmake
+# Usage: cmake -DINPUT=tools/server/public/index.html -DOUTPUT=tools/server/index.html.hpp -P scripts/xxd.cmake

 SET(INPUT "" CACHE STRING "Input File")
 SET(OUTPUT "" CACHE STRING "Output File")

+ 1 - 1
tests/CMakeLists.txt

@@ -111,7 +111,7 @@ if (NOT WIN32)
     # TODO: disabled on loongarch64 because the ggml-ci node lacks Python 3.8
     if (NOT ${CMAKE_SYSTEM_PROCESSOR} MATCHES "loongarch64")
         llama_build_and_test(test-json-schema-to-grammar.cpp   WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/..)
-        target_include_directories(test-json-schema-to-grammar PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/../examples/server)
+        target_include_directories(test-json-schema-to-grammar PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/../tools/server)
     endif()

     llama_build(test-quantize-stats.cpp)

+ 1 - 1
tests/run-json-schema-to-grammar.mjs

@@ -1,5 +1,5 @@
 import { readFileSync } from "fs"
-import { SchemaConverter } from "../examples/server/public_legacy/json-schema-to-grammar.mjs"
+import { SchemaConverter } from "../tools/server/public_legacy/json-schema-to-grammar.mjs"

 const [, , file] = process.argv
 const url = `file://${file}`

+ 39 - 0
tools/CMakeLists.txt

@@ -0,0 +1,39 @@
+# dependencies
+
+find_package(Threads REQUIRED)
+
+# third-party
+
+# ...
+
+# flags
+
+llama_add_compile_flags()
+
+# tools
+
+if (EMSCRIPTEN)
+else()
+    add_subdirectory(batched-bench)
+    add_subdirectory(gguf-split)
+    add_subdirectory(imatrix)
+    add_subdirectory(llama-bench)
+    add_subdirectory(main)
+    add_subdirectory(perplexity)
+    add_subdirectory(quantize)
+    if (LLAMA_BUILD_SERVER)
+        add_subdirectory(server)
+    endif()
+    add_subdirectory(run)
+    add_subdirectory(tokenize)
+    add_subdirectory(tts)
+    if (NOT GGML_BACKEND_DL)
+        # these examples use the backends directly and cannot be built with dynamic loading
+        add_subdirectory(cvector-generator)
+        add_subdirectory(export-lora)
+        add_subdirectory(llava)
+        if (GGML_RPC)
+            add_subdirectory(rpc)
+        endif()
+    endif()
+endif()
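
The build invocation itself is unaffected by this new top-level list. As a rough sketch, assuming the usual binary target names such as `llama-server` and `LLAMA_BUILD_SERVER` left at its default, the relocated tools still build the same way:

```sh
# configure once, then build everything, including the tools/ subdirectories
cmake -B build
cmake --build build --config Release -j

# or build a single relocated tool, e.g. the server
cmake --build build --config Release --target llama-server -j
```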

+ 0 - 0
examples/batched-bench/CMakeLists.txt → tools/batched-bench/CMakeLists.txt


+ 0 - 0
examples/batched-bench/README.md → tools/batched-bench/README.md


+ 0 - 0
examples/batched-bench/batched-bench.cpp → tools/batched-bench/batched-bench.cpp


+ 0 - 0
examples/cvector-generator/CMakeLists.txt → tools/cvector-generator/CMakeLists.txt


+ 0 - 0
examples/cvector-generator/README.md → tools/cvector-generator/README.md


+ 0 - 0
examples/cvector-generator/completions.txt → tools/cvector-generator/completions.txt


+ 0 - 0
examples/cvector-generator/cvector-generator.cpp → tools/cvector-generator/cvector-generator.cpp


+ 0 - 0
examples/cvector-generator/mean.hpp → tools/cvector-generator/mean.hpp


+ 0 - 0
examples/cvector-generator/negative.txt → tools/cvector-generator/negative.txt


+ 0 - 0
examples/cvector-generator/pca.hpp → tools/cvector-generator/pca.hpp


+ 0 - 0
examples/cvector-generator/positive.txt → tools/cvector-generator/positive.txt


+ 0 - 0
examples/export-lora/CMakeLists.txt → tools/export-lora/CMakeLists.txt


+ 0 - 0
examples/export-lora/README.md → tools/export-lora/README.md


+ 0 - 0
examples/export-lora/export-lora.cpp → tools/export-lora/export-lora.cpp


+ 0 - 0
examples/gguf-split/CMakeLists.txt → tools/gguf-split/CMakeLists.txt


+ 0 - 0
examples/gguf-split/README.md → tools/gguf-split/README.md


+ 0 - 0
examples/gguf-split/gguf-split.cpp → tools/gguf-split/gguf-split.cpp


+ 0 - 0
examples/gguf-split/tests.sh → tools/gguf-split/tests.sh


+ 0 - 0
examples/imatrix/CMakeLists.txt → tools/imatrix/CMakeLists.txt


+ 1 - 1
examples/imatrix/README.md → tools/imatrix/README.md

@@ -1,4 +1,4 @@
-# llama.cpp/examples/imatrix
+# llama.cpp/tools/imatrix

 Compute an importance matrix for a model and given text dataset. Can be used during quantization to enhance the quality of the quantized models.
 More information is available here: https://github.com/ggml-org/llama.cpp/pull/4861

+ 0 - 0
examples/imatrix/imatrix.cpp → tools/imatrix/imatrix.cpp


+ 0 - 0
examples/llama-bench/CMakeLists.txt → tools/llama-bench/CMakeLists.txt


+ 1 - 1
examples/llama-bench/README.md → tools/llama-bench/README.md

@@ -1,4 +1,4 @@
-# llama.cpp/examples/llama-bench
+# llama.cpp/tools/llama-bench

 Performance testing tool for llama.cpp.


+ 0 - 0
examples/llama-bench/llama-bench.cpp → tools/llama-bench/llama-bench.cpp


+ 0 - 0
examples/llava/CMakeLists.txt → tools/llava/CMakeLists.txt


+ 0 - 0
examples/llava/README-quantize.md → tools/llava/README-quantize.md


+ 0 - 0
examples/llava/README.md → tools/llava/README.md


+ 0 - 0
examples/llava/android/adb_run.sh → tools/llava/android/adb_run.sh


+ 0 - 0
examples/llava/android/build_64.sh → tools/llava/android/build_64.sh


+ 0 - 0
examples/llava/clip-impl.h → tools/llava/clip-impl.h


+ 0 - 0
examples/llava/clip-quantize-cli.cpp → tools/llava/clip-quantize-cli.cpp


+ 0 - 0
examples/llava/clip.cpp → tools/llava/clip.cpp


+ 0 - 0
examples/llava/clip.h → tools/llava/clip.h


+ 0 - 0
examples/llava/convert_image_encoder_to_gguf.py → tools/llava/convert_image_encoder_to_gguf.py


+ 0 - 0
examples/llava/deprecation-warning.cpp → tools/llava/deprecation-warning.cpp


+ 0 - 0
examples/llava/glmedge-convert-image-encoder-to-gguf.py → tools/llava/glmedge-convert-image-encoder-to-gguf.py


+ 0 - 0
examples/llava/glmedge-surgery.py → tools/llava/glmedge-surgery.py


+ 0 - 0
examples/llava/llava.cpp → tools/llava/llava.cpp


+ 0 - 0
examples/llava/llava.h → tools/llava/llava.h


+ 0 - 0
examples/llava/llava_surgery.py → tools/llava/llava_surgery.py


+ 0 - 0
examples/llava/llava_surgery_v2.py → tools/llava/llava_surgery_v2.py


+ 0 - 0
examples/llava/minicpmv-convert-image-encoder-to-gguf.py → tools/llava/minicpmv-convert-image-encoder-to-gguf.py


+ 0 - 0
examples/llava/minicpmv-surgery.py → tools/llava/minicpmv-surgery.py


+ 0 - 0
examples/llava/mtmd-cli.cpp → tools/llava/mtmd-cli.cpp


+ 0 - 0
examples/llava/mtmd.cpp → tools/llava/mtmd.cpp


+ 0 - 0
examples/llava/mtmd.h → tools/llava/mtmd.h


+ 0 - 0
examples/llava/qwen2vl-test.cpp → tools/llava/qwen2vl-test.cpp


+ 0 - 0
examples/llava/requirements.txt → tools/llava/requirements.txt


+ 0 - 0
examples/llava/test-1.jpeg → tools/llava/test-1.jpeg


+ 0 - 0
examples/llava/tests.sh → tools/llava/tests.sh


+ 0 - 0
examples/main/CMakeLists.txt → tools/main/CMakeLists.txt


+ 1 - 1
examples/main/README.md → tools/main/README.md

@@ -1,4 +1,4 @@
-# llama.cpp/examples/main
+# llama.cpp/tools/main

 This example program allows you to use various LLaMA language models easily and efficiently. It is specifically designed to work with the [llama.cpp](https://github.com/ggml-org/llama.cpp) project, which provides a plain C/C++ implementation with optional 4-bit quantization support for faster, lower memory inference, and is optimized for desktop CPUs. This program can be used to perform various inference tasks with LLaMA models, including generating text based on user-provided prompts and chat-like interactions with reverse prompts.


+ 0 - 0
examples/main/main.cpp → tools/main/main.cpp


+ 0 - 0
examples/perplexity/CMakeLists.txt → tools/perplexity/CMakeLists.txt


+ 0 - 0
examples/perplexity/README.md → tools/perplexity/README.md


+ 0 - 0
examples/perplexity/perplexity.cpp → tools/perplexity/perplexity.cpp


+ 0 - 0
examples/quantize/CMakeLists.txt → tools/quantize/CMakeLists.txt


+ 0 - 0
examples/quantize/README.md → tools/quantize/README.md


+ 0 - 0
examples/quantize/quantize.cpp → tools/quantize/quantize.cpp


+ 0 - 0
examples/quantize/tests.sh → tools/quantize/tests.sh


+ 0 - 0
examples/rpc/CMakeLists.txt → tools/rpc/CMakeLists.txt


+ 0 - 0
examples/rpc/README.md → tools/rpc/README.md


+ 0 - 0
examples/rpc/rpc-server.cpp → tools/rpc/rpc-server.cpp


+ 0 - 0
examples/run/CMakeLists.txt → tools/run/CMakeLists.txt


Some files were not shown because too many files changed in this diff