1 year ago · 8b350356b2
--- a/README.md
+++ b/README.md
@@ -114,6 +114,9 @@ Typically finetunes of the base models below are supported as well.
 
 - [x] [MobileVLM 1.7B/3B models](https://huggingface.co/models?search=mobileVLM)
 - [x] [Yi-VL](https://huggingface.co/models?search=Yi-VL)
 
+**HTTP server**
+
+[llama.cpp web server](./examples/server) is a lightweight [OpenAI API](https://github.com/openai/openai-openapi) compatible HTTP server that can be used to serve local models and easily connect them to existing clients.
 
 **Bindings:**
 
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -1,8 +1,20 @@
 
-# llama.cpp/example/server
+# LLaMA.cpp HTTP Server
 
-This example demonstrates a simple HTTP API server and a simple web front end to interact with llama.cpp.
+Fast, lightweight, pure C/C++ HTTP server based on [httplib](https://github.com/yhirose/cpp-httplib), [nlohmann::json](https://github.com/nlohmann/json) and **llama.cpp**.
 
-Command line options:
+Set of LLM REST APIs and a simple web front end to interact with llama.cpp.
+
+**Features:**
+ * LLM inference of F16 and quantum models on GPU and CPU
+ * [OpenAI API](https://github.com/openai/openai-openapi) compatible chat completions and embeddings routes
+ * Parallel decoding with multi-user support
+ * Continuous batching
+ * Multimodal (wip)
+ * Monitoring endpoints
+
+The project is under active development, and we are [looking for feedback and contributors](https://github.com/ggerganov/llama.cpp/issues/4216).
+
+**Command line options:**
 
 - `--threads N`, `-t N`: Set the number of threads to use during generation.
 - `-tb N, --threads-batch N`: Set the number of threads to use during batch and prompt processing. If not specified, the number of threads will be set to the number of threads used for generation.