| Name | Last commit |
|---|---|
| .devops | 1 year ago |
| .github | 1 year ago |
| Sources | 1 year ago |
| ci | 1 year ago |
| cmake | 1 year ago |
| common | 1 year ago |
| docs | 1 year ago |
| examples | 1 year ago |
| ggml | 1 year ago |
| gguf-py | 1 year ago |
| grammars | 1 year ago |
| include | 1 year ago |
| media | 1 year ago |
| models | 1 year ago |
| pocs | 1 year ago |
| prompts | 2 years ago |
| requirements | 1 year ago |
| scripts | 1 year ago |
| spm-headers | 1 year ago |
| src | 1 year ago |
| tests | 1 year ago |
| .clang-format | 1 year ago |
| .clang-tidy | 1 year ago |
| .dockerignore | 1 year ago |
| .ecrc | 1 year ago |
| .editorconfig | 1 year ago |
| .flake8 | 1 year ago |
| .gitignore | 1 year ago |
| .gitmodules | 1 year ago |
| .pre-commit-config.yaml | 1 year ago |
| AUTHORS | 1 year ago |
| CMakeLists.txt | 1 year ago |
| CMakePresets.json | 1 year ago |
| CODEOWNERS | 1 year ago |
| CONTRIBUTING.md | 1 year ago |
| LICENSE | 1 year ago |
| Makefile | 1 year ago |
| Package.swift | 1 year ago |
| README.md | 1 year ago |
| SECURITY.md | 1 year ago |
| convert_hf_to_gguf.py | 1 year ago |
| convert_hf_to_gguf_update.py | 1 year ago |
| convert_llama_ggml_to_gguf.py | 1 year ago |
| convert_lora_to_gguf.py | 1 year ago |
| flake.lock | 1 year ago |
| flake.nix | 1 year ago |
| mypy.ini | 2 years ago |
| poetry.lock | 1 year ago |
| pyproject.toml | 1 year ago |
| pyrightconfig.json | 1 year ago |
| requirements.txt | 1 year ago |

Roadmap / Project status / Manifesto / ggml
Inference of Meta's LLaMA model (and others) in pure C/C++
The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide
range of hardware - locally and in the cloud.
The llama.cpp project is the main playground for developing new features for the ggml library.
- [x] [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) + [Pythia](https://github.com/EleutherAI/pythia)
- [x] [Snowflake-Arctic MoE](https://huggingface.co/collections/Snowflake/arctic-66290090ab)
- [x] [Smaug](https://huggingface.co/models?search=Smaug)
- [x] [Poro 34B](https://huggingface.co/LumiOpen/Poro-34B)
- [x] [Bitnet b1.58 models](https://huggingface.co/1bitLLM)
- [x] [Flan T5](https://huggingface.co/models?search=flan-t5)
- [x] [Open Elm models](https://huggingface.co/collections/apple/openelm-instruct-models-6619ad295d)
- [x] [ChatGLM3-6b](https://huggingface.co/THUDM/chatglm3-6b) + [ChatGLM4-9b](https://huggingface.co/THUDM/glm-4-9b)
- [x] [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad)
- [x] [EXAONE-3.0-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct)
- [x] [FalconMamba Models](https://huggingface.co/collections/tiiuae/falconmamba-7b-66b9a58032)
- [x] [Jais](https://huggingface.co/inceptionai/jais-13b-chat)
- [x] [Bielik-11B-v2.3](https://huggingface.co/collections/speakleash/bielik-11b-v23-66ee813238)
- [x] [RWKV-6](https://github.com/BlinkDL/RWKV-LM)
- [x] [QRWKV-6](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1)
- [x] [GigaChat-20B-A3B](https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct)
#### Multimodal
- [x] [LLaVA 1.5 models](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d9), [LLaVA 1.6 models](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155)
- [x] [BakLLaVA](https://huggingface.co/models?search=SkunkworksAI/Bakllava)
- [x] [Obsidian](https://huggingface.co/NousResearch/Obsidian-3B-V0.5)
- [x] [ShareGPT4V](https://huggingface.co/models?search=Lin-Chen/ShareGPT4V)
- [x] [MobileVLM 1.7B/3B models](https://huggingface.co/models?search=mobileVLM)
- [x] [Yi-VL](https://huggingface.co/models?search=Yi-VL)
- [x] [Mini CPM](https://huggingface.co/models?search=MiniCPM)
- [x] [Moondream](https://huggingface.co/vikhyatk/moondream2)
- [x] [Bunny](https://github.com/BAAI-DCAI/Bunny)
- [x] [Qwen2-VL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee74555)
| Backend | Target devices |
|---|---|
| Metal | Apple Silicon |
| BLAS | All |
| BLIS | All |
| SYCL | Intel and Nvidia GPU |
| MUSA | Moore Threads MTT GPU |
| CUDA | Nvidia GPU |
| HIP | AMD GPU |
| Vulkan | GPU |
| CANN | Ascend NPU |
The main product of this project is the llama library. Its C-style interface can be found in include/llama.h.
The project also includes many example programs and tools using the llama library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:

- Install llama.cpp via brew, flox or nix

The Hugging Face platform hosts a number of LLMs compatible with llama.cpp:
You can either manually download the GGUF file or directly use any llama.cpp-compatible models from Hugging Face by using this CLI argument: `-hf <user>/<model>[:quant]`
After downloading a model, use the CLI tools to run it locally - see below.
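
To make the shape of the `-hf <user>/<model>[:quant]` argument concrete, here is a minimal Python sketch that splits such a value into its parts. It is only an illustration of the syntax, not llama.cpp's own parsing code; the names `HfModelSpec` and `parse_hf_spec` and the example repository name are hypothetical.

```python
from typing import NamedTuple, Optional


class HfModelSpec(NamedTuple):
    """Parts of a `<user>/<model>[:quant]` value (hypothetical helper type)."""
    user: str
    model: str
    quant: Optional[str]  # None when the optional `:quant` suffix is omitted


def parse_hf_spec(spec: str) -> HfModelSpec:
    """Split `<user>/<model>[:quant]` into user, model and optional quant.

    Illustrative only: this mirrors the documented argument shape and is
    not taken from the llama.cpp sources.
    """
    # Everything after the first ':' is treated as the quantization tag.
    repo, colon, quant = spec.partition(":")
    # The repository part must be exactly `<user>/<model>`.
    user, slash, model = repo.partition("/")
    if not slash or not user or not model or "/" in model:
        raise ValueError(f"expected <user>/<model>[:quant], got {spec!r}")
    if colon and not quant:
        raise ValueError(f"empty quant suffix in {spec!r}")
    return HfModelSpec(user, model, quant or None)


if __name__ == "__main__":
    # "example-user/example-model-GGUF" is a made-up repository name.
    print(parse_hf_spec("example-user/example-model-GGUF:Q4_K_M"))
    print(parse_hf_spec("example-user/example-model-GGUF"))
```

The quantization suffix is optional, so the sketch returns `None` for it rather than guessing which default the CLI would pick.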
llama.cpp requires the model to be stored in the GGUF file format. Models in other data formats can be converted to GGUF using the convert_*.py Python scripts in this repo.
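
Before pointing a tool at a file, it can be handy to check whether it looks like GGUF at all. The sketch below reads the first eight bytes and checks for the `GGUF` magic followed by a 32-bit version number. It assumes the common little-endian layout, the function name `read_gguf_version` is hypothetical, and it does not validate anything beyond the header prefix.

```python
import io
import struct
from typing import BinaryIO, Optional

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_version(stream: BinaryIO) -> Optional[int]:
    """Return the GGUF version if the stream starts with a GGUF header prefix.

    Returns None when the magic bytes are missing or the stream is too short.
    Assumption: the version field is a little-endian uint32.
    """
    header = stream.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version


if __name__ == "__main__":
    # In-memory stand-ins for real files; use open(path, "rb") for a file on disk.
    fake_gguf = io.BytesIO(GGUF_MAGIC + struct.pack("<I", 3))
    not_gguf = io.BytesIO(b"not a model file")
    print(read_gguf_version(fake_gguf))
    print(read_gguf_version(not_gguf))
```

A `None` result suggests the model is in another format and would first need to go through one of the `convert_*.py` scripts.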
The Hugging Face platform provides a variety of online tools for converting, quantizing and hosting models with llama.cpp:
- Host llama.cpp in the cloud (more info: https://github.com/ggerganov/llama.cpp/discussions/9669)

To learn more about model quantization, read this documentation
- `llama-cli`: […] llama.cpp's functionality.
- `llama-server`
- `llama-perplexity`
- `llama-bench`
- `llama-run`: […] llama.cpp models. Useful for inferencing. Used with RamaLama ^3.
- `llama-simple`: […] llama.cpp. Useful for developers.

- […] llama.cpp repo and merge PRs into the master branch

If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT: