преди 1 година · ba8a1f9c5b
--- a/examples/tts/README.md
+++ b/examples/tts/README.md
@@ -0,0 +1,80 @@
 
															+# llama.cpp/example/tts
														
 
															+This example demonstrates the Text To Speech feature. It uses a
														
 
															+[model](https://www.outeai.com/blog/outetts-0.2-500m) from
														
 
															+[outeai](https://www.outeai.com/).
														
 
															+
														
 
															+## Quickstart
														
 
															+If you have built llama.cpp with `-DLLAMA_CURL=ON` you can simply run the
														
 
															+following command and the required models will be downloaded automatically:
														
 
															+```console
														
 
															+$ build/bin/llama-tts --tts-oute-default -p "Hello world" && aplay output.wav
														
 
															+```
														
 
															+For details about the models and how to convert them to the required format
														
 
															+see the following sections.
														
 
															+
														
 
															+### Model conversion
														
 
															+Checkout or download the model that contains the LLM model:
														
 
															+```console
														
 
															+$ pushd models
														
 
															+$ git clone --branch main --single-branch --depth 1 https://huggingface.co/OuteAI/OuteTTS-0.2-500M
														
 
															+$ cd OuteTTS-0.2-500M && git lfs install && git lfs pull
														
 
															+$ popd
														
 
															+```
														
 
															+Convert the model to .gguf format:
														
 
															+```console
														
 
															+(venv) python convert_hf_to_gguf.py models/OuteTTS-0.2-500M \
														
 
															+    --outfile models/outetts-0.2-0.5B-f16.gguf --outtype f16
														
 
															+```
														
 
															+The generated model will be `models/outetts-0.2-0.5B-f16.gguf`.
														
 
															+
														
 
															+We can optionally quantize this to Q8_0 using the following command:
														
 
															+```console
														
 
															+$ build/bin/llama-quantize models/outetts-0.2-0.5B-f16.gguf \
														
 
															+    models/outetts-0.2-0.5B-q8_0.gguf q8_0
														
 
															+```
														
 
															+The quantized model will be `models/outetts-0.2-0.5B-q8_0.gguf`.
														
 
															+
														
 
															+Next we do something simlar for the audio decoder. First download or checkout
														
 
															+the model for the voice decoder:
														
 
															+```console
														
 
															+$ pushd models
														
 
															+$ git clone --branch main --single-branch --depth 1 https://huggingface.co/novateur/WavTokenizer-large-speech-75token
														
 
															+$ cd WavTokenizer-large-speech-75token && git lfs install && git lfs pull
														
 
															+$ popd
														
 
															+```
														
 
															+This model file is PyTorch checkpoint (.ckpt) and we first need to convert it to
														
 
															+huggingface format:
														
 
															+```console
														
 
															+(venv) python examples/tts/convert_pt_to_hf.py \
														
 
															+    models/WavTokenizer-large-speech-75token/wavtokenizer_large_speech_320_24k.ckpt
														
 
															+...
														
 
															+Model has been successfully converted and saved to models/WavTokenizer-large-speech-75token/model.safetensors
														
 
															+Metadata has been saved to models/WavTokenizer-large-speech-75token/index.json
														
 
															+Config has been saved to models/WavTokenizer-large-speech-75tokenconfig.json
														
 
															+```
														
 
															+Then we can convert the huggingface format to gguf:
														
 
															+```console
														
 
															+(venv) python convert_hf_to_gguf.py models/WavTokenizer-large-speech-75token \
														
 
															+    --outfile models/wavtokenizer-large-75-f16.gguf --outtype f16
														
 
															+...
														
 
															+INFO:hf-to-gguf:Model successfully exported to models/wavtokenizer-large-75-f16.gguf
														
 
															+```
														
 
															+
														
 
															+### Running the example
														
 
															+
														
 
															+With both of the models generated, the LLM model and the voice decoder model,
														
 
															+we can run the example:
														
 
															+```console
														
 
															+$ build/bin/llama-tts -m  ./models/outetts-0.2-0.5B-q8_0.gguf \
														
 
															+    -mv ./models/wavtokenizer-large-75-f16.gguf \
														
 
															+    -p "Hello world"
														
 
															+...
														
 
															+main: audio written to file 'output.wav'
														
 
															+```
														
 
															+The output.wav file will contain the audio of the prompt. This can be heard
														
 
															+by playing the file with a media player. On Linux the following command will
														
 
															+play the audio:
														
 
															+```console
														
 
															+$ aplay output.wav
														
 
															+```
														
 
															+