convert to pkg, reorganize repo (#228)

* group files in f5_tts directory

* add setup.py

* use global imports

* simplify demo

* add install directions for library mode

* fix old huggingface_hub version constraint

* move finetune to package

* change imports to f5_tts.model

* bump version

* fix bad merge

* Update inference-cli.py

* fix HF space

* reformat

* fix utils.py vocab.txt import

* fix format

* adapt README for f5_tts package structure

* simplify app.py

* add gradio.Dockerfile and workflow

* refactored for pyproject.toml

* refactored for pyproject.toml

* added in reference to packaged files

* use fork for testing docker image

* added in reference to packaged files

* minor tweaks

* fixed inference-cli.toml path

* fixed inference-cli.toml path

* fixed inference-cli.toml path

* fixed inference-cli.toml path

* refactor eval_infer_batch.py

* fix typo

* added eval_infer_batch to scripts

---------

Co-authored-by: Roberts Slisans <rsxdalv@gmail.com>
Co-authored-by: Adam Kessel <adam@rosi-kessel.org>
Co-authored-by: Roberts Slisans <roberts.slisans@gmail.com>
This commit is contained in:
Yushen CHEN
2024-10-23 21:07:59 +08:00
committed by GitHub
parent 32c3ee7701
commit c4eee0f96b
38 changed files with 451 additions and 259 deletions


@@ -63,11 +63,35 @@ pre-commit run --all-files
Note: Some model components have linting exceptions for E722 to accommodate tensor notation
-## Prepare Dataset
-Example data processing scripts for Emilia and Wenetspeech4TTS, and you may tailor your own one along with a Dataset class in `model/dataset.py`.
+### As a pip package
+```bash
+pip install git+https://github.com/SWivid/F5-TTS.git
+```
+```python
+import gradio as gr
+from f5_tts.gradio_app import app
+with gr.Blocks() as main_app:
+    gr.Markdown("# This is an example of using F5-TTS within a bigger Gradio app")
+    # ... other Gradio components
+    app.render()
+main_app.launch()
+```
+## Prepare Dataset
+Example data processing scripts are provided for Emilia and Wenetspeech4TTS; you may also tailor your own, along with a Dataset class in `f5_tts/model/dataset.py`.
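The Dataset class mentioned here can be illustrated with a minimal sketch (the class name and the `audio_path|text` one-pair-per-line metadata format are assumptions for illustration, not the actual schema in `f5_tts/model/dataset.py`):

```python
# Minimal sketch of a custom TTS dataset.
# Hypothetical metadata format: one "audio_path|transcript" pair per line.
class CustomTTSDataset:
    def __init__(self, metadata_path):
        with open(metadata_path, encoding="utf-8") as f:
            self.items = [
                line.strip().split("|", 1) for line in f if line.strip()
            ]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        audio_path, text = self.items[idx]
        # a real implementation would load and preprocess the audio here
        return {"audio_path": audio_path, "text": text}
```

A real implementation would additionally load and preprocess the audio; this only shows the metadata plumbing.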
```bash
+# switch to the main directory
+cd f5_tts
# prepare custom dataset up to your need
# download corresponding dataset first, and fill in the path in scripts
@@ -83,6 +107,9 @@ python scripts/prepare_wenetspeech4tts.py
Once your datasets are prepared, you can start the training process.
```bash
+# switch to the main directory
+cd f5_tts
# setup accelerate config, e.g. use multi-gpu ddp, fp16
# config will be saved to: ~/.cache/huggingface/accelerate/default_config.yaml
accelerate config
@@ -90,7 +117,7 @@ accelerate launch train.py
```
Initial guidance on finetuning: [#57](https://github.com/SWivid/F5-TTS/discussions/57).
-Gradio UI finetuning with `finetune_gradio.py` see [#143](https://github.com/SWivid/F5-TTS/discussions/143).
+For Gradio UI finetuning with `f5_tts/finetune_gradio.py`, see [#143](https://github.com/SWivid/F5-TTS/discussions/143).
### Wandb Logging
@@ -136,6 +163,9 @@ for change model use `--ckpt_file` to specify the model you want to load,
To change vocab.txt, use `--vocab_file` to provide your own vocab.txt file.
```bash
+# switch to the main directory
+cd f5_tts
python inference-cli.py \
--model "F5-TTS" \
--ref_audio "tests/ref_audio/test_en_1_ref_short.wav" \
@@ -161,19 +191,19 @@ Currently supported features:
You can launch a Gradio app (web interface) for inference (it will load the ckpt from Huggingface; you may also use a local file, see `gradio_app.py`). It currently loads the ASR model, F5-TTS and E2 TTS all at once, thus using more GPU memory than `inference-cli`.
```bash
-python gradio_app.py
+python f5_tts/gradio_app.py
```
You can specify the port/host:
```bash
-python gradio_app.py --port 7860 --host 0.0.0.0
+python f5_tts/gradio_app.py --port 7860 --host 0.0.0.0
```
Or launch a share link:
```bash
-python gradio_app.py --share
+python f5_tts/gradio_app.py --share
```
### Speech Editing
@@ -181,7 +211,7 @@ python gradio_app.py --share
To test speech editing capabilities, use the following command.
```bash
-python speech_edit.py
+python f5_tts/speech_edit.py
```
## Evaluation
@@ -199,6 +229,9 @@ python speech_edit.py
To run batch inference for evaluations, execute the following commands:
```bash
+# switch to the main directory
+cd f5_tts
# batch inference for evaluations
accelerate config # if not set before
bash scripts/eval_infer_batch.sh
@@ -234,6 +267,9 @@ pip install faster-whisper==0.10.1
Update the path with your batch-inferenced results, and carry out WER / SIM evaluations:
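As background on the WER part of these evaluations: word error rate is the word-level Levenshtein distance between the ASR transcript of the generated audio and the reference text, normalized by the reference length. A minimal self-contained sketch of the metric itself (not the repo's actual evaluation script, which relies on an ASR model for transcription):

```python
def wer(ref_words, hyp_words):
    """Word error rate: word-level Levenshtein distance / reference length."""
    # d[j] holds the edit distance between the first i reference words
    # and the first j hypothesis words (rolling single-row DP).
    d = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp_words, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (0 cost if match)
            prev = cur
    return d[-1] / len(ref_words)
```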
```bash
+# switch to the main directory
+cd f5_tts
# Evaluation for Seed-TTS test set
python scripts/eval_seedtts_testset.py