mirror of https://github.com/SWivid/F5-TTS.git synced 2025-12-12 07:40:43 -08:00

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

F5-TTS: Diffusion Transformer with ConvNeXt V2, faster training and inference.

E2 TTS: Flat-UNet Transformer, closest reproduction of the paper.

Sway Sampling: Inference-time flow step sampling strategy, greatly improves performance.
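
As a rough illustration of the Sway Sampling idea, the sketch below warps uniformly spaced flow steps u in [0, 1] with the function given in the F5-TTS paper, t = u + s * (cos(pi/2 * u) - 1 + u). The helper name `sway_sampling_steps`, the step count, and the coefficient s = -1 are assumptions chosen for illustration and do not come from this README. With a negative s, the warped steps cluster near the start of the flow.

```python
import math


def sway_sampling_steps(num_steps: int, s: float = -1.0) -> list[float]:
    """Warp uniform flow steps u in [0, 1] with t = u + s * (cos(pi/2 * u) - 1 + u).

    Hypothetical helper for illustration only; s = -1.0 is an assumed value.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    steps = []
    for i in range(num_steps + 1):
        u = i / num_steps  # uniform step in [0, 1]
        steps.append(u + s * (math.cos(math.pi / 2 * u) - 1 + u))
    return steps


if __name__ == "__main__":
    uniform = sway_sampling_steps(8, s=0.0)  # s = 0 leaves the steps uniform
    swayed = sway_sampling_steps(8, s=-1.0)  # s < 0 shifts steps toward t = 0
    for u, t in zip(uniform, swayed):
        print(f"u={u:.3f} -> t={t:.3f}")
```

Setting s = 0 recovers the plain uniform schedule, which makes it easy to compare the two side by side.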

Thanks to all the contributors!

News

2024/10/08: F5-TTS & E2 TTS base models are available on 🤗 Hugging Face and 🤖 ModelScope.

Installation

# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n f5-tts python=3.10
conda activate f5-tts

# Install pytorch with your CUDA version, e.g.
pip install torch==2.3.0+cu118 torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
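
Before moving on, it can help to confirm that the PyTorch build you installed matches your CUDA setup. The following is a minimal sketch: the helper name `describe_torch_install` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install() -> str:
    """Return a one-line summary of the local PyTorch install, if any.

    Hypothetical helper for illustration; not part of F5-TTS.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the check also runs without PyTorch

    # torch.version.cuda is None for CPU-only builds
    cuda = torch.version.cuda or "none (CPU-only build)"
    return (
        f"torch {torch.__version__}, built for CUDA {cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_install())
```

If the summary reports a CPU-only build or no available GPU, reinstall PyTorch with the wheel index that matches your CUDA version.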

Then you can choose from a few options below:

1. As a pip package (if just for inference)

pip install git+https://github.com/SWivid/F5-TTS.git

2. Local editable install (if you also want to do training or finetuning)

git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
pip install -e .

3. Build from the Dockerfile

docker build -t f5tts:v1 .

Inference

1. Basic usage

# cli inference
f5-tts_infer-cli

# gradio interface
f5-tts_infer-gradio

2. More instructions

For better generation results, take a moment to read the detailed guidance.
The Issues are also very useful: search them first using keywords for the problem you encountered. If you find no answer, feel free to open a new issue.

Development

Use pre-commit to ensure code quality (it runs linters and formatters automatically):

pip install pre-commit
pre-commit install

When making a pull request, before each commit, run:

pre-commit run --all-files

Note: Some model components have linting exceptions for F722 to accommodate tensor notation.

Acknowledgements

E2-TTS brilliant work, simple and effective
Emilia, WenetSpeech4TTS valuable datasets
lucidrains initial CFM structure, and bfs18 for discussion
SD3 & Hugging Face diffusers DiT and MMDiT code structure
torchdiffeq as ODE solver, Vocos as vocoder
FunASR, faster-whisper, UniSpeech for evaluation tools
ctc-forced-aligner for speech edit test
mrfakename Hugging Face Space demo
f5-tts-mlx Implementation with MLX framework by Lucas Newman
F5-TTS-ONNX ONNX Runtime version by DakeQQ

Citation

If our work and codebase are useful to you, please cite as:

@article{chen-etal-2024-f5tts,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching}, 
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      journal={arXiv preprint arXiv:2410.06885},
      year={2024},
}

License

Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license because the training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.
