mirror of
https://github.com/SWivid/F5-TTS.git
synced 2025-12-12 15:50:07 -08:00
Shared Model Cards
Prerequisites for use
- This document serves as a quick lookup table for community training and fine-tuning results across various languages.
- The models listed here are open source and rely on voluntary contributions.
- Use of these models is conditioned on respect for their respective creators; the convenience they offer comes from the creators' efforts.
Welcome to share here
- Have a pretrained/finetuned result: a model checkpoint (ideally pruned to facilitate inference, i.e. keeping only ema_model_state_dict) and the corresponding vocab file (for tokenization).
- Host a public Hugging Face model repository and upload the model-related files.
- Make a pull request adding a model card to the current page, i.e. src\f5_tts\infer\SHARED.md.
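The pruning step mentioned above can be sketched as follows. This is a minimal illustration, assuming the training checkpoint is a plain dictionary containing an ema_model_state_dict entry; the other key names in the sample data and the helper name prune_checkpoint are hypothetical, not taken from the F5-TTS code.

```python
from typing import Any, Dict

EMA_KEY = "ema_model_state_dict"


def prune_checkpoint(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict that holds only the EMA weights of a checkpoint."""
    if EMA_KEY not in checkpoint:
        raise KeyError(f"checkpoint has no {EMA_KEY!r} entry")
    return {EMA_KEY: checkpoint[EMA_KEY]}


# Stand-in for a loaded checkpoint; every key except EMA_KEY is hypothetical.
full = {
    "ema_model_state_dict": {"layer.weight": [0.1, 0.2]},
    "optimizer_state_dict": {"lr": 1e-4},
    "step": 1200000,
}
pruned = prune_checkpoint(full)
print(sorted(pruned))
```

With PyTorch, the dictionary would typically come from torch.load(path, map_location="cpu") and the pruned result would be written back with torch.save; exporting to .safetensors is a separate step not shown here.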
Supported Languages
Multilingual
F5-TTS v1 v0 Base @ zh & en @ F5-TTS
| Model | 🤗Hugging Face | Data (Hours) | Model License |
|---|---|---|---|
| F5-TTS v1 Base | ckpt & vocab | Emilia 95K zh&en | cc-by-nc-4.0 |
Model: hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors
# A Variant Model: hf://SWivid/F5-TTS/F5TTS_v1_Base_no_zero_init/model_1250000.safetensors
Vocab: hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4}
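The Model and Vocab entries on this page use hf:// locations. As a rough sketch, assuming the layout is hf://owner/repo/path-inside-repo (inferred from the entries listed here, not from a formal specification), such a location can be split into a repository id and a file path. The helper name split_hf_uri is hypothetical.

```python
from typing import Tuple


def split_hf_uri(uri: str) -> Tuple[str, str]:
    """Split 'hf://owner/repo/path/to/file' into (repo_id, filename)."""
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// location: {uri!r}")
    # At most three parts: owner, repo, and the remaining path inside the repo.
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://owner/repo/file, got {uri!r}")
    owner, repo, filename = parts
    return f"{owner}/{repo}", filename


repo_id, filename = split_hf_uri(
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors"
)
print(repo_id, filename)
```

The resulting pair matches the repo_id and filename arguments of huggingface_hub.hf_hub_download, which is one possible way to fetch the file.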
| Model | 🤗Hugging Face | Data (Hours) | Model License |
|---|---|---|---|
| F5-TTS Base | ckpt & vocab | Emilia 95K zh&en | cc-by-nc-4.0 |
Model: hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors
Vocab: hf://SWivid/F5-TTS/F5TTS_Base/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
Other information, e.g. author info, GitHub repo, links to sample results, usage instructions, tutorials (blog, video, etc.) ...
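Several Config entries on this page contain the Python literal False rather than JSON's false, so a strict JSON parser rejects them. The sketch below shows one way to read such a line with ast.literal_eval from the standard library; the helper name parse_config is hypothetical, and how F5-TTS itself consumes these values is not shown here.

```python
import ast
import json

config_line = (
    'Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, '
    '"text_dim": 512, "text_mask_padding": False, "conv_layers": 4, '
    '"pe_attn_head": 1}'
)


def parse_config(line: str) -> dict:
    """Parse a 'Config: {...}' line whose body is a Python dict literal."""
    _, _, literal = line.partition("Config:")
    value = ast.literal_eval(literal.strip())  # evaluates literals only, no code
    if not isinstance(value, dict):
        raise ValueError("Config body is not a dict literal")
    return value


cfg = parse_config(config_line)

# json.loads fails on the same text because False is not valid JSON.
try:
    json.loads(config_line.partition("Config:")[2])
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

print(cfg["dim"], cfg["text_mask_padding"], json_ok)
```

ast.literal_eval is used instead of eval because it accepts only literal structures, which is safer for text copied from a shared page.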
English
Finnish
F5-TTS Base @ fi @ AsmoKoskinen
| Model | 🤗Hugging Face | Data | Model License |
|---|---|---|---|
| F5-TTS Base | ckpt & vocab | Common Voice, Vox Populi | cc-by-nc-4.0 |
Model: hf://AsmoKoskinen/F5-TTS_Finnish_Model/model_common_voice_fi_vox_populi_fi_20241206.safetensors
Vocab: hf://AsmoKoskinen/F5-TTS_Finnish_Model/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
French
F5-TTS Base @ fr @ RASPIAUDIO
| Model | 🤗Hugging Face | Data (Hours) | Model License |
|---|---|---|---|
| F5-TTS Base | ckpt & vocab | LibriVox | cc-by-nc-4.0 |
Model: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt
Vocab: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
- Online Inference with Hugging Face Space.
- Tutorial video to train a new language model.
- Discussion about this training can be found here.
Hindi
F5-TTS Small @ hi @ SPRINGLab
| Model | 🤗Hugging Face | Data (Hours) | Model License |
|---|---|---|---|
| F5-TTS Small | ckpt & vocab | IndicTTS Hi & IndicVoices-R Hi | cc-by-4.0 |
Model: hf://SPRINGLab/F5-Hindi-24KHz/model_2500000.safetensors
Vocab: hf://SPRINGLab/F5-Hindi-24KHz/vocab.txt
Config: {"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
- Authors: SPRING Lab, Indian Institute of Technology, Madras
- Website: https://asr.iitm.ac.in/
Italian
F5-TTS Base @ it @ alien79
| Model | 🤗Hugging Face | Data | Model License |
|---|---|---|---|
| F5-TTS Base | ckpt & vocab | ylacombe/cml-tts | cc-by-nc-4.0 |
Model: hf://alien79/F5-TTS-italian/model_159600.safetensors
Vocab: hf://alien79/F5-TTS-italian/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
- Trained by Mithril Man
- Model details on the Hugging Face project home
- Open to collaborations to further improve the model
Japanese
F5-TTS Base @ ja @ Jmica
| Model | 🤗Hugging Face | Data (Hours) | Model License |
|---|---|---|---|
| F5-TTS Base | ckpt & vocab | Emilia 1.7k JA & Galgame Dataset 5.4k | cc-by-nc-4.0 |
Model: hf://Jmica/F5TTS/JA_21999120/model_21999120.pt
Vocab: hf://Jmica/F5TTS/JA_21999120/vocab_japanese.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
Mandarin
Russian
F5-TTS Base @ ru @ HotDro4illa
| Model | 🤗Hugging Face | Data (Hours) | Model License |
|---|---|---|---|
| F5-TTS Base | ckpt & vocab | Common Voice | cc-by-nc-4.0 |
Model: hf://hotstone228/F5-TTS-Russian/model_last.safetensors
Vocab: hf://hotstone228/F5-TTS-Russian/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
- Finetuned by HotDro4illa
- Any improvements are welcome
Spanish
F5-TTS Base @ es @ jpgallegoar
| Model | 🤗Hugging Face | Data (Hours) | Model License |
|---|---|---|---|
| F5-TTS Base | ckpt & vocab | Voxpopuli & Crowdsourced & TEDx, 218 hours | cc0-1.0 |
- @jpgallegoar GitHub repo, Jupyter Notebook and Gradio usage for the Spanish model.