lizhuo 29d3326bed update: JA latest HF path in SHARED.md #928

* fix: update japanese latest hf path
* update the huggingface url

2025-03-28 22:36:17 +08:00

7.9 KiB


Shared Model Cards

Prerequisites for use

- This document serves as a quick lookup table for community training/finetuning results across various languages.
- The models listed here are open source and come from voluntary contributions by their authors.
- Use of these models is conditioned on respecting their respective creators; the convenience they bring comes from those creators' efforts.

To share a model here:
- Have a pretrained or finetuned result: a model checkpoint (ideally pruned for inference, i.e. keeping only ema_model_state_dict) and the corresponding vocab file (used for tokenization).
- Host a public Hugging Face model repository and upload the model-related files.
- Make a pull request adding a model card to this page, i.e. src\f5_tts\infer\SHARED.md.
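Pruning a training checkpoint down to its EMA weights, as described above, might look like the following. This is a minimal sketch, assuming the training checkpoint is a PyTorch .pt file holding a dict with an ema_model_state_dict key; the helper names prune_checkpoint and prune_file are hypothetical, and every other key (optimizer state, step counters, etc.) is simply dropped.

```python
from typing import Any, Dict


def prune_checkpoint(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the EMA weights from a training checkpoint dict.

    Hypothetical helper: assumes the EMA weights live under the
    "ema_model_state_dict" key; everything else is discarded.
    """
    if "ema_model_state_dict" not in checkpoint:
        raise KeyError("checkpoint has no 'ema_model_state_dict' entry")
    return {"ema_model_state_dict": checkpoint["ema_model_state_dict"]}


def prune_file(src_path: str, dst_path: str) -> None:
    """Load a .pt checkpoint, prune it, and save the smaller copy."""
    import torch  # imported lazily so prune_checkpoint works without torch

    # weights_only=True limits unpickling to tensors and plain containers;
    # a checkpoint holding arbitrary Python objects will be rejected.
    checkpoint = torch.load(src_path, map_location="cpu", weights_only=True)
    torch.save(prune_checkpoint(checkpoint), dst_path)
```

For example, prune_file("model_last.pt", "model_last_pruned.pt"), where both file names are illustrative.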

Supported Languages

Multilingual
- F5-TTS v1 v0 Base @ zh & en @ F5-TTS
English
Finnish
- F5-TTS Base @ fi @ AsmoKoskinen
French
- F5-TTS Base @ fr @ RASPIAUDIO
Hindi
- F5-TTS Small @ hi @ SPRINGLab
Italian
- F5-TTS Base @ it @ alien79
Japanese
- F5-TTS Base @ ja @ Jmica
Mandarin
Russian
- F5-TTS Base @ ru @ HotDro4illa
Spanish
- F5-TTS Base @ es @ jpgallegoar

Multilingual

F5-TTS v1 v0 Base @ zh & en @ F5-TTS

Model	🤗Hugging Face	Data (Hours)	Model License
F5-TTS v1 Base	ckpt & vocab	Emilia 95K zh&en	cc-by-nc-4.0

Model: hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors
# A Variant Model: hf://SWivid/F5-TTS/F5TTS_v1_Base_no_zero_init/model_1250000.safetensors
Vocab: hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4}
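The hf:// paths on this page follow an hf://user/repo/path-inside-repo pattern. Tools such as the cached_path package accept these URLs directly; if you instead want to call huggingface_hub.hf_hub_download(repo_id=..., filename=...), the path has to be split first. A minimal sketch, assuming the repo id is always exactly the first two path segments; split_hf_path is a hypothetical helper name.

```python
from typing import Tuple

HF_SCHEME = "hf://"


def split_hf_path(path: str) -> Tuple[str, str]:
    """Split 'hf://user/repo/dir/file' into ('user/repo', 'dir/file')."""
    if not path.startswith(HF_SCHEME):
        raise ValueError(f"not an hf:// path: {path!r}")
    parts = path[len(HF_SCHEME):].split("/")
    # Need at least user, repo and one filename segment, none of them empty.
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected hf://<user>/<repo>/<file>, got {path!r}")
    repo_id = "/".join(parts[:2])
    filename = "/".join(parts[2:])
    return repo_id, filename


if __name__ == "__main__":
    repo_id, filename = split_hf_path(
        "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors"
    )
    print(repo_id)   # SWivid/F5-TTS
    print(filename)  # F5TTS_v1_Base/model_1250000.safetensors
```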

Model	🤗Hugging Face	Data (Hours)	Model License
F5-TTS Base	ckpt & vocab	Emilia 95K zh&en	cc-by-nc-4.0

Model: hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors
Vocab: hf://SWivid/F5-TTS/F5TTS_Base/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
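The Config entries are written as Python dict literals (False, not JSON's false), so json.loads rejects them. A minimal sketch, assuming the strings are copied verbatim from this page, parses them safely with ast.literal_eval and shows which keys the v1 Base config omits relative to the v0 Base config (presumably leaving them at the model code's defaults); parse_config is a hypothetical helper name.

```python
import ast
from typing import Any, Dict

# Config strings copied from the two Base cards above.
V1_BASE = '{"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4}'
V0_BASE = (
    '{"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, '
    '"text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}'
)


def parse_config(text: str) -> Dict[str, Any]:
    """Evaluate a Python dict literal without executing arbitrary code."""
    cfg = ast.literal_eval(text)
    if not isinstance(cfg, dict):
        raise ValueError("config is not a dict literal")
    return cfg


v1 = parse_config(V1_BASE)
v0 = parse_config(V0_BASE)
print(sorted(set(v0) - set(v1)))  # ['pe_attn_head', 'text_mask_padding']
print(v0["text_mask_padding"])    # False
```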

Other info, e.g. author info, GitHub repo, links to sample results, usage instructions, tutorials (blog, video, etc.) ...

English

Finnish

F5-TTS Base @ fi @ AsmoKoskinen

Model	🤗Hugging Face	Data	Model License
F5-TTS Base	ckpt & vocab	Common Voice, Vox Populi	cc-by-nc-4.0

Model: hf://AsmoKoskinen/F5-TTS_Finnish_Model/model_common_voice_fi_vox_populi_fi_20241206.safetensors
Vocab: hf://AsmoKoskinen/F5-TTS_Finnish_Model/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}

French

F5-TTS Base @ fr @ RASPIAUDIO

Model	🤗Hugging Face	Data (Hours)	Model License
F5-TTS Base	ckpt & vocab	LibriVox	cc-by-nc-4.0

Model: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt
Vocab: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
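Checkpoints on this page come in two formats: .safetensors (e.g. the Finnish and Hindi models) and pickled PyTorch .pt (e.g. the French and Japanese models). A minimal sketch of choosing a loader by extension for a locally downloaded file, assuming torch and safetensors are installed; checkpoint_kind and load_weights are hypothetical names, and the sketch does not reproduce how F5-TTS itself maps these weights into its model.

```python
from pathlib import Path
from typing import Any, Dict


def checkpoint_kind(path: str) -> str:
    """Classify a checkpoint file by its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".safetensors":
        return "safetensors"
    if suffix in (".pt", ".pth"):
        return "torch"
    raise ValueError(f"unsupported checkpoint extension: {suffix!r}")


def load_weights(path: str, device: str = "cpu") -> Dict[str, Any]:
    """Load a local checkpoint file into a dict."""
    kind = checkpoint_kind(path)
    if kind == "safetensors":
        from safetensors.torch import load_file  # flat name -> tensor mapping
        return load_file(path, device=device)
    import torch
    # .pt files are pickles; weights_only=True refuses arbitrary Python objects.
    return torch.load(path, map_location=device, weights_only=True)
```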

Hindi

F5-TTS Small @ hi @ SPRINGLab

Model	🤗Hugging Face	Data (Hours)	Model License
F5-TTS Small	ckpt & vocab	IndicTTS Hi & IndicVoices-R Hi	cc-by-4.0

Model: hf://SPRINGLab/F5-Hindi-24KHz/model_2500000.safetensors
Vocab: hf://SPRINGLab/F5-Hindi-24KHz/vocab.txt
Config: {"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
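Compared with the Base configs, the Small config changes dim, depth and heads while keeping ff_mult, text_dim and conv_layers the same. Assuming attention splits dim evenly across heads (the usual transformer convention), both sizes end up with a per-head width of 64. A short check over the two configs copied from this page; head_dim is an illustrative helper, not part of F5-TTS.

```python
from typing import Dict

BASE = {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4}
SMALL = {"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512, "conv_layers": 4}


def head_dim(cfg: Dict[str, int]) -> int:
    """Per-head attention width, assuming dim is split evenly across heads."""
    if cfg["dim"] % cfg["heads"]:
        raise ValueError("dim must be divisible by heads")
    return cfg["dim"] // cfg["heads"]


for name, cfg in (("Base", BASE), ("Small", SMALL)):
    print(f"{name}: {cfg['depth']} blocks, width {cfg['dim']}, head_dim {head_dim(cfg)}")
# Base: 22 blocks, width 1024, head_dim 64
# Small: 18 blocks, width 768, head_dim 64
```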

Authors: SPRING Lab, Indian Institute of Technology, Madras
Website: https://asr.iitm.ac.in/

Italian

F5-TTS Base @ it @ alien79

Model	🤗Hugging Face	Data	Model License
F5-TTS Base	ckpt & vocab	ylacombe/cml-tts	cc-by-nc-4.0

Model: hf://alien79/F5-TTS-italian/model_159600.safetensors
Vocab: hf://alien79/F5-TTS-italian/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}

Trained by Mithril Man
Model details are on the Hugging Face project page
Open to collaborations to further improve the model

Japanese

F5-TTS Base @ ja @ Jmica

Model	🤗Hugging Face	Data (Hours)	Model License
F5-TTS Base	ckpt & vocab	Emilia 1.7k JA & Galgame Dataset 5.4k	cc-by-nc-4.0

Model: hf://Jmica/F5TTS/JA_21999120/model_21999120.pt
Vocab: hf://Jmica/F5TTS/JA_21999120/vocab_japanese.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
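File names are not uniform across entries (the Japanese vocab, for instance, is vocab_japanese.txt, and checkpoints end in either .pt or .safetensors), so a script consuming this table is safer looking paths up than deriving them from a naming pattern. A minimal sketch of such a lookup, with paths copied from a few cards above; SHARED_MODELS, ModelCard, lookup and the language tags used as keys are all hypothetical.

```python
from typing import Dict, NamedTuple


class ModelCard(NamedTuple):
    """Paths for one shared model, copied verbatim from this page."""
    model: str
    vocab: str


# Hypothetical registry keyed by language tag; extend it as cards are added.
SHARED_MODELS: Dict[str, ModelCard] = {
    "zh-en": ModelCard(
        "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
        "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
    ),
    "fr": ModelCard(
        "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt",
        "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt",
    ),
    "ja": ModelCard(
        "hf://Jmica/F5TTS/JA_21999120/model_21999120.pt",
        "hf://Jmica/F5TTS/JA_21999120/vocab_japanese.txt",
    ),
}


def lookup(lang: str) -> ModelCard:
    """Return the card for a language tag, listing known tags if absent."""
    try:
        return SHARED_MODELS[lang]
    except KeyError:
        known = ", ".join(sorted(SHARED_MODELS))
        raise KeyError(f"no shared model for {lang!r}; known: {known}") from None
```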

Mandarin

Russian

F5-TTS Base @ ru @ HotDro4illa

Model	🤗Hugging Face	Data (Hours)	Model License
F5-TTS Base	ckpt & vocab	Common Voice	cc-by-nc-4.0

Model: hf://hotstone228/F5-TTS-Russian/model_last.safetensors
Vocab: hf://hotstone228/F5-TTS-Russian/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}

Finetuned by HotDro4illa
Any improvements are welcome

Spanish

F5-TTS Base @ es @ jpgallegoar

Model	🤗Hugging Face	Data (Hours)	Model License
F5-TTS Base	ckpt & vocab	Voxpopuli & Crowdsourced & TEDx, 218 hours	cc0-1.0

See the @jpgallegoar GitHub repo for a Jupyter Notebook and Gradio usage of the Spanish model.
