Test Drive: F5-TTS
Today I’m test driving F5-TTS, “a fully non-autoregressive text-to-speech system based on flow matching with Diffusion Transformer (DiT),” which @lllucas speedily ported to MLX. In a previous post I was working on a podcast, so I thought maybe I’d clone my voice for it. The process is simple:
- pip install f5-tts-mlx
- Record some reference audio
- Infer
All I did for generation was the following:
python -m f5_tts_mlx.generate \
--text "Today I'm testdriving finetuning with M L X. I've got the Halloween spirit, so my task will be to spookify something with creepy words." \
--ref-audio ref3.wav \
--ref-text "The quick brown fox jumped over the fence"
Here is the result
Here is my actual voice for comparison
It’s based on this reference audio
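A reference clip like ref3.wav can be sanity-checked before inference. The sketch below uses only Python's standard `wave` module to report the channel count, sample rate, and length of a WAV file. The target values (mono, 24 kHz, at most about 10 seconds) are my assumptions about what f5-tts-mlx expects, and `check_ref_audio` is a hypothetical helper, not part of the package, so check the project's README before relying on these numbers.

```python
import wave

# Assumed targets for the reference clip; verify against the f5-tts-mlx README.
EXPECTED_CHANNELS = 1      # mono
EXPECTED_RATE = 24_000     # samples per second
MAX_SECONDS = 10.0         # upper bound on clip length


def check_ref_audio(path):
    """Return a list of problems found in a WAV file (empty list if none).

    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected {EXPECTED_CHANNELS} channel(s), got {channels}")
    if rate != EXPECTED_RATE:
        problems.append(f"expected {EXPECTED_RATE} Hz, got {rate} Hz")
    if seconds > MAX_SECONDS:
        problems.append(f"clip is {seconds:.1f}s, longer than {MAX_SECONDS:.0f}s")
    return problems
```

Calling `check_ref_audio("ref3.wav")` returns an empty list when the clip matches the assumed format, and a list of short messages otherwise. It only inspects the file header, so it says nothing about recording quality or whether the `--ref-text` transcript matches what was spoken.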