Nat Taylor: Blog, AI, Product Management & Tinkering

Test Drive: F5-TTS


Today I’m test driving F5-TTS, “a fully non-autoregressive text-to-speech system based on flow matching with Diffusion Transformer (DiT),” which @lllucas speedily ported to MLX. In a previous post I was working on a podcast, so I thought I might clone my voice for it. The process is simple:

  1. pip install f5-tts-mlx
  2. Record some reference audio
  3. Infer
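For step 2, it can help to sanity-check the recording before using it. The sketch below uses only Python's standard `wave` module to inspect a WAV file. The target values (mono, 24 kHz, roughly 5 to 10 seconds) are assumptions based on my recollection of the project's README, and `check_reference_wav` is a hypothetical helper name, not part of f5-tts-mlx.

```python
import wave

# Assumed targets, not confirmed by this post: a mono, 24 kHz WAV
# of roughly 5-10 seconds. Adjust if the project's docs say otherwise.
TARGET_RATE_HZ = 24_000
TARGET_CHANNELS = 1
MIN_SECONDS, MAX_SECONDS = 5.0, 10.0


def check_reference_wav(path):
    """Return a list of problems found; an empty list means the clip matches the assumed targets."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate

    problems = []
    if channels != TARGET_CHANNELS:
        problems.append(f"expected mono, found {channels} channels")
    if rate != TARGET_RATE_HZ:
        problems.append(f"expected {TARGET_RATE_HZ} Hz, found {rate} Hz")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"expected {MIN_SECONDS}-{MAX_SECONDS} s, found {seconds:.1f} s")
    return problems
```

Calling `check_reference_wav("ref3.wav")` returns an empty list when the clip fits those assumed targets, and otherwise a short description of each mismatch.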

All I did for generation was the following:

python -m f5_tts_mlx.generate \
--text "Today I'm testdriving finetuning with M L X.  I've got the Halloween spirit, so my task will be to spookify something with creepy words." \
--ref-audio ref3.wav \
--ref-text "The quick brown fox jumped over the fence"
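For a podcast with many lines, the same command could be driven from Python. This is a minimal sketch that only reuses the three flags from the command above; `build_generate_command` and `generate_all` are hypothetical helper names, and I have not verified where each run writes or plays its audio, so check the package's options before using it for batch work.

```python
import subprocess
import sys


def build_generate_command(text, ref_audio, ref_text):
    """Build the argv list for the CLI call, using only the flags shown in the post."""
    return [
        sys.executable, "-m", "f5_tts_mlx.generate",
        "--text", text,
        "--ref-audio", ref_audio,
        "--ref-text", ref_text,
    ]


def generate_all(lines, ref_audio, ref_text):
    """Run one generation per script line; requires f5-tts-mlx to be installed."""
    for line in lines:
        # check=True raises CalledProcessError if a generation fails.
        subprocess.run(build_generate_command(line, ref_audio, ref_text), check=True)
```

Passing arguments as a list avoids shell quoting problems with apostrophes in the text, such as the "I'm" in the example above.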

Here is the result

Here is my actual voice for comparison

It’s based on this reference audio
