Voice Cloning - Arnold Biffna Portfolio

I am completely amazed at how well voice cloning works locally on my machine.

I already had a local MLX Whisper model running, which I use to generate subtitles from video. I thought that was all I needed. Then I asked Codex to build me a voice cloning tool in Python, and it went much deeper than I expected.

The tool ended up using:

Python
FFmpeg and FFprobe
MLX Audio text-to-speech
MLX Whisper transcription
Hugging Face model snapshots
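
A stack like this only works if every piece is installed, so a small preflight check is a natural first step. The sketch below is an illustration, not the code Codex produced: the function name missing_dependencies is hypothetical, and it assumes the two MLX packages are importable as mlx_whisper and mlx_audio.

```python
import importlib.util
import shutil


def missing_dependencies(
    binaries=("ffmpeg", "ffprobe"),
    modules=("mlx_whisper", "mlx_audio"),  # assumed import names
):
    """Return the names of command-line tools and Python modules that cannot be found."""
    # shutil.which returns None when a binary is not on PATH.
    missing = [name for name in binaries if shutil.which(name) is None]
    # find_spec returns None when a top-level module is not installed.
    missing += [name for name in modules if importlib.util.find_spec(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All dependencies found.")
```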

A quick note: cloning someone’s voice can get you into trouble if they object or decide to sue. That said, Winston Churchill is no longer around to complain.

What the Tool Does

Lets me choose a reference audio or video file
Lets me choose a speech text file
Detects and skips leading silence (FFmpeg)
Extracts and cleans a reference voice sample (FFmpeg)
Transcribes the cleaned reference using an MLX Whisper model (whisper-large-v3-mlx)
Generates new speech from text using a local MLX Audio model (higgs-audio-v2-3B-mlx-q8)
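
The first few steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual script: the function names, the -35 dB silence threshold, the 15-second clip length, the 24 kHz mono output, the highpass and loudnorm "cleaning" filters, and the mlx-community prefix on the Whisper model name are all guesses. The speech generation step is left out because its call depends on the MLX Audio version installed.

```python
import re
import subprocess

SILENCE_START = re.compile(r"silence_start:\s*(-?[0-9.]+)")
SILENCE_END = re.compile(r"silence_end:\s*([0-9.]+)")


def silencedetect_log(source, noise_db=-35, min_silence=0.3):
    """Run FFmpeg's silencedetect filter and return its log (written to stderr)."""
    cmd = [
        "ffmpeg", "-hide_banner", "-nostats", "-i", source,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_silence}",
        "-vn", "-f", "null", "-",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stderr


def leading_silence_seconds(ffmpeg_log, tolerance=0.05):
    """Return the time at which the leading silence ends, or 0.0 if there is none."""
    start = SILENCE_START.search(ffmpeg_log)
    end = SILENCE_END.search(ffmpeg_log)
    # Only treat the first silent stretch as "leading" if it begins at about zero.
    if start and end and float(start.group(1)) <= tolerance:
        return float(end.group(1))
    return 0.0


def extract_command(source, target, start, duration=15.0):
    """Build an FFmpeg command that cuts a mono reference clip starting at `start`."""
    return [
        "ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", source,
        "-t", f"{duration:.3f}",
        "-vn", "-ac", "1", "-ar", "24000",   # assumed output format
        "-af", "highpass=f=80,loudnorm",     # assumed cleanup filters
        target,
    ]


def transcribe_reference(wav_path, repo="mlx-community/whisper-large-v3-mlx"):
    """Transcribe the cleaned clip with MLX Whisper (imported lazily; Apple Silicon only)."""
    import mlx_whisper

    result = mlx_whisper.transcribe(wav_path, path_or_hf_repo=repo)
    return result["text"].strip()


if __name__ == "__main__":
    src, ref = "reference.mp4", "reference.wav"  # hypothetical file names
    offset = leading_silence_seconds(silencedetect_log(src))
    subprocess.run(extract_command(src, ref, offset), check=True)
    print(transcribe_reference(ref))
```

Keeping the log parsing and the command building as separate pure functions means both can be checked without running FFmpeg at all.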

The wild part is that all of this runs locally. No cloud service. No subscription API. Just my Mac, some open-source tools, and a Python script that suddenly feels a little too powerful.
