A three-stage deep learning system that can imitate a person's voice from as little as five seconds of recorded speech and then speak with the deepfaked voice in real time. The recorded sample is used to condition an existing text-to-speech (TTS) model so that its output sounds like the recorded speaker.
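The toy sketch below shows how a short sample can condition a TTS model. It assumes the common split into a speaker encoder, a synthesizer, and a vocoder. The function names (`encode_speaker`, `synthesize`, `vocode`), the constants, and the signal processing are all hypothetical stand-ins. This is not the project's code and contains no trained models. It only illustrates the data flow: a recording of any length becomes a fixed-size voice embedding, every synthesized frame is conditioned on that embedding, and the frames are then turned into audio samples.

```python
import math
from typing import List

# Hypothetical sizes chosen for illustration only.
EMBED_DIM = 8      # length of the toy speaker embedding
FRAME_LEN = 400    # samples per analysis frame (25 ms at 16 kHz)
HOP_LENGTH = 200   # waveform samples the toy vocoder emits per spectrogram frame
SAMPLE_RATE = 16000


def encode_speaker(wav: List[float]) -> List[float]:
    """Stage 1 (toy): map a recording of any length to a fixed-size, unit-length vector."""
    frames = [wav[i:i + FRAME_LEN] for i in range(0, len(wav) - FRAME_LEN + 1, FRAME_LEN)]
    if not frames:
        raise ValueError("recording is shorter than one frame")
    chunk = FRAME_LEN // EMBED_DIM
    embedding = [0.0] * EMBED_DIM
    for frame in frames:
        # Toy per-frame features: mean energy of EMBED_DIM equal-width chunks.
        for d in range(EMBED_DIM):
            seg = frame[d * chunk:(d + 1) * chunk]
            embedding[d] += sum(x * x for x in seg) / chunk
    # Average over frames, then L2-normalize so the vector size never depends on duration.
    embedding = [e / len(frames) for e in embedding]
    norm = math.sqrt(sum(e * e for e in embedding)) or 1.0
    return [e / norm for e in embedding]


def synthesize(text: str, embedding: List[float]) -> List[List[float]]:
    """Stage 2 (toy): emit one 'spectrogram' frame per character, conditioned on the embedding."""
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0  # the text decides what is said...
        frames.append([base * (1.0 + e) for e in embedding])  # ...the embedding colors every frame
    return frames


def vocode(spectrogram: List[List[float]]) -> List[float]:
    """Stage 3 (toy): turn each frame into HOP_LENGTH waveform samples."""
    wav: List[float] = []
    for frame in spectrogram:
        amplitude = sum(frame) / len(frame)
        start = len(wav)  # keep the sine phase continuous across frames
        wav.extend(amplitude * math.sin(2 * math.pi * 220 * (start + n) / SAMPLE_RATE)
                   for n in range(HOP_LENGTH))
    return wav


if __name__ == "__main__":
    # A 5-second synthetic tone stands in for the reference recording.
    reference = [math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) for n in range(5 * SAMPLE_RATE)]
    emb = encode_speaker(reference)
    mel = synthesize("hello", emb)
    audio = vocode(mel)
    print(len(emb), len(mel), len(audio))  # 8 5 1000
```

The key design point is that only the embedding carries speaker identity. Cloning a new voice therefore means computing a new embedding from a short sample, not retraining the synthesizer.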
Running it inside a Docker container: https://sean.lane.sh/posts/2019/07/Running-the-Real-Time-Voice-Cloning-project-in-Docker/