Visual Dubbing Pipeline (Video & Face Synthesis)

A novel visual dubbing pipeline that balances lip-sync accuracy and realistic facial reenactment. Read the paper

Overview of the pipeline at inference time. More details can be found in the paper.

Sample results; more can be found here!

Overview

Combines person-generic and person-specific methods for realistic visual dubbing
Introduces a virtual dubber to capture expressive lip-sync with limited data
Uses a full-head identity-swapping autoencoder to transfer face, hair, ears, and neck
Eliminates artifacts like jitter and double chin from mouth-only approaches
Achieves high visual quality and temporal consistency with short video inputs
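
To make the two-stage idea above concrete, here is a minimal sketch of how a person-generic stage and a person-specific stage might be chained at inference time. It is an illustration under stated assumptions, not the project's actual code: the names `DubbingPipeline`, `toy_virtual_dubber`, and `make_toy_identity_swapper` are hypothetical, frames are represented as plain dictionaries rather than images, and the ordering (virtual dubber first, identity swap second) is assumed from the component list; the paper is the authoritative description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for an image frame: a dict of simple attributes.
Frame = Dict[str, object]


def toy_virtual_dubber(audio: Sequence[float], num_frames: int) -> List[Frame]:
    """Toy stage 1 (assumption): emit one lip-synced frame per audio chunk.

    Mouth opening is set to the mean absolute amplitude of the chunk,
    standing in for a learned audio-to-lip model.
    """
    chunk = max(1, len(audio) // num_frames)
    frames: List[Frame] = []
    for i in range(num_frames):
        window = audio[i * chunk:(i + 1) * chunk] or [0.0]
        energy = sum(abs(s) for s in window) / len(window)
        frames.append({"identity": "virtual_dubber", "mouth_open": energy})
    return frames


def make_toy_identity_swapper(target: str) -> Callable[[Frame], Frame]:
    """Toy stage 2 (assumption): replace the identity, keep the lip motion.

    Stands in for the full-head identity-swapping autoencoder.
    """
    def swap(frame: Frame) -> Frame:
        return {**frame, "identity": target}
    return swap


@dataclass
class DubbingPipeline:
    """Chains the two stages; both callables are placeholders for real models."""
    virtual_dubber: Callable[[Sequence[float], int], List[Frame]]
    identity_swapper: Callable[[Frame], Frame]

    def run(self, audio: Sequence[float], num_frames: int) -> List[Frame]:
        dubber_frames = self.virtual_dubber(audio, num_frames)
        return [self.identity_swapper(f) for f in dubber_frames]


if __name__ == "__main__":
    pipeline = DubbingPipeline(
        virtual_dubber=toy_virtual_dubber,
        identity_swapper=make_toy_identity_swapper("target_actor"),
    )
    for frame in pipeline.run([0.0, 0.0, 0.5, -0.5, 1.0, -1.0], num_frames=3):
        print(frame)
```

Keeping the two stages behind plain callables mirrors the split described in the list: the lip-sync stage does not need to know the target identity, and the identity stage does not need to see the audio.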