Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs. However, by only relying on close-by frame pairs, those approaches miss out on the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences. Our approach combines pairwise correspondence estimation and registration with a novel SE(3) transformation synchronization algorithm. Our key insight is that self-supervised multiview registration allows us to obtain correspondences over longer time frames, increasing both the diversity and difficulty of sampled pairs. We evaluate our approach on indoor scenes for correspondence estimation and RGB-D point cloud registration and find that we perform on par with supervised approaches.
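The abstract mentions an SE(3) transformation synchronization step that fuses noisy pairwise registrations into a consistent set of camera poses; the paper introduces its own weighted variant, so the snippet below is only a rough, non-authoritative sketch of the generic idea: a spectral solve for the rotations followed by a weighted least-squares solve for the translations. The function names, the pose convention (R_ij ≈ R_i R_j^T, t_ij = t_i − R_ij t_j), and the confidence weights are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def synchronize_rotations(rel_R, weights, n_views):
    """Spectral rotation synchronization (generic sketch, not the paper's method).

    rel_R[(i, j)]   : estimated relative rotation with R_ij ~= R_i @ R_j.T
    weights[(i, j)] : pairwise confidence in [0, 1]
    Returns absolute rotations R_i, recovered up to one global rotation.
    """
    M = np.zeros((3 * n_views, 3 * n_views))
    deg = np.zeros(n_views)
    for (i, j), R_ij in rel_R.items():
        w = weights[(i, j)]
        M[3*i:3*i+3, 3*j:3*j+3] = w * R_ij
        M[3*j:3*j+3, 3*i:3*i+3] = w * R_ij.T
        deg[i] += w
        deg[j] += w
    for i in range(n_views):                 # weighted-degree diagonal blocks
        M[3*i:3*i+3, 3*i:3*i+3] = deg[i] * np.eye(3)

    # The top-3 eigenvectors approximately span the stacked absolute rotations.
    _, vecs = np.linalg.eigh(M)
    V = vecs[:, -3:]                         # shape (3 * n_views, 3)

    R_abs = []
    for i in range(n_views):
        B = V[3*i:3*i+3, :]
        U, _, Vt = np.linalg.svd(B)          # project each 3x3 block onto SO(3)
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
        R_abs.append(U @ S @ Vt)
    return R_abs


def synchronize_translations(rel_R, rel_t, weights, n_views):
    """Weighted least squares for t_i - R_ij @ t_j = t_ij, with t_0 pinned
    to the origin to remove the global-translation ambiguity."""
    rows, rhs = [], []
    for (i, j), t_ij in rel_t.items():
        w = np.sqrt(weights[(i, j)])
        A = np.zeros((3, 3 * n_views))
        A[:, 3*i:3*i+3] = np.eye(3)
        A[:, 3*j:3*j+3] = -rel_R[(i, j)]
        rows.append(w * A)
        rhs.append(w * t_ij)
    gauge = np.zeros((3, 3 * n_views))
    gauge[:, :3] = np.eye(3)                 # fix t_0 = 0
    rows.append(gauge)
    rhs.append(np.zeros(3))
    t = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)[0]
    return [t[3*i:3*i+3] for i in range(n_views)]
```

The spectral step recovers the rotations only up to a single global rotation, and pinning t_0 to the origin removes the matching translation ambiguity; any remaining gauge freedom does not affect the relative poses used for registration.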
Overview Video
Paper
El Banani, M., Rocco, I., Novotny, D., Vedaldi, A., Neverova, N., Johnson, J., Graham, B.
Self-supervised Correspondence Estimation via Multiview Registration
We thank Karan Desai, Mahmoud Azab, David Fouhey, Richard Higgins, Daniel Geng, and Menna El Banani for feedback and edits to early drafts of this work.