Mohamed El Banani


I am broadly interested in computer vision, machine learning, and cognitive science. My goal is to understand how visual agents learn to represent their world with minimal supervision and easily generalize to novel objects and scenes.

I received my PhD from the University of Michigan, where I was advised by Justin Johnson. During my PhD, I was fortunate to work with David Fouhey and John Laird at UM, Benjamin Graham at FAIR, and Varun Jampani at Google Research. I did my undergraduate studies at Georgia Tech, where I got the chance to work with Maithilee Kunda and Jim Rehg on cognitive modeling, as well as Omer Inan and Todd Sulchek on biomedical devices.


Feb 2024 Our work on the 3D awareness of visual foundation models was accepted at CVPR 2024.
Jan 2024 I successfully defended my thesis!
Oct 2023 I gave a talk at the Stanford Vision and Learning Lab on cross-modal correspondence.
Sep 2023 I chatted in Arabic with Abdelrahman Mohamed about my research. (video on AI بالمصري)
May 2023 I am spending the summer at Google Research working with Varun Jampani.


  1. Probing the 3D Awareness of Visual Foundation Models
    In CVPR, 2024
    TL;DR: Visual foundation models can classify, delineate, and localize objects in 2D. We study how well these models represent the 3D world that images depict.
  2. Learning Visual Representations via Language-Guided Sampling
    Mohamed El Banani, Karan Desai, and Justin Johnson
    In CVPR, 2023
    TL;DR: A picture is worth a thousand words, but a caption can describe a thousand images. We use language models to find image pairs with similar captions, and use them for stronger contrastive learning.
  3. Self-Supervised Correspondence Estimation via Multiview Registration
    In WACV, 2023
    TL;DR: Self-supervised correspondence estimation struggles with wide-baseline images. We use multiview registration and SE(3) transformation synchronization to leverage long-term consistency in RGB-D video.
  4. Bootstrap your own correspondences
    Mohamed El Banani and Justin Johnson
    In ICCV, 2021 (Oral)
    TL;DR: Good features get us accurate correspondence, and accurate correspondence is good for feature learning. We leverage this to learn {visual, geometric} features via self-supervised point cloud registration.
  5. UnsupervisedR&R: Unsupervised Point Cloud Registration via Differentiable Rendering
    Mohamed El Banani, Luya Gao, and Justin Johnson
    In CVPR, 2021 (Oral)
    TL;DR: Can we learn point cloud registration from RGB-D video? We propose a register-and-render approach that learns by minimizing photometric and geometric losses between nearby frames.
  6. Novel Object Viewpoint Estimation through Reconstruction Alignment
    Mohamed El Banani, Jason J Corso, and David F Fouhey
    In CVPR, 2020
    TL;DR: Humans cannot help but see the 3D structure of novel objects, so aligning their viewpoints becomes very easy. We propose a reconstruct-and-align approach for novel object viewpoint estimation.
  7. A Computational Exploration of Problem-Solving Strategies and Gaze Behaviors on the Block Design Task.
    Maithilee Kunda, Mohamed El Banani, and James M Rehg
    In CogSci, 2016
    TL;DR: We present a computational architecture to model problem-solving strategies on the block design task. We generate detailed behavioral predictions and analyze cross-strategy error patterns.
  8. A Pilot Study of a Modified Bathroom Scale To Monitor Cardiovascular Hemodynamic in Pregnancy
    Odayme Quesada, Mohamed El Banani, James Heller, Shire Beach, Mozziyar Etemadi, Shuvo Roy, Omer Inan, Juan Gonzalez, and Liviu Klein
    Journal of the American College of Cardiology, 2016
    TL;DR: We use ballistocardiogram measurements extracted from a modified bathroom scale to analyze maternal cardiovascular adaptation during pregnancy for low-cost detection of preeclampsia.
  9. Three-dimensional particle tracking in microfluidic channel flow using in and out of focus diffraction
    Bushra Tasadduq, Gonghao Wang, Mohamed El Banani, Wenbin Mao, Wilbur Lam, Alexander Alexeev, and Todd Sulchek
    Flow Measurement and Instrumentation, 2015
    TL;DR: We use defocusing patterns to extract 3D particle motion trajectories in 2D bright field videos of microfluidic devices.