Mosam Dabhi

I am a Ph.D. student at the Robotics Institute, Carnegie Mellon University (CMU), where I investigate representations that can serve as implicit priors and stand in for human visual intelligence.

My goal is to understand how out-of-distribution (OOD) reasoning (generalization) can be achieved to build such general multi-modal intelligence in machines. To this end, I am currently focusing on extracting geometric reasoning via graph-based representations.


News

  • Feb 2024: 3D-LFM has been accepted to CVPR 2024. Looking forward to presenting it in Seattle.
  • June 2023: I will be joining Apple’s AI Research team as a research scientist intern.
  • Oct 2022: I am happy to announce that I won the NeurIPS 2022 Scholar Award to attend the in-person NeurIPS conference in New Orleans, Louisiana!
  • Oct 2022: MBW has been accepted to NeurIPS 2022.
  • Aug 2022: MV-NRSfM, our work from 3DV 2021, was featured in this post, titled “New AI tech to bring human-like understanding of our 3D world.”
  • May 2022: Joining Apple for the third consecutive summer to advance self-supervised and meta-learning.
  • Oct 2021: MV-NRSfM has been accepted to 3DV 2021. We have also released the code.
  • May 2021: Rejoining Apple this summer to work on deep-learning-driven computer vision.
  • May 2021: I defended my Master’s thesis, Multi-view NRSfM: Affordable Setup for High-Fidelity 3D Reconstruction.
  • May 2020: Joining Apple as a Machine Learning Research Scientist Intern.

Publications

3D-LFM: Lifting Foundation Model

CVPR, 2024

A universal 2D-3D lifting model that processes diverse objects without category-specific knowledge. It leverages the permutation equivariance of transformers and geometric consistency to handle camera rotations, standardizing shape representation in a canonical frame.

MBW: Multi-view Bootstrapping in the Wild

NeurIPS, 2022

By enforcing temporal and spatial consistency via neural priors, MBW performs out-of-distribution (OOD) detection for auto-labeling at scale in a low-shot learning setting.

High Fidelity 3D Reconstructions with Limited Physical Views

3DV, 2021

Enforcing multi-view equivariance with modern deep 3D lifting enables high-fidelity 3D reconstructions from just 2-3 cameras, compared to setups requiring more than 100 cameras.

Real-Time Information-Theoretic Exploration with Gaussian Mixture Model Maps

RSS, 2019

Representing the environment with Gaussian Mixture Models (GMMs) rather than voxel grids enables map transfer from Mars to Earth in 21 seconds, compared to 1260 seconds.

Fast and agile vision-based flight with teleoperation and collision avoidance on a multirotor

ISER, 2018

Aggressive autonomous flight and collision-free teleoperation in unstructured, GPS-denied environments at speeds exceeding 12 m/s.

Aggressive Flight Performance using Robust Experience-driven Predictive Control Strategies: Experimentation and Analysis

Robotics Institute, CMU (Technical Report), 2018

Storing crucial control policies allows them to be reused later without expending valuable compute on a resource-constrained Micro Air Vehicle (MAV).