Unsupervised Learning of Monocular Depth and Ego-Motion using Conditional PatchGANs

Madhu Babu Vankadari, Swagat Kumar, Anima Majumder, Kaushik Das

Research output: Contribution to conference › Paper › peer-review

10 Citations (Scopus)


This paper presents a new GAN-based deep learning framework for estimating absolute scale-aware depth and ego-motion from monocular images using a completely unsupervised mode of learning. The proposed architecture uses two separate generators to learn the distributions of depth and pose data for a given input image sequence. The depth and pose data thus generated are then evaluated by a patch-based discriminator using the reconstructed image and its corresponding actual image. The patch-based GAN (or PatchGAN) is shown to detect high-frequency local structural defects in the reconstructed image, thereby improving the accuracy of the overall depth and pose estimation. Unlike conventional GANs, the proposed architecture uses a conditioned version of the generator's input and output for training the whole network. The resulting framework is shown to outperform all existing deep networks in this field, beating the current state-of-the-art method by 8.7% in absolute error and 5.2% in the RMSE metric. To the best of our knowledge, this is the first deep-network-based model to estimate both depth and pose simultaneously using a conditional patch-based GAN paradigm. The efficacy of the proposed approach is demonstrated through rigorous ablation studies and an exhaustive performance comparison on the popular KITTI outdoor driving dataset.
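The conditional patch-based discriminator described in the abstract can be sketched as follows. This is a minimal PyTorch sketch in the style of a standard 70×70 PatchGAN (as popularized by pix2pix), not the paper's exact architecture: the layer widths, normalization choice, and input resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn


class PatchDiscriminator(nn.Module):
    """Minimal conditional PatchGAN discriminator sketch.

    The conditional input is the pair (source image, reconstructed or real
    target image) concatenated along the channel axis (hence in_channels=6
    for RGB pairs). The output is not a single real/fake scalar but a 2-D
    map of scores, one per overlapping image patch, which is what lets the
    discriminator penalize high-frequency local structural defects.
    """

    def __init__(self, in_channels: int = 6, base: int = 64):
        super().__init__()

        def block(cin: int, cout: int, stride: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=4, stride=stride, padding=1),
                nn.InstanceNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            )

        self.net = nn.Sequential(
            # First layer conventionally has no normalization.
            nn.Conv2d(in_channels, base, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            block(base, base * 2, stride=2),
            block(base * 2, base * 4, stride=2),
            block(base * 4, base * 8, stride=1),
            # One real/fake logit per receptive-field patch.
            nn.Conv2d(base * 8, 1, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Condition on the source image by channel-wise concatenation.
        return self.net(torch.cat([source, target], dim=1))


if __name__ == "__main__":
    # 128x416 is a common KITTI training resolution (an assumption here).
    disc = PatchDiscriminator()
    scores = disc(torch.randn(1, 3, 128, 416), torch.randn(1, 3, 128, 416))
    print(scores.shape)  # a map of per-patch scores, not a single scalar
```

During training, each spatial entry of the score map would be pushed toward "real" for actual images and "fake" for view-synthesized reconstructions, with the depth and pose generators trained to fool it.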
Original language: English
Number of pages: 5684
Publication status: Published - 16 Aug 2019
Event: International Joint Conference on Artificial Intelligence - Macao, China
Duration: 10 Aug 2019 - 16 Aug 2019


Conference: International Joint Conference on Artificial Intelligence
Abbreviated title: IJCAI


  • Deep learning, Depth Estimation from Images, GANs

