Abstract
We present an unsupervised 3D deep learning framework
built on a universally true proposition, which we name
view-object consistency: a 3D object and its projected 2D
views always belong to the same object class. To validate
its effectiveness, we design a multi-view CNN that instantiates
this proposition for salient view selection and interest
point detection of 3D objects, tasks that supervised learning
cannot readily handle because sufficient and consistent
training data are difficult to collect. Our unsupervised
multi-view CNN, named UMVCNN, branches into two channels
that encode the knowledge within each 2D view and within the
3D object, respectively, and exploits both intra-view and
inter-view knowledge of the object. It ends with a new loss
layer that formulates view-object consistency by driving the
two channels to produce consistent classification outcomes.
The UMVCNN is then integrated with a global distinction
adjustment scheme to incorporate global cues into salient
view selection. We evaluate our method for salient view
selection both qualitatively and quantitatively, demonstrating
its superiority over several state-of-the-art methods.
In addition, we show that our method can select salient views
of 3D scenes containing multiple objects. We also develop a
UMVCNN-based method for 3D interest point detection and
conduct comparative evaluations on a publicly available
benchmark, showing that the UMVCNN is amenable to different
3D shape understanding tasks.
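The loss layer described above enforces agreement between the classification outcomes of the view channel and the object channel. A minimal sketch of one way such a consistency term could look is below; the symmetric KL-divergence form, the function name, and the array shapes are illustrative assumptions, not the paper's exact loss layer.

```python
import numpy as np

def view_object_consistency_loss(view_logits, object_logits):
    """Illustrative consistency loss: penalize disagreement between
    per-view class distributions and the 3D-object class distribution.
    (Hypothetical form; the paper's actual loss layer may differ.)"""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    p_views = softmax(np.asarray(view_logits))    # shape (n_views, n_classes)
    p_obj = softmax(np.asarray(object_logits))    # shape (n_classes,)

    # Symmetric KL divergence between each view's prediction and the
    # object's prediction, averaged over views; zero iff they agree.
    eps = 1e-12
    kl_vo = np.sum(p_views * np.log((p_views + eps) / (p_obj + eps)), axis=-1)
    kl_ov = np.sum(p_obj * np.log((p_obj + eps) / (p_views + eps)), axis=-1)
    return float(np.mean(kl_vo + kl_ov))
```

Because the term is zero exactly when every view's class distribution matches the object's, minimizing it drives the two channels toward consistent predictions without any class labels, which is what makes the formulation unsupervised.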
| Original language | English |
| --- | --- |
| Journal | International Journal of Computer Vision |
| Early online date | 16 Mar 2022 |
| DOIs | |
| Publication status | E-pub ahead of print - 16 Mar 2022 |
Keywords
- Unsupervised 3D deep learning
- Multi-view CNN
- View-object consistency
- View selection
- 3D interest point detection