We present an unsupervised 3D deep learning framework built on a simple, universally valid proposition that we term view-object consistency: a 3D object and its projected 2D views always belong to the same object class. To validate its effectiveness, we design a multi-view CNN for selecting salient views of 3D objects, a task that is inherently difficult to address with supervised learning because ground-truth labels are hard to collect. Our unsupervised multi-view CNN branches into two channels that encode the knowledge within each 2D view and within the 3D object, respectively, exploiting both intra-view and inter-view knowledge of the object. It ends with a new loss layer that formulates view-object consistency by encouraging the two channels to produce consistent classification outcomes. Experiments demonstrate the superiority of our method over state-of-the-art methods, and we further show that it can select salient views of 3D scenes containing multiple objects.