Learning Common Semantics via Optimal Transport for Contrastive Multi-view Clustering

Zhang Qian, Lin Zhang, Ran Song*, Runmin Cong, Yonghuai Liu, Wei Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article (journal) › peer-review

Abstract

Multi-view clustering aims to learn discriminative representations from multi-view data. Although existing methods show impressive performance by leveraging contrastive learning to tackle the representation gap between every two views, they share the common limitation of not performing semantic alignment from a global perspective, which undermines the semantic patterns in multi-view data. This paper presents CSOT (Common Semantics via Optimal Transport), which boosts contrastive multi-view clustering via semantic learning in a common space that integrates all views. Through optimal transport, the samples in multiple views are mapped to joint clusters that represent the multi-view semantic patterns in the common space. With the semantic assignment derived from the optimal transport plan, we design a semantic learning module in which the soft assignment vector serves as global supervision, forcing the model to learn consistent semantics among all views. Moreover, we propose a semantic-aware reweighting strategy that treats samples differently according to their semantic significance, which improves the effectiveness of cross-view contrastive representation learning. Extensive experimental results demonstrate that CSOT achieves state-of-the-art clustering performance.
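
To illustrate the general idea of deriving soft cluster assignments from an optimal transport plan, here is a minimal sketch. It is not taken from the paper: it uses generic entropy-regularized optimal transport solved with Sinkhorn iterations, and it assumes uniform marginals, cosine similarity to cluster prototypes, and a simple mean over views as the common space. The names `sinkhorn_assignments`, `l2_normalize`, `epsilon`, `n_iters`, and `prototypes` are hypothetical, and the data are random placeholders.

```python
import numpy as np


def sinkhorn_assignments(scores, epsilon=0.05, n_iters=50):
    """Turn a (n_samples, n_clusters) similarity matrix into soft assignments.

    Generic entropy-regularized optimal transport with uniform marginals
    (an assumption for this sketch, not the paper's exact formulation).
    """
    n, k = scores.shape
    # Gibbs kernel; subtracting the max keeps the exponentials bounded.
    plan = np.exp((scores - scores.max()) / epsilon)
    plan /= plan.sum()
    row_mass = np.full(n, 1.0 / n)  # every sample carries equal mass
    col_mass = np.full(k, 1.0 / k)  # every cluster receives equal mass
    for _ in range(n_iters):
        plan *= (col_mass / plan.sum(axis=0))[None, :]  # match column marginals
        plan *= (row_mass / plan.sum(axis=1))[:, None]  # match row marginals
    # Rescale each row to sum to 1 so it reads as a soft assignment vector.
    return plan / plan.sum(axis=1, keepdims=True)


def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Placeholder data: two views of 12 samples with 8-dimensional embeddings.
rng = np.random.default_rng(0)
n_samples, dim, n_clusters = 12, 8, 3
views = [rng.normal(size=(n_samples, dim)) for _ in range(2)]

# Assumed fusion into a common space: the mean of the view embeddings.
common = l2_normalize(np.mean(views, axis=0))
# Hypothetical cluster prototypes (learned in practice, random here).
prototypes = l2_normalize(rng.normal(size=(n_clusters, dim)))

# Each row of P is a soft assignment of one sample to the joint clusters.
P = sinkhorn_assignments(common @ prototypes.T)
```

In a setup like this, each row of `P` could serve as a shared soft target for the per-view cluster predictions of the same sample, which is one way to read the global supervision described above.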

Original language: English
Journal: IEEE Transactions on Image Processing
Publication status: Accepted/In press - 24 Jul 2024

Keywords

  • Multi-view clustering
  • semantic alignment
  • optimal transport
  • contrastive learning

Research Centres

  • Centre for Intelligent Visual Computing Research
  • Data and Complex Systems Research Centre
