An Attention-driven Hierarchical Multi-scale Representation for Visual Recognition

Zachary Wharton, Ardhendu Behera*, Asish Bera

*Corresponding author for this work

Research output: Contribution to journal, Conference proceeding article (ISSN), peer-review

Abstract

Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. This is mainly due to their ability to break down an image into smaller pieces, extract multi-scale localized features and compose them to construct highly expressive representations for decision making. However, the convolution operation cannot capture long-range dependencies, such as arbitrary relations between pixels, because it operates on a fixed-size window. It may therefore be poorly suited to discriminating subtle changes (e.g. in fine-grained visual recognition). To this end, our proposed method captures high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs), which aggregate information by establishing relationships among multi-scale hierarchical regions. These regions range from smaller (closer look) to larger (far look), and the dependency between regions is modeled by an innovative attention-driven message propagation, guided by the graph structure to emphasize the neighborhoods of a given region. Our approach is simple yet extremely effective in solving both fine-grained and generic visual classification problems. It outperforms the state of the art by a significant margin on three datasets and is very competitive on the other two.
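
To make the idea of attention-driven message propagation over a region graph more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `attention_propagate`, the 0/1 adjacency format, and the use of single-head scaled dot-product attention with no learnable parameters are all assumptions made for illustration. It only shows how each region could aggregate features from its graph neighbors, weighted by a softmax over similarity scores.

```python
import math


def attention_propagate(features, adjacency):
    """One round of attention-weighted message passing (illustrative only).

    features:  list of N region feature vectors (lists of floats, equal length).
    adjacency: N x N list of 0/1 flags; adjacency[i][j] == 1 means region j is
               a neighbor of region i (the caller adds self-loops if wanted).
    Returns a new list of N feature vectors.
    """
    n = len(features)
    dim = len(features[0])
    scale = math.sqrt(dim)
    updated = []
    for i in range(n):
        neighbors = [j for j in range(n) if adjacency[i][j]]
        if not neighbors:
            # A region with no neighbors keeps its own features.
            updated.append(list(features[i]))
            continue
        # Scaled dot-product score between region i and each neighbor.
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / scale
            for j in neighbors
        ]
        # Softmax restricted to the neighborhood (shifted for stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # The new feature is the attention-weighted sum of neighbor features.
        updated.append([
            sum(w * features[j][d] for w, j in zip(weights, neighbors))
            for d in range(dim)
        ])
    return updated


# Hypothetical toy input: two small regions and one larger region linking them.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
graph = [[1, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]
propagated = attention_propagate(regions, graph)
```

Because the weights in each neighborhood are non-negative and sum to one, every output vector is a convex combination of the neighbor features. A practical model would add learned projections, multiple attention heads and a classification head, none of which are shown here.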
Original language: English
Journal: British Machine Vision Conference (BMVC)
Publication status: Accepted/In press - 15 Oct 2021
Event: 32nd British Machine Vision Conference - online
Duration: 22 Nov 2021 to 25 Nov 2021
https://www.bmvc2021-virtualconference.com/

Keywords

  • Computer Vision
  • Fine-grained visual recognition
  • Deep Learning
  • Graph Convolutional Networks
  • Message propagation
  • Graph clustering
  • Multi-headed attention
  • Convolutional Neural Network
  • Self-Attention
  • Hierarchical representation
  • Representation Learning

Research Centres

  • Data & Complex Systems Research Centre
  • Data Science STEM Research Centre

Research Groups

  • Visual Computing Lab
