The Transactions of the Korean Institute of Electrical Engineers (대한전기학회)
Title Refined Feature-Space Window Attention Vision Transformer for Image Classification
Authors 유다연(Dayeon Yoo) ; 유진우(Jinwoo Yoo)
DOI https://doi.org/10.5370/KIEE.2024.73.6.1004
Page pp.1004-1011
ISSN 1975-8359
Keywords Image Classification; Deep Learning; Vision Transformer
Abstract The window-based self-attention vision transformer (ViT) reduces computational complexity by computing attention within local windows. However, this makes it difficult to capture interactions between pixels in different windows. To address this issue, the Swin Transformer, a representative window-based self-attention ViT, introduces shifted window multi-head self-attention (SW-MSA) to capture cross-window information. Even so, tokens that are spatially distant from each other still cannot be grouped into one window.
This paper proposes a method that clusters tokens by their similarity in the feature space and computes attention within each cluster.
The proposed method serves as an alternative to the SW-MSA of the existing Swin Transformer. In addition, this paper adopts a convolutional block attention module (CBAM) to refine the feature space and enhance the representational power of the model.
Experimental results show that the proposed network outperforms existing convolutional neural network and transformer-based backbones on the ImageNet-1K classification task.
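The abstract does not give implementation details, so the following PyTorch fragment is only a rough sketch of the idea of grouping tokens by feature-space similarity rather than spatial position: tokens are ordered by a learned scalar projection (a stand-in for whatever clustering rule the paper actually uses), split into equal-sized groups, and standard multi-head self-attention is computed inside each group. The module name FeatureSpaceGroupAttention and all hyperparameters are illustrative assumptions, not the authors' design.

# Illustrative sketch only (PyTorch). Tokens are grouped by feature-space
# similarity -- approximated here by sorting on a learned scalar projection --
# and multi-head self-attention is computed inside each equal-sized group,
# as an alternative to Swin's spatially shifted windows.
import torch
import torch.nn as nn

class FeatureSpaceGroupAttention(nn.Module):
    def __init__(self, dim, num_heads=4, group_size=49):
        super().__init__()
        self.group_size = group_size
        self.score = nn.Linear(dim, 1)  # scalar used to order tokens in feature space
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, N, C), N divisible by group_size
        B, N, C = x.shape
        order = self.score(x).squeeze(-1).argsort(dim=1)       # feature-space ordering
        x_sorted = torch.gather(x, 1, order.unsqueeze(-1).expand(-1, -1, C))
        g = x_sorted.reshape(B * (N // self.group_size), self.group_size, C)
        out, _ = self.attn(g, g, g)                            # attention inside each group
        out = out.reshape(B, N, C)
        inv = order.argsort(dim=1)                             # undo the feature-space sort
        return torch.gather(out, 1, inv.unsqueeze(-1).expand(-1, -1, C))

# usage: 14x14 = 196 tokens with 96-dim embeddings, grouped 49 at a time
x = torch.randn(2, 196, 96)
y = FeatureSpaceGroupAttention(96)(x)   # -> torch.Size([2, 196, 96])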
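CBAM itself is a published module (Woo et al., 2018): channel attention is obtained from a shared MLP applied to globally average- and max-pooled features, followed by spatial attention from a 7x7 convolution over channel-wise average and max maps. The sketch below restates that standard formulation; where exactly it is inserted to refine the feature space of the proposed network is not specified in this abstract and is left open here.

# Standard CBAM (Woo et al., 2018): channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # channel attention: shared MLP over globally average- and max-pooled features
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # spatial attention: 7x7 conv over channel-wise average and max maps
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                  # (B, C)
        mx = self.mlp(x.amax(dim=(2, 3)))                   # (B, C)
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)    # channel refinement
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))              # spatial refinement

# usage: refined = CBAM(96)(torch.randn(2, 96, 56, 56))  # input shape is preserved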