The Transactions of the Korean Institute of Electrical Engineers (대한전기학회)
Title Refined Feature-Space Window Attention Vision Transformer for Image Classification
Authors 유다연(Dayeon Yoo) ; 유진우(Jinwoo Yoo)
DOI https://doi.org/10.5370/KIEE.2024.73.6.1004
Page pp.1004-1011
ISSN 1975-8359
Keywords Image Classification; Deep Learning; Vision Transformer
Abstract The window-based self-attention vision transformer (ViT) reduces computational complexity by computing attention within local windows. However, this makes it difficult to capture interactions between pixels in different windows. To address this issue, the Swin Transformer, a representative window-based self-attention ViT, introduces shifted window multi-head self-attention (SW-MSA) to capture cross-window information. Even so, tokens that are spatially distant from each other still cannot be grouped into one window.
This paper proposes a method that clusters tokens by their similarity in the feature space and computes attention within each cluster.
The proposed method serves as an alternative to the SW-MSA of the existing Swin Transformer. In addition, this paper adopts a convolutional block attention module (CBAM) to refine the feature space and enhance the representational power of the model.
Experimental results show that the proposed network outperforms existing convolutional neural network and transformer-based backbones on the ImageNet-1K classification task.
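The abstract does not give implementation details, so the following PyTorch fragment is only a rough sketch of the idea of grouping tokens by feature-space similarity rather than spatial position: tokens are ordered by a learned scalar projection (a stand-in for whatever clustering rule the paper actually uses), split into equal-sized groups, and standard multi-head self-attention is computed inside each group. The module name FeatureSpaceGroupAttention and all hyperparameters are illustrative assumptions, not the authors' design.

# Illustrative sketch only (PyTorch). Tokens are grouped by feature-space
# similarity -- approximated here by sorting on a learned scalar projection --
# and multi-head self-attention is computed inside each equal-sized group,
# as an alternative to Swin's spatially shifted windows.
import torch
import torch.nn as nn

class FeatureSpaceGroupAttention(nn.Module):
    def __init__(self, dim, num_heads=4, group_size=49):
        super().__init__()
        self.group_size = group_size
        self.score = nn.Linear(dim, 1)  # scalar used to order tokens in feature space
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, N, C), N divisible by group_size
        B, N, C = x.shape
        order = self.score(x).squeeze(-1).argsort(dim=1)       # feature-space ordering
        x_sorted = torch.gather(x, 1, order.unsqueeze(-1).expand(-1, -1, C))
        g = x_sorted.reshape(B * (N // self.group_size), self.group_size, C)
        out, _ = self.attn(g, g, g)                            # attention inside each group
        out = out.reshape(B, N, C)
        inv = order.argsort(dim=1)                             # undo the feature-space sort
        return torch.gather(out, 1, inv.unsqueeze(-1).expand(-1, -1, C))

# usage: 14x14 = 196 tokens with 96-dim embeddings, grouped 49 at a time
x = torch.randn(2, 196, 96)
y = FeatureSpaceGroupAttention(96)(x)   # -> torch.Size([2, 196, 96])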
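CBAM itself is a published module (Woo et al., 2018): channel attention is obtained from a shared MLP applied to globally average- and max-pooled features, followed by spatial attention from a 7x7 convolution over channel-wise average and max maps. The sketch below restates that standard formulation; where exactly it is inserted to refine the feature space of the proposed network is not specified in this abstract and is left open here.

# Standard CBAM (Woo et al., 2018): channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # channel attention: shared MLP over globally average- and max-pooled features
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # spatial attention: 7x7 conv over channel-wise average and max maps
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                  # (B, C)
        mx = self.mlp(x.amax(dim=(2, 3)))                   # (B, C)
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)    # channel refinement
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))              # spatial refinement

# usage: refined = CBAM(96)(torch.randn(2, 96, 56, 56))  # input shape is preserved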