대한전기학회 (The Korean Institute of Electrical Engineers)
The Transactions of the Korean Institute of Electrical Engineers

References

1. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1097-1105, 2012.
2. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2015.
3. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in CVPR, 2015.
4. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
5. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16×16 words: Transformers for image recognition at scale,” in ICLR, 2021.
6. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012-10022, 2021.
7. T. Yu, G. Zhao, P. Li, and Y. Yu, “BOAT: Bilateral local attention vision transformer,” arXiv preprint arXiv:2201.13027, 2022.
8. J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, 2009.
9. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, 2017.
10. W. Wang, E. Xie, X. Li, D. P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 568-578, 2021.
11. H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International Conference on Machine Learning, pp. 10347-10357, PMLR, 2021.
12. J. Guo, K. Han, H. Wu, Y. Tang, X. Chen, Y. Wang, and C. Xu, “CMT: Convolutional neural networks meet vision transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12175-12185, 2022.
13. J. Fang, H. Lin, X. Chen, and K. Zeng, “A hybrid network of CNN and transformer for lightweight image super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1103-1112, 2022.
14. H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, “CvT: Introducing convolutions to vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 22-31, 2021.
15. X. Dong, J. Bao, D. Chen, W. Zhang, N. Yu, L. Yuan, D. Chen, and B. Guo, “CSWin transformer: A general vision transformer backbone with cross-shaped windows,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12124-12134, 2022.
16. S. Wu, T. Wu, H. Tan, and G. Guo, “Pale transformer: A general vision transformer backbone with pale-shaped attention,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, pp. 2731-2739, 2022.
17. Q. Zhang, Y. Xu, J. Zhang, and D. Tao, “VSA: Learning varied-size window attention in vision transformers,” in European Conference on Computer Vision, pp. 466-483, Cham: Springer Nature Switzerland, 2022.
18. M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, pp. 818-833, Springer International Publishing, 2014.
19. J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132-7141, 2018.
20. Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “ECA-Net: Efficient channel attention for deep convolutional neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534-11542, 2020.
21. Z. Meng, J. Ma, and X. Yuan, “End-to-end low cost compressive spectral imaging with spatial-spectral self-attention,” in European Conference on Computer Vision, pp. 187-204, Cham: Springer International Publishing, 2020.
22. X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794-7803, 2018.
23. S. Woo, J. Park, J. Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 3-19, 2018.
24. J. Park, S. Woo, J. Y. Lee, and I. S. Kweon, “BAM: Bottleneck attention module,” arXiv preprint arXiv:1807.06514, 2018.
25. X. Chu, Z. Tian, Y. Wang, B. Zhang, H. Ren, X. Wei, H. Xia, and C. Shen, “Twins: Revisiting the design of spatial attention in vision transformers,” Advances in Neural Information Processing Systems, vol. 34, pp. 9355-9366, 2021.
26. S. d’Ascoli, H. Touvron, M. L. Leavitt, A. S. Morcos, G. Biroli, and L. Sagun, “ConViT: Improving vision transformers with soft convolutional inductive biases,” in International Conference on Machine Learning, pp. 2286-2296, PMLR, 2021.
27. Y. Xu, Q. Zhang, J. Zhang, and D. Tao, “ViTAE: Vision transformer advanced by exploring intrinsic inductive bias,” Advances in Neural Information Processing Systems, vol. 34, pp. 28522-28535, 2021.
28. K. Han, A. Xiao, E. Wu, J. Guo, C. Xu, and Y. Wang, “Transformer in transformer,” Advances in Neural Information Processing Systems, vol. 34, pp. 15908-15919, 2021.
29. L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, Z. Jiang, F. E. H. Tay, J. Feng, and S. Yan, “Tokens-to-token ViT: Training vision transformers from scratch on ImageNet,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 558-567, 2021.
30. C. F. Chen, Q. Fan, and R. Panda, “CrossViT: Cross-attention multi-scale vision transformer for image classification,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 357-366, 2021.