대한전기학회 (The Korean Institute of Electrical Engineers)
The Transactions of the Korean Institute of Electrical Engineers

References

1. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1097-1105, 2012.
2. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2015.
3. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in CVPR, 2015.
4. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
5. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16×16 words: Transformers for image recognition at scale,” in ICLR, 2021.
6. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012-10022, 2021.
7. T. Yu, G. Zhao, P. Li, and Y. Yu, “BOAT: Bilateral local attention vision transformer,” arXiv preprint arXiv:2201.13027, 2022.
8. J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, 2009.
9. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, 2017.
10. W. Wang, E. Xie, X. Li, D. P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 568-578, 2021.
11. H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International Conference on Machine Learning, pp. 10347-10357, PMLR, 2021.
12. J. Guo, K. Han, H. Wu, Y. Tang, X. Chen, Y. Wang, and C. Xu, “CMT: Convolutional neural networks meet vision transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12175-12185, 2022.
13. J. Fang, H. Lin, X. Chen, and K. Zeng, “A hybrid network of CNN and transformer for lightweight image super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1103-1112, 2022.
14. H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, “CvT: Introducing convolutions to vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 22-31, 2021.
15. X. Dong, J. Bao, D. Chen, W. Zhang, N. Yu, L. Yuan, D. Chen, and B. Guo, “CSWin transformer: A general vision transformer backbone with cross-shaped windows,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12124-12134, 2022.
16. S. Wu, T. Wu, H. Tan, and G. Guo, “Pale transformer: A general vision transformer backbone with pale-shaped attention,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, pp. 2731-2739, 2022.
17. Q. Zhang, Y. Xu, J. Zhang, and D. Tao, “VSA: Learning varied-size window attention in vision transformers,” in European Conference on Computer Vision, pp. 466-483, Cham: Springer Nature Switzerland, 2022.
18. M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, pp. 818-833, Springer International Publishing, 2014.
19. J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132-7141, 2018.
20. Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “ECA-Net: Efficient channel attention for deep convolutional neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534-11542, 2020.
21. Z. Meng, J. Ma, and X. Yuan, “End-to-end low cost compressive spectral imaging with spatial-spectral self-attention,” in European Conference on Computer Vision, pp. 187-204, Cham: Springer International Publishing, 2020.
22. X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794-7803, 2018.
23. S. Woo, J. Park, J. Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 3-19, 2018.
24. J. Park, S. Woo, J. Y. Lee, and I. S. Kweon, “BAM: Bottleneck attention module,” arXiv preprint arXiv:1807.06514, 2018.
25. X. Chu, Z. Tian, Y. Wang, B. Zhang, H. Ren, X. Wei, H. Xia, and C. Shen, “Twins: Revisiting the design of spatial attention in vision transformers,” Advances in Neural Information Processing Systems, vol. 34, pp. 9355-9366, 2021.
26. S. d’Ascoli, H. Touvron, M. L. Leavitt, A. S. Morcos, G. Biroli, and L. Sagun, “ConViT: Improving vision transformers with soft convolutional inductive biases,” in International Conference on Machine Learning, pp. 2286-2296, PMLR, 2021.
27. Y. Xu, Q. Zhang, J. Zhang, and D. Tao, “ViTAE: Vision transformer advanced by exploring intrinsic inductive bias,” Advances in Neural Information Processing Systems, vol. 34, pp. 28522-28535, 2021.
28. K. Han, A. Xiao, E. Wu, J. Guo, C. Xu, and Y. Wang, “Transformer in transformer,” Advances in Neural Information Processing Systems, vol. 34, pp. 15908-15919, 2021.
29. L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, Z. Jiang, F. E. H. Tay, J. Feng, and S. Yan, “Tokens-to-token ViT: Training vision transformers from scratch on ImageNet,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 558-567, 2021.
30. C. F. Chen, Q. Fan, and R. Panda, “CrossViT: Cross-attention multi-scale vision transformer for image classification,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 357-366, 2021.