| Title | 
	Design of Lightweight Fully-Connected Network in Hardware Using Learning-Based Low-Rank Approximation and Quantization Techniques  | 
					
	| Authors | 
	서정윤(Jeong-Yun Seo) ; 이종윤(Jong-Youn Lee) ; 박성준(Sung-Jun Park) ; 이하림(Harim Lee) | 
					
	| DOI | 
	https://doi.org/10.5370/KIEE.2025.74.1.149 | 
					
	| Keywords | 
	  Deep learning; Quantization; Low-rank approximation; Pruning; Verilog HDL | 
					
	| Abstract | 
	In this paper, we address the design of an AI hardware accelerator optimized for a lightweight fully-connected network. Quantization, knowledge distillation, pruning, and low-rank approximation are used to reduce the number of weights, minimizing memory requirements while maintaining inference performance. We introduce a learning-based low-rank approximation that outperforms the original low-rank approximation. In addition, we analyze the interrelationship among these compression techniques to improve the understanding of deep learning model compression. To use the decomposed weight matrices in hardware, we design a compressed fully-connected layer, which is then used to construct a lightweight fully-connected network. The proposed hardware is implemented in Verilog HDL and verified through RTL simulation.  |
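
As a rough illustration of why decomposed weight matrices save memory, the sketch below replaces an m x n weight matrix W with two factors A (m x r) and B (r x n) and computes the layer output as A(Bx). This is a minimal sketch under assumed names and toy values (`compressed_fc`, `low_rank_params`, the rank-1 factors), not the paper's learning-based method or its Verilog design; bias, activation, and quantization are omitted.

```python
# Hypothetical sketch of a low-rank fully-connected layer (names are illustrative).

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in M]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def dense_params(m, n):
    """Weights stored by an uncompressed m x n layer."""
    return m * n

def low_rank_params(m, n, r):
    """Weights stored by the two factors A (m x r) and B (r x n)."""
    return r * (m + n)

def compressed_fc(A, B, x):
    """Compute y = A (B x) without ever forming W = A B."""
    hidden = matvec(B, x)      # r intermediate values
    return matvec(A, hidden)   # m outputs

# Toy rank-1 example with assumed values.
A = [[1], [2], [3], [4]]       # 4 x 1
B = [[1, 0, -1, 2]]            # 1 x 4
x = [1, 1, 1, 1]

y = compressed_fc(A, B, x)
y_dense = matvec(matmul(A, B), x)  # same result via the full 4 x 4 matrix
```

In this toy case the factors hold 8 weights instead of 16, and the saving grows whenever r is much smaller than m and n, since storage scales as r(m + n) rather than mn.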