Title
Voice Biometric Authentication Using AI: A Comparative Study on Neural Network Robustness to Noise and Spoofing
Authors
Oralbek Bayazov; Anel Aidos; Jeong Won Kang (강정원); Assel Mukasheva
DOI
https://doi.org/10.5370/KIEE.2025.74.10.1731
Keywords
Voice biometrics; Wav2Vec 2.0; Spoof detection; LSTM; Federated learning; Multimodal authentication; Deep learning
Abstract
Voice biometrics is emerging as a secure, intuitive, and contactless method of identity verification, offering key advantages over traditional PIN- or password-based systems. However, its effectiveness is often reduced by real-world factors such as background noise, device variability, and spoofing attacks, including replay and synthetic voice input. This paper presents a comparative analysis of three neural network architectures for voice biometric authentication under both clean and adverse conditions: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and the transformer-based Wav2Vec 2.0. Experiments were conducted using two large-scale datasets, Mozilla Common Voice and VoxCeleb, with audio represented as mel spectrograms, mel-frequency cepstral coefficients (MFCCs), and raw waveforms. Data augmentation included Gaussian noise, reverberation, background speech, and spoofing via text-to-speech (TTS) synthesis. Results show that Wav2Vec 2.0 consistently outperforms CNN and LSTM in accuracy, robustness to noise, and partial resistance to spoofing, reaching up to 92% accuracy in clean scenarios. Despite these gains, none of the models proved fully resistant to high-fidelity synthetic voice attacks. To address this, we propose integrating explicit spoof detection modules and adversarial training techniques. Additionally, privacy-preserving frameworks such as federated learning and the use of multimodal biometrics are discussed as future directions for secure and ethical deployment.
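The abstract names three input representations (raw waveforms, mel spectrograms, MFCCs) and an additive Gaussian noise augmentation. The following is a minimal sketch of what such a preprocessing pipeline might look like, assuming librosa and a 16 kHz sampling rate; the sampling rate, mel-band count, MFCC count, and SNR value are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch only (not the paper's code): the three input
# representations from the abstract plus SNR-controlled Gaussian noise.
import numpy as np
import librosa

SR = 16_000  # assumed sampling rate; the paper does not specify one here


def load_waveform(path: str) -> np.ndarray:
    """Raw waveform input, as consumed by Wav2Vec 2.0-style models."""
    y, _ = librosa.load(path, sr=SR, mono=True)
    return y


def mel_spectrogram(y: np.ndarray, n_mels: int = 64) -> np.ndarray:
    """Log-mel spectrogram, a common CNN input representation."""
    S = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)


def mfcc(y: np.ndarray, n_mfcc: int = 13) -> np.ndarray:
    """MFCC features, often fed frame-by-frame to sequence models like LSTMs."""
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=n_mfcc)


def add_gaussian_noise(y: np.ndarray, snr_db: float = 10.0) -> np.ndarray:
    """Additive white Gaussian noise scaled to a target signal-to-noise ratio."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise
```

Under this reading of the pipeline, the spectrogram and MFCC features would feed the CNN and LSTM models respectively, while Wav2Vec 2.0 operates directly on the raw waveform; the noise function is one of the four augmentations listed (reverberation, background speech, and TTS spoofing are not sketched here).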