Title
Voice Biometric Authentication Using AI: A Comparative Study on Neural Network Robustness to Noise and Spoofing
Authors
Oralbek Bayazov; Anel Aidos; Jeong Won Kang (강정원); Assel Mukasheva
DOI
https://doi.org/10.5370/KIEE.2025.74.10.1731
Keywords
Voice biometrics; Wav2Vec 2.0; Spoof detection; LSTM; Federated learning; Multimodal authentication; Deep learning
Abstract
Voice biometrics is emerging as a secure, intuitive, and contactless method of identity verification, offering key advantages over traditional PIN- or password-based systems. However, its effectiveness is often reduced by real-world factors such as background noise, device variability, and spoofing attacks, including replay and synthetic voice input. This paper presents a comparative analysis of three neural network architectures for voice biometric authentication under both clean and adverse conditions: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and the transformer-based Wav2Vec 2.0. Experiments were conducted using two large-scale datasets, Mozilla Common Voice and VoxCeleb, with audio represented as mel spectrograms, mel-frequency cepstral coefficients (MFCCs), and raw waveforms. Data augmentation included Gaussian noise, reverberation, background speech, and spoofing via text-to-speech (TTS) synthesis. Results show that Wav2Vec 2.0 consistently outperforms CNN and LSTM in accuracy, robustness to noise, and partial resistance to spoofing, reaching up to 92% accuracy in clean scenarios. Despite these gains, none of the models proved fully resistant to high-fidelity synthetic voice attacks. To address this, we propose integrating explicit spoof detection modules and adversarial training techniques. Additionally, privacy-preserving frameworks such as federated learning and the use of multimodal biometrics are discussed as future directions for secure and ethical deployment.
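The abstract names three input representations (raw waveforms, mel spectrograms, MFCCs) and an additive Gaussian noise augmentation. The following is a minimal sketch of what such a preprocessing pipeline might look like, assuming librosa and a 16 kHz sampling rate; the sampling rate, mel-band count, MFCC count, and SNR value are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch only (not the paper's code): the three input
# representations from the abstract plus SNR-controlled Gaussian noise.
import numpy as np
import librosa

SR = 16_000  # assumed sampling rate; the paper does not specify one here


def load_waveform(path: str) -> np.ndarray:
    """Raw waveform input, as consumed by Wav2Vec 2.0-style models."""
    y, _ = librosa.load(path, sr=SR, mono=True)
    return y


def mel_spectrogram(y: np.ndarray, n_mels: int = 64) -> np.ndarray:
    """Log-mel spectrogram, a common CNN input representation."""
    S = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)


def mfcc(y: np.ndarray, n_mfcc: int = 13) -> np.ndarray:
    """MFCC features, often fed frame-by-frame to sequence models like LSTMs."""
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=n_mfcc)


def add_gaussian_noise(y: np.ndarray, snr_db: float = 10.0) -> np.ndarray:
    """Additive white Gaussian noise scaled to a target signal-to-noise ratio."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise
```

Under this reading of the pipeline, the spectrogram and MFCC features would feed the CNN and LSTM models respectively, while Wav2Vec 2.0 operates directly on the raw waveform; the noise function is one of the four augmentations listed (reverberation, background speech, and TTS spoofing are not sketched here).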