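Rafharum Fatimah, Mahmudah Salwa Gianti, and Muhammad Rizalul Wahid (2025) ANALISIS KOMPARATIF KINERJA MODEL NEURAL NETWORK RINGAN (CNN, LSTM, TRANSFORMER) UNTUK DETEKSI DEEPFAKE AUDIO PADA DATASET WAVEFAKE [Comparative Analysis of the Performance of Lightweight Neural Network Models (CNN, LSTM, Transformer) for Deepfake Audio Detection on the WaveFake Dataset]. S1 thesis, Universitas Pendidikan Indonesia.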
Abstract
The emergence of deepfake audio, produced by text-to-speech-based voice manipulation, poses a serious threat to digital security. This study compares the performance of three lightweight deep learning models, namely Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Transformer, in detecting deepfake audio on the WaveFake dataset. The dataset was preprocessed and three main features were extracted: MFCC, mel-spectrogram, and spectrogram. These features were then used as input to each model under uniform training parameters. Evaluation results show that CNN achieved the highest accuracy at 92.8%, followed by LSTM at 87.8% and Transformer at 84.9%. CNN excels because of its ability to extract local patterns in audio data, while Transformer still requires further optimization. LSTM handles complex spectral dimensions reasonably well but is less effective than CNN. The study concludes that CNN is the most effective of the three architectures for detecting deepfake audio on the WaveFake dataset and has the potential to be applied in efficient digital security systems.
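The abstract names three input features (spectrogram, mel-spectrogram, MFCC) but does not state how they were computed. As a rough illustration of how the three relate, the following standard-library Python sketch derives all of them from a single audio frame. The frame length, sample rate, number of mel bands, number of coefficients, and every function name here are assumptions chosen for illustration, not the thesis's settings; a real pipeline would more likely use an audio library and an FFT.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Convert Hz to mels (HTK-style formula, one common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame using a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                for i, w in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank


def mfcc_from_log_mel(log_mel, n_mfcc):
    """First n_mfcc coefficients of an unnormalized DCT-II of log-mel energies."""
    n = len(log_mel)
    return [sum(v * math.cos(math.pi * c * (j + 0.5) / n)
                for j, v in enumerate(log_mel))
            for c in range(n_mfcc)]


def extract_features(frame, sample_rate, n_mels=8, n_mfcc=4):
    """Return (spectrogram column, mel-spectrogram column, MFCC vector)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    mel = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    log_mel = [math.log(e + 1e-10) for e in mel]  # epsilon avoids log(0)
    return spec, mel, mfcc_from_log_mel(log_mel, n_mfcc)


# Hypothetical input: one 64-sample frame of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(64)]
spec, mel, mfcc = extract_features(frame, sample_rate)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
print("peak frequency (Hz):", peak_bin * sample_rate / len(frame))
print("feature sizes:", len(spec), len(mel), len(mfcc))
```

Stacking these per-frame columns over time gives the two-dimensional inputs that a CNN, LSTM, or Transformer would consume. The sketch shows only why the three features form a progression: the mel-spectrogram is a perceptually spaced summary of the spectrogram, and the MFCCs are a compact decorrelated summary of the log mel energies.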
Text: S_MKB_2104428_Title.pdf (726kB)
Text: S_MKB_2104428_Chapter1.pdf (308kB)
Text: S_MKB_2104428_Chapter2.pdf (684kB), restricted to library staff
Text: S_MKB_2104428_Chapter3.pdf (604kB)
Text: S_MKB_2104428_Chapter4.pdf (1MB), restricted to library staff
Text: S_MKB_2104428_Chapter5.pdf (261kB)
Text: S_MKB_2104428_Appendix.pdf (936kB), restricted to library staff
Item Type: | Thesis (S1) |
---|---|
Additional Information: | https://scholar.google.com/citations?hl=id&user=6O8dAqkAAAAJ Supervisor SINTA IDs: Mahmudah Salwa Gianti: 6779018; Muhammad Rizalul Wahid: 6780434 |
Uncontrolled Keywords: | CNN, Deepfake, LSTM, SVM, Transformer |
Subjects: | T Technology > T Technology (General) |
Divisions: | UPI Kampus Purwakarta > S1 Mekatronika dan Kecerdasan Buatan |
Depositing User: | Rafharum Fatimah |
Date Deposited: | 09 Sep 2025 07:53 |
Last Modified: | 09 Sep 2025 07:53 |
URI: | http://repository.upi.edu/id/eprint/138264 |