eprintid: 138264
rev_number: 22
eprint_status: archive
userid: 218077
dir: disk0/00/13/82/64
datestamp: 2025-09-09 07:53:10
lastmod: 2025-09-09 07:53:10
status_changed: 2025-09-09 07:53:10
type: thesis
metadata_visibility: show
creators_name: Rafharum Fatimah, -
creators_name: Mahmudah Salwa Gianti, -
creators_name: Muhammad Rizalul Wahid, -
creators_nim: NIM2104428
creators_nim: NIDN0008049601
creators_nim: NIDN0001049402
creators_id: rafharumf@upi.edu
creators_id: msg.salwa@upi.edu
creators_id: rizalulwahid@upi.edu
contributors_type: http://www.loc.gov/loc.terms/relators/THS
contributors_type: http://www.loc.gov/loc.terms/relators/THS
contributors_name: Mahmudah Salwa Gianti, -
contributors_name: Muhammad Rizalul Wahid, -
contributors_nidn: NIDN0008049601
contributors_nidn: NIDN0001049402
contributors_id: msg.salwa@upi.edu
contributors_id: rizalulwahid@upi.edu
title: ANALISIS KOMPARATIF KINERJA MODEL NEURAL NETWORK RINGAN (CNN, LSTM, TRANSFORMER) UNTUK DETEKSI DEEPFAKE AUDIO PADA DATASET WAVEFAKE
ispublished: pub
subjects: T1
divisions: MKB_S1_PWT
full_text_status: restricted
keywords: CNN, Deepfake, LSTM, SVM, Transformer
note: https://scholar.google.com/citations?hl=id&user=6O8dAqkAAAAJ

SINTA IDs of supervisors
Mahmudah Salwa Gianti: 6779018
Muhammad Rizalul Wahid: 6780434
abstract: The emergence of deepfake audio, produced by text-to-speech-based voice manipulation, poses a serious threat to digital security. This study therefore compares the performance of three lightweight deep learning models, namely Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Transformer, in detecting deepfake audio using the WaveFake dataset. The dataset was preprocessed and three main features were extracted: MFCC, mel-spectrogram, and spectrogram. These features were then used as input to each model with uniform training parameters. Evaluation results show that the CNN achieved the highest accuracy at 92.8%, followed by the LSTM at 87.8%, while the Transformer obtained 84.9%. The CNN excels because of its ability to extract local patterns in audio data, whereas the Transformer still requires further optimization. The LSTM handles complex spectral dimensions reasonably well, but not as well as the CNN. The study concludes that the CNN is the most effective architecture for detecting deepfake audio on the WaveFake dataset and has the potential to be applied in efficient digital security systems.
date: 2025-08-28
date_type: published
institution: Universitas Pendidikan Indonesia
department: KODEPRODI21204#Mekatronika dan Kecerdasan Buatan Kampus Purwakarta_S1
thesis_type: other
thesis_name: other
official_url: https://repository.upi.edu/
related_url_url: https://perpustakaan.upi.edu/
related_url_type: org
citation: Rafharum Fatimah, - and Mahmudah Salwa Gianti, - and Muhammad Rizalul Wahid, - (2025) ANALISIS KOMPARATIF KINERJA MODEL NEURAL NETWORK RINGAN (CNN, LSTM, TRANSFORMER) UNTUK DETEKSI DEEPFAKE AUDIO PADA DATASET WAVEFAKE. S1 thesis, Universitas Pendidikan Indonesia.
document_url: http://repository.upi.edu/138264/1/S_MKB_2104428_Title.pdf
document_url: http://repository.upi.edu/138264/2/S_MKB_2104428_Chapter1.pdf
document_url: http://repository.upi.edu/138264/3/S_MKB_2104428_Chapter2.pdf
document_url: http://repository.upi.edu/138264/4/S_MKB_2104428_Chapter3.pdf
document_url: http://repository.upi.edu/138264/5/S_MKB_2104428_Chapter4.pdf
document_url: http://repository.upi.edu/138264/6/S_MKB_2104428_Chapter5.pdf
document_url: http://repository.upi.edu/138264/7/S_MKB_2104428_Appendix.pdf