IMPLEMENTASI ALGORITMA NAIVE BAYES UNTUK FILTRASI SPAM KOMENTAR PADA YOUTUBE (Implementation of the Naive Bayes Algorithm for Filtering Spam Comments on YouTube)

    Faiz Jauhari Makarim Riza, Rangga Gelar Guntara, and Muhammad Rizki Nugraha (2025) IMPLEMENTASI ALGORITMA NAIVE BAYES UNTUK FILTRASI SPAM KOMENTAR PADA YOUTUBE. S1 thesis, Universitas Pendidikan Indonesia.

    Abstract

    The growing interaction among users on YouTube has introduced new problems, one of which is the spread of spam comments promoting online gambling. Such comments harm the communities that form around channels. This study aims to build a spam comment classification system using the Naive Bayes algorithm.

    Model development follows the CRISP-DM framework. It starts with collecting comments through the YouTube API, followed by text preprocessing: Unicode normalization, case folding, tokenizing, stopword removal, filtering, and manual labelling. Words are weighted with the TF-IDF method to produce the feature representation for the classifier. The model is evaluated with K-Fold Cross Validation and confusion matrix analysis.

    The model achieves an accuracy of 97.1%, precision of 96.4%, recall of 95.6%, and an F1-score of 96%. The trained model was then deployed as a Command Line Interface (CLI) application that channel owners can use to detect and remove spam comments directly. In testing, the system proved highly effective. The study shows that combining appropriate preprocessing with a suitable classification algorithm can yield an accurate and practical spam detection system.
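
The preprocessing steps named in the abstract can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the thesis's actual code: the stopword list, the token pattern, the choice of NFKC as the normalization form, and the function names `preprocess` and `f1_from` are all assumptions made for the example. The second helper simply checks that the reported F1-score agrees with the reported precision and recall.

```python
import re
import unicodedata

# Hypothetical stopword list for illustration only; the list used in the
# thesis is not given in the abstract.
STOPWORDS = {"yang", "di", "dan", "ini", "itu", "ke"}


def preprocess(comment):
    """Turn a raw comment into a list of cleaned tokens."""
    # Unicode normalization (NFKC is an assumption): folds stylized letters,
    # such as mathematical bold capitals, into their plain equivalents.
    text = unicodedata.normalize("NFKC", comment)
    # Case folding.
    text = text.casefold()
    # Tokenizing: keep runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text)
    # Stopword removal and filtering (here: drop one-character tokens).
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A made-up comment whose first word is written in mathematical bold.
    sample = "\U0001D40C\U0001D400\U0001D408\U0001D40D di SLOT88 yang GACOR!!!"
    print(preprocess(sample))
    # Precision and recall as reported in the abstract.
    print(round(f1_from(0.964, 0.956), 3))
```

Normalizing before case folding matters for this kind of spam, since stylized Unicode letters would otherwise produce tokens that never match the plain-text vocabulary seen during training.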

    S_BIDI_2107090_Title.pdf (Text, 2MB)
    S_BIDI_2107090_Chapter1.pdf (Text, 354kB)
    S_BIDI_2107090_Chapter2.pdf (Text, 955kB, restricted to library staff)
    S_BIDI_2107090_Chapter3.pdf (Text, 430kB)
    S_BIDI_2107090_Chapter4.pdf (Text, 1MB, restricted to library staff)
    S_BIDI_2107090_Chapter5.pdf (Text, 216kB)
    S_BIDI_2107090_Appendix.pdf (Text, 535kB, restricted to library staff)
    Official URL: https://repository.upi.edu
    Item Type: Thesis (S1)
    Additional Information: https://scholar.google.com/citations?view_op=new_profile&hl=id SINTA IDs of thesis supervisors: Rangga Gelar Guntara: 6738149; Muhammad Rizki Nugraha: 6770726
    Uncontrolled Keywords: YouTube, Spam Comments, Naive Bayes, TF-IDF, Classification, CLI.
    Subjects: Q Science > QA Mathematics > QA76 Computer software
    Divisions: UPI Kampus Tasikmalaya > S1 Bisnis Digital
    Depositing User: Faiz Jauhari Makarim Riza
    Date Deposited: 08 Sep 2025 04:10
    Last Modified: 08 Sep 2025 04:10
    URI: http://repository.upi.edu/id/eprint/135898
