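KLASIFIKASI DAN ANALISIS SEMANTIK CYBERBULLYING SOSIAL MEDIA X: Integrasi Web Scraping dan Natural Language Processing (NLP) [Classification and Semantic Analysis of Cyberbullying on Social Media X: Integrating Web Scraping and Natural Language Processing (NLP)]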

    Syifa Aulia Azzahra and Nuur Wachid Abdul Majid (2025) KLASIFIKASI DAN ANALISIS SEMANTIK CYBERBULLYING SOSIAL MEDIA X: Integrasi Web Scraping dan Natural Language Processing (NLP). S1 thesis, Universitas Pendidikan Indonesia.

    Abstract

    Cyberbullying on social media, particularly on X, has become a critical issue with significant psychological impact. This study uses a semantic approach to analyze the extent to which cyberbullying persists on X. Data were collected by web scraping with Selenium, using specific categories and keywords such as "gendut" (overweight) and "bodoh" (stupid), during December 2024. After deduplication, 700 records remained, which satisfies the Slovin criterion (margin of error 3.77%). The analysis applied Natural Language Processing (NLP): text cleaning, lowercasing, normalization, tokenization, stopword removal, classification with a fine-tuned BERT model to determine whether each comment constitutes cyberbullying, and mapping of keywords to 8 categories, such as "rasisme" (racism) and "sara" (SARA: ethnicity, religion, race, and intergroup relations). The results show that 55.4% of the data contained indications of cyberbullying. The sexual category was the most dominant at 26.6%, and the keyword "anjing" (dog) was mentioned 99 times. Certain negative words showed a fluctuating temporal pattern, with cyberbullying intensity peaking in Week 4 at 79.1%. These findings confirm that cyberbullying remains a significant phenomenon on X. Stricter content moderation policies and the development of automated, machine-learning-based detection systems are therefore needed to mitigate cyberbullying more effectively.
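The preprocessing order, the keyword-to-category mapping, and the Slovin criterion mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the stopword list, the slang dictionary, the category names, and all function names are hypothetical, since the abstract does not give them, and the fine-tuned BERT classification step is left out. The abstract also does not state the population size used in the Slovin calculation, so the helpers below only show the formula n = N / (1 + N e^2) and its inverse.

```python
import math
import re

# Hypothetical resources: the abstract does not list the thesis's actual
# stopwords, slang dictionary, or keyword-to-category assignments.
STOPWORDS = {"yang", "dan", "di", "itu", "ini", "kamu"}
SLANG_MAP = {"bgt": "banget", "gk": "tidak"}
CATEGORY_KEYWORDS = {
    "physical": {"gendut"},
    "intellectual": {"bodoh"},
    "animal": {"anjing"},
}


def slovin_sample_size(population: float, margin: float) -> float:
    """Slovin's formula: n = N / (1 + N * e^2)."""
    return population / (1 + population * margin ** 2)


def slovin_margin(population: float, sample: float) -> float:
    """Slovin's formula solved for e: e = sqrt(1/n - 1/N)."""
    return math.sqrt(1 / sample - 1 / population)


def preprocess(text: str) -> list[str]:
    """Apply the preprocessing steps in the order the abstract lists them."""
    # Text cleaning: strip URLs, mentions, and hashtags.
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    # Lowercasing.
    text = text.lower()
    # Still cleaning: drop digits and punctuation.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Normalization: replace slang words with their standard form.
    text = re.sub(r"[a-z]+", lambda m: SLANG_MAP.get(m.group(), m.group()), text)
    # Tokenization on whitespace.
    tokens = text.split()
    # Stopword removal.
    return [t for t in tokens if t not in STOPWORDS]


def map_categories(tokens: list[str]) -> set[str]:
    """Return every hypothetical category whose keyword set overlaps the tokens."""
    found = set(tokens)
    return {cat for cat, words in CATEGORY_KEYWORDS.items() if words & found}


tokens = preprocess("@user Kamu itu BODOH bgt!!! https://t.co/abc")
categories = map_categories(tokens)
```

In a fuller pipeline, the cleaned text would also be passed to the classifier, and the category counts would be aggregated per week to obtain temporal patterns of the kind the abstract reports.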

    Official URL: https://ejournal.unma.ac.id/index.php/educatio/art...
    Item Type: Thesis (S1)
    Additional Information: https://scholar.google.com/citations?hl=en&user=arSOUaIAAAAJ Supervisor SINTA ID: Nuur Wachid Abdul Majid: 6054692. This work is a final project equivalent to an undergraduate thesis (skripsi), pursuant to the Decree of the Director of the Universitas Pendidikan Indonesia Campus in Purwakarta, Number: 321/UN40.C4/TD.06/2025
    Uncontrolled Keywords: Cyberbullying, Semantic Analysis, Web Scraping, NLP, Social Media X.
    Subjects: T Technology > T Technology (General)
    Divisions: UPI Kampus Purwakarta > S1 Pendidikan Sistem Teknologi dan Informasi
    Depositing User: Syifa Aulia Azzahra
    Date Deposited: 25 Aug 2025 07:40
    Last Modified: 25 Aug 2025 07:40
    URI: http://repository.upi.edu/id/eprint/136268
