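PERBANDINGAN METODE WORD2VEC DAN TF-IDF DENGAN SVM UNTUK KLASIFIKASI TEKS PADA MEDIA SOSIAL TWITTER (STUDI KASUS PEMILIHAN LEGISLATIF 2024)
(Comparison of Word2vec and TF-IDF Methods with SVM for Text Classification on Twitter: A Case Study of the 2024 Legislative Election)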

Mujtahidul Haq Mahyunda (2023) PERBANDINGAN METODE WORD2VEC DAN TF-IDF DENGAN SVM UNTUK KLASIFIKASI TEKS PADA MEDIA SOSIAL TWITTER (STUDI KASUS PEMILIHAN LEGISLATIF 2024). S1 (undergraduate) thesis, Universitas Pendidikan Indonesia.

Files:
S_RPL_1909410_Title.pdf (309 kB)
S_RPL_1909410_Chapter1.pdf (128 kB)
S_RPL_1909410_Chapter2.pdf (447 kB, restricted to library staff)
S_RPL_1909410_Chapter3.pdf (142 kB)
S_RPL_1909410_Chapter4.pdf (475 kB, restricted to library staff)
S_RPL_1909410_Chapter5.pdf (49 kB)
S_RPL_1909410_Appendix.pdf (86 kB, restricted to library staff)
Official URL: http://repository.upi.edu

Abstract

Twitter is the social media platform on which the largest volume of text about issues surrounding Indonesia's 2024 legislative election is spread. Classifying this volume of text calls for suitable techniques, such as TF-IDF (Term Frequency-Inverse Document Frequency) or Word2vec features combined with a classifier. This study, set in the context of the 2024 legislative election, aims to find the better-performing method for text classification and to identify the strengths and weaknesses of Word2vec and TF-IDF when each is paired with a Support Vector Machine (SVM) classifier. The research follows the Design Science Research Methodology (DSRM), which extends the knowledge base of science and technology through the creation of innovative artifacts such as objects or tools. With a 90% training / 10% test split, the TF-IDF model achieved the highest confusion-matrix results, with an accuracy of 87%, precision of 83%, recall of 88%, and F1-score of 85%, outperforming the Word2vec model. Under K-fold cross-validation, the TF-IDF model again scored highest, with an accuracy of 86%, precision of 80%, recall of 87%, and F1-score of 84% at the same 90%/10% split. This indicates that the TF-IDF model generalizes reasonably well to independent data and is less prone to bias than the Word2vec model. The study also found that a weakness of SVM is that it is difficult to apply to large sample sizes: sample size affects its performance, so results fall short of optimal. The TF-IDF model produces more specific feature vectors, copes with rarely occurring words, and yields feature vectors that are easy to interpret. This research is expected to help in choosing an appropriate text classification method and to contribute to the development of text classification techniques in natural language processing.
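The abstract compares TF-IDF features fed to an SVM, evaluated with confusion-matrix metrics and K-fold cross-validation. As a rough illustration of that kind of pipeline (not the author's code, which this page does not include), the stdlib-only Python sketch below fits TF-IDF weights on training data, trains a linear SVM with the Pegasos stochastic sub-gradient method (without its optional projection step), and averages accuracy, precision, recall and F1 over k folds. Assumptions: naive whitespace tokenisation with none of the preprocessing an Indonesian-language pipeline would normally apply, the smoothed IDF formula, binary +1/-1 labels (the abstract does not state the label set), the hyperparameters `lam` and `epochs`, and every function name, all of which are hypothetical. With `k=10`, each fold trains on about 90% of the data and tests on the remaining 10%, the split ratio the abstract reports.

```python
import math
import random
from collections import Counter

def fit_idf(train_docs):
    """Learn the vocabulary and smoothed IDF weights from the training documents only."""
    tokenised = [doc.lower().split() for doc in train_docs]  # naive whitespace tokeniser
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(tokenised)
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed IDF, ln((1 + N) / (1 + df)) + 1: one common variant, assumed here.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    return vocab, idf

def tfidf_transform(docs, vocab, idf):
    """Map documents to L2-normalised TF-IDF vectors; unseen words are ignored."""
    vectors = []
    for doc in docs:
        counts = Counter(tok for tok in doc.lower().split() if tok in idf)
        vec = [counts[tok] * idf[tok] for tok in vocab]  # raw term count times IDF
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        vectors.append([v / norm for v in vec])
    return vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos sub-gradient descent on the L2-regularised hinge loss; labels +1/-1, no intercept."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y[i] * dot(w, X[i]) < 1  # hinge margin, checked before updating
            w = [(1 - eta * lam) * wj for wj in w]  # shrink from the L2 regulariser
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, X):
    return [1 if dot(w, x) >= 0 else -1 for x in X]

def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; with k=10 each fold is roughly a 90%/10% split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # requires k <= n so no fold is empty
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def cross_validate(docs, labels, k=10):
    """Mean of per-fold metrics; the IDF weights and the SVM are refit on each training fold."""
    totals = {}
    for train, test in k_fold_indices(len(docs), k):
        vocab, idf = fit_idf([docs[i] for i in train])
        X_train = tfidf_transform([docs[i] for i in train], vocab, idf)
        w = train_linear_svm(X_train, [labels[i] for i in train])
        preds = predict(w, tfidf_transform([docs[i] for i in test], vocab, idf))
        for name, value in confusion_metrics([labels[i] for i in test], preds).items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / k for name, total in totals.items()}

if __name__ == "__main__":
    # Hypothetical toy corpus: invented texts with invented +1/-1 labels.
    docs = ["clean honest candidate", "honest program for youth", "great honest debate",
            "clean campaign promise", "youth support candidate", "vote buying scandal",
            "fake news scandal", "corrupt candidate scandal", "money politics again",
            "fake promise again"]
    labels = [1] * 5 + [-1] * 5
    print(cross_validate(docs, labels, k=5))
```

The IDF weights are refit inside every fold so that test documents never influence the vocabulary, which keeps the cross-validated scores honest. For the Word2vec variant, one common approach (an assumption here, not confirmed by the abstract) is to replace `tfidf_transform` with the average of each document's word vectors while leaving the SVM and evaluation unchanged. In practice a library classifier such as scikit-learn's `LinearSVC` or `SVC` would replace the hand-written trainer.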

Item Type: Thesis (S1)
Additional Information: SINTA IDs: 6658552, 6681751
Uncontrolled Keywords: Twitter, 2024 Legislative Election, Word2vec, TF-IDF, Support Vector Machine
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: UPI Cibiru Campus > S1 Software Engineering (Rekayasa Perangkat Lunak)
Depositing User: Mujtahidul Haq Mahyunda
Date Deposited: 06 Sep 2023 03:02
Last Modified: 06 Sep 2023 03:02
URI: http://repository.upi.edu/id/eprint/101033
