A COMPARISON OF DIRECT SCORING AND SIMILARITY-BASED SCORING APPROACHES IN AUTOMATIC SHORT ANSWER SCORING SYSTEMS

    Bayu Wicaksono, Rasim, and Yaya Wihardi (2025) PERBANDINGAN PENDEKATAN DIRECT SCORING DAN SIMILARITY-BASED SCORING DALAM SISTEM PENILAIAN JAWABAN SINGKAT OTOMATIS [A Comparison of Direct Scoring and Similarity-Based Scoring Approaches in Automatic Short Answer Scoring Systems]. S1 thesis, Universitas Pendidikan Indonesia.

    Abstract

    In the era of digital education, the need for automated scoring systems for short text answers continues to grow. Automatic Short Answer Scoring (ASAS) aims to automate this assessment process efficiently and consistently. Two approaches are commonly used in ASAS: direct scoring and similarity-based scoring. Although both are widely used, previous research has tended to focus on metrics such as RMSE and Pearson correlation when assessing model performance. This study provides a more in-depth analysis by comparing the two approaches under two evaluation scenarios, specific-prompt and cross-prompt, assessing both the accuracy and the stability of the models. The dataset used is the Rahutomo dataset. The results show that direct scoring outperforms similarity-based scoring. In the specific-prompt scenario, an RMSE of 0.0817 and a Pearson correlation of 0.9504 were obtained, while in the cross-prompt scenario the RMSE was 0.0917 and the Pearson correlation was 0.9286. By examining the distribution of residuals and outliers in addition to the evaluation metrics, the study offers a more complete picture of model stability.
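
    The kind of analysis the abstract describes (RMSE, Pearson correlation, and a look at residuals and outliers) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names are hypothetical, the scores are invented toy values on a 0 to 1 scale, and the 1.5 x IQR rule for flagging outliers is an assumption, since the abstract does not say how outliers were identified.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def iqr_outliers(residuals, k=1.5):
    """Indices of residuals outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumed choice for this sketch.
    """
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, r in enumerate(residuals) if r < low or r > high]


# Toy data (hypothetical): human scores and model predictions.
human = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5, 0.25, 1.0]
predicted = [0.05, 0.2, 0.55, 0.7, 0.95, 0.5, 0.3, 0.4]

# Residual = prediction minus human score; one answer is badly under-scored.
residuals = [p - h for p, h in zip(predicted, human)]

print("RMSE:", rmse(human, predicted))
print("Pearson:", pearson(human, predicted))
print("Outlier indices:", iqr_outliers(residuals))
```

    Two models with similar RMSE can differ in how their residuals are spread, so listing the flagged indices alongside the aggregate metrics shows whether the error comes from many small misses or a few large ones.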

    S_KOM_2106836_Title.pdf (561kB)
    S_KOM_2106836_Chapter1.pdf (54kB)
    S_KOM_2106836_Chapter2.pdf (827kB), restricted to library staff
    S_KOM_2106836_Chapter3.pdf (432kB)
    S_KOM_2106836_Chapter4.pdf (1MB), restricted to library staff
    S_KOM_2106836_Chapter5.pdf (38kB)
    Official URL: https://repository.upi.edu/
    Item Type: Thesis (S1)
    Additional Information: https://scholar.google.com/citations?user=T3C8yrsAAAAJ&hl=en SINTA IDs of the thesis supervisors: Rasim: 5990962; Yaya Wihardi: 5994413
    Uncontrolled Keywords: Automatic Short Answer Scoring; Cross-Prompt; Direct Scoring; Outlier; Similarity-Based Scoring; Specific-Prompt
    Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
    T Technology > T Technology (General)
    Divisions: Fakultas Pendidikan Matematika dan Ilmu Pengetahuan Alam > Program Studi Ilmu Komputer
    Depositing User: Bayu Wicaksono
    Date Deposited: 04 Aug 2025 09:43
    Last Modified: 04 Aug 2025 09:43
    URI: http://repository.upi.edu/id/eprint/135104
