Meutia Jasmine Annisa Herawan, Rani Megasari, and Yaya Wihardi (2025) KLASIFIKASI AKSI PESERTA DIDIK DI DALAM RUANG KELAS MENGGUNAKAN VIDEO MASKED AUTOENCODER (Classification of Students' Actions in the Classroom Using Video Masked Autoencoder). S1 thesis, Universitas Pendidikan Indonesia.
Abstract
In the world of education, evaluating students' actions in the classroom is an important indicator of the success of the learning process. Deep learning approaches based on CNNs have been widely applied to action recognition, but CNNs have limitations in capturing the spatio-temporal relations within data. Recently, Vision Transformers (ViT) have emerged as a promising method, demonstrating superior performance on visual recognition tasks, including video-based human action recognition.

This study aims to develop a model for classifying students' actions in the classroom using a video transformer architecture, namely VideoMAE, and to compare its performance with TimeSformer as a baseline model through fine-tuning. Since no standard dataset suited to the research context was available, a custom dataset was constructed comprising five action classes (nodding, raising a hand, using a phone, resting the head on the desk, and looking down) for a total of 403 video clips.

The results show that VideoMAE achieved the highest test accuracy of 87.23%, with optimal hyperparameters of 15 epochs, a batch size of 4, a learning rate of 5e-5, a weight decay of 0.01, and a linear learning-rate scheduler, outperforming TimeSformer, which achieved 79.80% accuracy. These findings demonstrate that VideoMAE is effective for classifying students' actions in the classroom. This approach is expected to provide a more objective and efficient solution for observing students' actions and to serve as a reference for further work on video transformer models in the educational domain, particularly with small-scale datasets.
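The record does not include code for the reported fine-tuning setup (15 epochs, batch size 4, learning rate 5e-5, weight decay 0.01, linear learning-rate scheduler). As a minimal illustration of the linear schedule mentioned in the abstract, the sketch below shows one common formulation (optional warmup, then linear decay to zero). The function and the step counts are hypothetical assumptions for illustration, not the authors' implementation.

```python
import math

def linear_schedule(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Linear learning-rate schedule: optional warmup up to base_lr,
    then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Illustrative step count, assuming roughly an 80% train split of the
# 403 clips, batch size 4, and 15 epochs (the split is an assumption).
steps_per_epoch = math.ceil(0.8 * 403 / 4)
total_steps = 15 * steps_per_epoch

print(linear_schedule(0, total_steps))                  # full base_lr at the start
print(linear_schedule(total_steps // 2, total_steps))   # roughly half of base_lr
print(linear_schedule(total_steps, total_steps))        # decays to 0.0
```

In practice a fine-tuning framework would supply such a schedule to the optimizer per step; the sketch only makes the decay behind the phrase "linear learning rate scheduler" concrete.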
Downloads:
- S_KOM_200188_Title.pdf (480kB)
- S_KOM_200188_Chapter1.pdf (277kB)
- S_KOM_200188_Chapter2.pdf (1MB, restricted to library staff)
- S_KOM_200188_Chapter3.pdf (392kB)
- S_KOM_200188_Chapter4.pdf (1MB, restricted to library staff)
- S_KOM_200188_Chapter5.pdf (232kB)
| Item Type: | Thesis (S1) |
|---|---|
| Additional Information: | https://scholar.google.com/citations?hl=en&user=QPIpIxUAAAAJ&scilu=&scisig=ACUpqDcAAAAAaNIX2QnfKvPYvvofSiUKcvfetGA&gmla=AH8HC4y0BJCFkkxuTox9iLkP9QMOFCe38VBZvukiVzJVHh_pnISNVLWeLGNrGh_5bqA2FyKyJUOgaXAveIfUr1ZZZRdz0BQywXXZLEU&sciund=16512750378471354288. Supervisors' SINTA IDs: Rani Megasari: 5992674; Yaya Wihardi: 5994413 |
| Uncontrolled Keywords: | Action Classification, Fine-Tuning, Student Action, Human Action Recognition, TimeSformer, Video Masked Autoencoder (VideoMAE) |
| Subjects: | L Education > L Education (General) Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Divisions: | Fakultas Pendidikan Matematika dan Ilmu Pengetahuan Alam > Program Studi Ilmu Komputer |
| Depositing User: | Meutia Jasmine Annisa Herawan |
| Date Deposited: | 23 Sep 2025 06:24 |
| Last Modified: | 23 Sep 2025 06:24 |
| URI: | http://repository.upi.edu/id/eprint/140379 |