Mochamad Khaairi, Rasim, and Yaya Wihardi (2025) PENGENALAN EKSPRESI WAJAH PESERTA DIDIK DI RUANG KELAS MENGGUNAKAN HYBRID MOBILENETV3-VIT DENGAN TOKEN DOWNSAMPLING [Facial Expression Recognition of Students in the Classroom Using a Hybrid MobileNetV3-ViT with Token Downsampling]. S1 thesis, Universitas Pendidikan Indonesia.
Abstract
In large classrooms, teachers often struggle to monitor the facial expressions of every student throughout the learning process, even though facial expressions reflect students' emotional states and levels of participation. This study aims to develop and evaluate a facial expression recognition system that is robust under real-world conditions. It proposes a hybrid architecture that combines MobileNetV3 for local feature extraction with a Vision Transformer for global context modeling, together with Token Downsampling to reduce the number of tokens processed. The model was trained on the FER-2013 dataset and achieved an accuracy of 71.24%, compared with 70.40% for the baseline. Token Downsampling reduced the model's computational complexity by up to a factor of two. End-to-end evaluation on a classroom dataset shows that the system detects nearly all facial expressions present (recall reached 99.88% from the center viewpoint). Although classification precision remains a challenge, the combined training strategy significantly improved performance, indicating that the model can adapt to real learning environments.
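The abstract does not say how Token Downsampling is implemented, so the following is only a minimal sketch of one common way to reduce a token sequence: average pooling over the 2D grid that the tokens came from. The function names, the 4x4 grid, the stride of 2, and the token values are all assumptions made for illustration, and the printed counts do not correspond to the complexity reduction reported in the thesis.

```python
def downsample_tokens(tokens, grid_h, grid_w, stride=2):
    """Average-pool a row-major grid of token vectors by `stride` in each direction.

    Hypothetical helper: the thesis may use a different downsampling operator.
    """
    if len(tokens) != grid_h * grid_w:
        raise ValueError("token count must equal grid_h * grid_w")
    if grid_h % stride or grid_w % stride:
        raise ValueError("grid size must be divisible by stride")
    dim = len(tokens[0])
    pooled = []
    for top in range(0, grid_h, stride):
        for left in range(0, grid_w, stride):
            # Collect the stride x stride neighbourhood of token vectors.
            window = [
                tokens[(top + dy) * grid_w + (left + dx)]
                for dy in range(stride)
                for dx in range(stride)
            ]
            # Replace the neighbourhood with its element-wise mean.
            pooled.append(
                [sum(vec[k] for vec in window) / len(window) for k in range(dim)]
            )
    return pooled


def attention_pair_count(num_tokens):
    """Number of query-key pairs scored by one full self-attention layer."""
    return num_tokens * num_tokens


# Hypothetical 4x4 grid of 2-dimensional tokens (values chosen for illustration only).
grid_h, grid_w = 4, 4
tokens = [[float(i), float(2 * i)] for i in range(grid_h * grid_w)]
pooled = downsample_tokens(tokens, grid_h, grid_w, stride=2)

print(len(tokens), "->", len(pooled))
print(attention_pair_count(len(tokens)), "->", attention_pair_count(len(pooled)))
```

Because self-attention scores every pair of tokens, its cost grows with the square of the sequence length, which is why shortening the sequence before the Transformer layers can lower the computation of a hybrid model.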
S_KOM_2106416_Title.pdf (535kB)
S_KOM_2106416_Chapter1.pdf (268kB)
S_KOM_2106416_Chapter2.pdf (887kB), restricted to library staff
S_KOM_2106416_Chapter3.pdf (806kB)
S_KOM_2106416_Chapter4.pdf (2MB), restricted to library staff
S_KOM_2106416_Chapter5.pdf (241kB)
Item Type: | Thesis (S1) |
---|---|
Additional Information: | https://scholar.google.com/citations?user=xzOMnYAAAAAJ&hl=en; SINTA IDs of the thesis supervisors: Rasim: 5990962; Yaya Wihardi: 5994413 |
Uncontrolled Keywords: | Classroom, Facial Expression Recognition, Hybrid Vision Transformer, MobileNetV3, Token Downsampling |
Subjects: | L Education > L Education (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Fakultas Pendidikan Matematika dan Ilmu Pengetahuan Alam > Program Studi Ilmu Komputer |
Depositing User: | Mochamad Khaairi |
Date Deposited: | 06 Sep 2025 08:45 |
Last Modified: | 06 Sep 2025 08:45 |
URI: | http://repository.upi.edu/id/eprint/137698 |