ANALISIS REGRESI LOGISTIK BINER MENGGUNAKAN METODE SPARSE-GROUP LASSO PADA DATA BERDIMENSI TINGGI: Studi Kasus Faktor-Faktor yang Memengaruhi Capaian Indeks Pembangunan Manusia Kota/Kabupaten di Jawa Barat Tahun 2022

Hastialisna Hurul Aeni Setiawan, Nar Herrhyanto, and Lukman (2025) ANALISIS REGRESI LOGISTIK BINER MENGGUNAKAN METODE SPARSE-GROUP LASSO PADA DATA BERDIMENSI TINGGI: Studi Kasus Faktor-Faktor yang Memengaruhi Capaian Indeks Pembangunan Manusia Kota/Kabupaten di Jawa Barat Tahun 2022. S1 thesis, Universitas Pendidikan Indonesia.

Abstract

High-dimensional data, in which the number of independent variables exceeds the number of observations, often cause problems in statistical modeling, such as multicollinearity, particularly when ordinary regression methods are used. Regularization methods address this problem. One of them is Sparse-Group LASSO, an extension of LASSO that combines the LASSO and Group LASSO approaches. The method selects variables both individually and by group, which makes it suitable for high-dimensional data whose variables have a group structure. This study applies binary logistic regression with Sparse-Group LASSO to 2022 Human Development Index (HDI; Indonesian: IPM) achievement data for the 27 cities and regencies of West Java Province, in order to identify the factors that influence HDI achievement. The independent variables comprise 60 factors in 7 groups: education, health, economy, environment, population, people with social welfare problems (PMKS), and governance. The best model was obtained with α = 0.5 and the optimal λ from k-fold cross-validation, λ = 0.0059954. This model selects 23 variables from 3 main groups: education, environment, and PMKS. The variables from the education group are net enrollment rate, expected years of schooling, number of teachers, number of students, mean years of schooling, literacy rate, and literacy index. The variables selected from the environment group include land area, forest area, number of sub-districts, number of urban villages, level of waste services, environmental quality index, proper sanitation, and access to safe drinking water sources. In the PMKS group, factors such as the number of abandoned children, street children, homeless people, beggars, victims of drug abuse (NAPZA), abandoned elderly people, victims of violence against children, and the number of divorces were also influential. The resulting model performs well, with a classification accuracy of 96.3% and an AUC of 0.994.
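The way Sparse-Group LASSO combines individual and group selection can be illustrated with a minimal sketch. It assumes the commonly used form of the penalty, λ[α‖β‖₁ + (1 − α) Σ_g √(p_g) ‖β_g‖₂], where p_g is the size of group g; the abstract does not state the exact formulation or software used in the thesis, so the √(p_g) weights, the function names `sgl_penalty` and `sgl_prox`, and the toy coefficients below are all illustrative assumptions, not the author's implementation or data.

```python
import math

def sgl_penalty(beta, groups, lam, alpha):
    """Sparse-group lasso penalty (assumed form, see lead-in).

    `groups` maps a group label to the list of coefficient indices it owns.
    """
    l1_term = sum(abs(b) for b in beta)
    group_term = 0.0
    for idx in groups.values():
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        group_term += math.sqrt(len(idx)) * norm
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)

def sgl_prox(beta, groups, lam, alpha, step=1.0):
    """Proximal operator of the penalty above.

    Each coefficient is soft-thresholded (lasso part), then each group is
    shrunk as a block (group lasso part), so a whole group can become zero.
    """
    out = [0.0] * len(beta)
    for idx in groups.values():
        # Element-wise soft-thresholding.
        soft = [math.copysign(max(abs(beta[j]) - step * alpha * lam, 0.0), beta[j])
                for j in idx]
        norm = math.sqrt(sum(v * v for v in soft))
        thresh = step * (1.0 - alpha) * lam * math.sqrt(len(idx))
        # Group-wise shrinkage factor; zero when the group norm is below the threshold.
        scale = max(1.0 - thresh / norm, 0.0) if norm > 0 else 0.0
        for j, v in zip(idx, soft):
            out[j] = scale * v
    return out

# Toy coefficients (made up): one strong group and one weak group.
toy_beta = [3.0, 4.0, 0.1, -0.1]
toy_groups = {"education": [0, 1], "health": [2, 3]}
shrunk = sgl_prox(toy_beta, toy_groups, lam=1.0, alpha=0.5)
print(sgl_penalty(toy_beta, toy_groups, lam=1.0, alpha=0.5))
print(shrunk)
```

In this toy case the weak group is removed entirely while the strong group is only shrunk, which mirrors how the reported model retains variables from 3 of the 7 groups. With α = 0.5, as in the abstract, the individual and group parts of the penalty are weighted equally.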

Official URL: https://repository.upi.edu/
Item Type: Thesis (S1)
Additional Information: Supervisor SINTA IDs: Nar Herrhyanto: -; Lukman: 6675529
Uncontrolled Keywords: High-Dimensional Data, Binary Logistic Regression, Sparse-Group LASSO, Multicollinearity, Human Development Index.
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics
Divisions: Fakultas Pendidikan Matematika dan Ilmu Pengetahuan Alam > Program Studi Matematika - S1 > Program Studi Matematika (non kependidikan)
Depositing User: Hastialisna hurul aeni setiawan
Date Deposited: 05 May 2025 03:15
Last Modified: 05 May 2025 03:15
URI: http://repository.upi.edu/id/eprint/132902
