eprintid: 137929
rev_number: 26
eprint_status: archive
userid: 217569
dir: disk0/00/13/79/29
datestamp: 2025-09-26 04:00:27
lastmod: 2025-09-26 04:00:27
status_changed: 2025-09-26 04:00:27
type: thesis
metadata_visibility: show
creators_name: Hibar Taufikurachman, -
creators_nim: NIM1907774
creators_id: hibartaufikurachman@upi.edu
contributors_type: http://www.loc.gov/loc.terms/relators/THS
contributors_type: http://www.loc.gov/loc.terms/relators/THS
contributors_name: Mochamad Iqbal Ardimansyah, -
contributors_name: Yulia Retnowati, -
contributors_nidn: NIDN0428039101
contributors_nidn: NIDN0029079601
contributors_id: iqbalardimansyah@upi.edu
contributors_id: yulia.retnowati@upi.edu
title: PERBANDINGAN K-NEAREST NEIGHBORS, XGBOOST, SUPPORT VECTOR MACHINE, DAN RANDOM FOREST DALAM PREDIKSI DIABETES BERDASARKAN FAKTOR RISIKO
ispublished: pub
subjects: L1
subjects: QA75
subjects: QA76
subjects: T1
divisions: RPL_S1_TSK
full_text_status: restricted
keywords: Diabetes Prediction, Machine Learning, XGBoost, Random Forest, Risk Factors
note: https://scholar.google.com/citations?user=6rjxcpcAAAAJ&hl=en
SINTA ID of thesis supervisors:
Mochamad Iqbal Ardimansyah: 6658552
Yulia Retnowati: 6852573
abstract: Diabetes Mellitus is a rapidly growing global health emergency, with millions of undiagnosed cases increasing the risk of complications. Machine learning can support early detection of diabetes. This study aims to develop machine learning models for diabetes detection and to compare the performance of four algorithms: K-Nearest Neighbors (KNN), XGBoost, Support Vector Machine (SVM), and Random Forest. It also aims to identify and analyze the most influential diabetes risk factors.
This study uses data from the National Health and Nutrition Examination Survey (NHANES) for the period August 2021–August 2023, covering 21 features based on demographic, clinical, and lifestyle risk factors. Imbalance in the target class was addressed with the SMOTE-ENN technique. Model performance was evaluated using accuracy, precision, recall, F1-score, specificity, and ROC-AUC. The results show that the ensemble models (XGBoost and Random Forest) significantly outperformed the other models. After hyperparameter tuning, XGBoost achieved the best accuracy (0.9371), precision (0.7194), and F1-score (0.7855), while Random Forest achieved the best recall (0.8773) and ROC-AUC (0.9599). Feature importance analysis identified glycohemoglobin (HbA1c) as the most influential predictor, followed by cholesterol-related risk factors (cholesterol level and cholesterol history) and smoking. Other factors, such as age, race, sleep disorders, and depression, also contributed. These findings indicate that ensemble models are highly effective for diabetes prediction: XGBoost offers the best balanced performance, while Random Forest, which leads on recall and ROC-AUC, is well suited to diabetes screening. Furthermore, the primary risk factor, glycohemoglobin (HbA1c), aligns with clinical understanding, which validates the model's relevance.
date: 2025-08-21
date_type: published
institution: Universitas Pendidikan Indonesia
department: KODEPRODI58201#Rekayasa_Perangkat_Lunak_S1
thesis_type: other
thesis_name: other
official_url: https://repository.upi.edu/
related_url_url: https://perpustakaan.upi.edu/
related_url_type: org
citation: Hibar Taufikurachman, - (2025) PERBANDINGAN K-NEAREST NEIGHBORS, XGBOOST, SUPPORT VECTOR MACHINE, DAN RANDOM FOREST DALAM PREDIKSI DIABETES BERDASARKAN FAKTOR RISIKO. S1 thesis, Universitas Pendidikan Indonesia.
document_url: http://repository.upi.edu/137929/1/S_RPL_1907774_Title.pdf
document_url: http://repository.upi.edu/137929/2/S_RPL_1907774_Chapter1.pdf
document_url: http://repository.upi.edu/137929/3/S_RPL_1907774_Chapter2.pdf
document_url: http://repository.upi.edu/137929/4/S_RPL_1907774_Chapter3.pdf
document_url: http://repository.upi.edu/137929/5/S_RPL_1907774_Chapter4.pdf
document_url: http://repository.upi.edu/137929/6/S_RPL_1907774_Chapter5.pdf
document_url: http://repository.upi.edu/137929/7/S_RPL_1907774_Appendix.pdf