Explainable Machine Learning for Academic Risk Analysis of Students at the Faculty of Vocational Studies, Institut Teknologi Sepuluh Nopember

Lovinki Fitra Ananda, Mukti Ratna Dewi, Mochammad Reza Habibi

Abstract


Students with poor academic performance and relatively high dropout rates can affect the accreditation and public image of a higher education institution. These risks can be anticipated by evaluating students' academic condition, particularly for students whose performance is declining. This study applies explainable machine learning to identify the factors that influence students' academic risk. Only 7.3% of the students are academically at risk, so this class imbalance must be handled with SMOTE to optimize the performance of the classification model. On the balanced data, the random forest model achieves an accuracy of 96.4%, a specificity of 95%, and a recall (sensitivity) of 98%. SHAP is then applied to quantify each factor's contribution to the potential academic risk. The SHAP results show that the three factors contributing most strongly to students' academic risk are the quantitative TPKA score, followed by gender and admission pathway.
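
As an illustration of the workflow summarized above, the following minimal Python sketch chains SMOTE oversampling, a random forest classifier, and SHAP explanations using the imbalanced-learn, scikit-learn, and shap packages. The file name, column names, and hyperparameters are hypothetical assumptions for illustration only, not values taken from the study, and the predictors (e.g., quantitative TPKA score, gender, admission pathway) are assumed to be numerically encoded.

import pandas as pd
import shap
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical input: one row per student, numerically encoded predictors
# and a binary label "at_risk" (1 = academically at risk, 0 = not at risk).
df = pd.read_csv("student_records.csv")
X = df.drop(columns=["at_risk"])
y = df["at_risk"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the minority class with SMOTE on the training split only.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_bal, y_bal)

# Accuracy, sensitivity (recall on the at-risk class), and specificity.
y_pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("accuracy   :", accuracy_score(y_test, y_pred))
print("sensitivity:", recall_score(y_test, y_pred))
print("specificity:", tn / (tn + fp))

# SHAP values quantify each feature's contribution to the predicted risk.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

Applying SMOTE only to the training split, as in this sketch, is one common choice so that the reported metrics reflect the original class distribution; the study's exact evaluation protocol may differ.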

Keywords

explainable machine learning, student performance, random forest, academic risk, SHAP


