Universitas Sebelas Maret Bidikmisi Applicant's Classification using C4.5 Algorithm

Muh. Safri Juliardi

Abstract

The Bidikmisi scholarship is a scholarship for students who are economically disadvantaged but academically outstanding. Because of the large number of applicants,
an accurate method is needed for the Bidikmisi selection process,
especially at Universitas Sebelas Maret (UNS). In this paper, the C4.5 algorithm is proposed
as a method to assist in selecting Bidikmisi recipients. The dataset consists of
Bidikmisi applicant data from 2013 to 2015: the 2013 and 2014 applicant data are used as
training data, and the 2015 applicant data are used as testing data. Furthermore,
oversampling and undersampling techniques are applied to address the class imbalance in the
training data, and the accuracy of the resulting decision trees is compared to determine which sampling method
performs better. The results show that the C4.5 decision tree achieves an accuracy of 79.80% and an
Area Under the Curve (AUC) of 0.5539 on the 2015 testing data.
To compare the sampling methods, the decision tree with the best testing result is selected for each method.
The oversampling technique yields 82.69% precision, 91.22% recall, and 77.16% accuracy,
while the undersampling technique yields 82.78% precision, 91.22% recall, and 77.27% accuracy.
It is therefore concluded that undersampling gives slightly higher accuracy than oversampling.
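C4.5 chooses the attribute to split on at each node by its gain ratio: the information gain of the split divided by its split information, which penalises attributes with many distinct values. The sketch below shows that criterion for categorical attributes only; C4.5's handling of continuous attributes, missing values, and pruning is omitted. The attribute names `income` and `owns_house` and the four example rows are hypothetical, not the actual Bidikmisi attributes or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on a categorical `attribute`."""
    labels = [row[target] for row in rows]
    total = len(rows)
    # Partition the class labels by the value the attribute takes in each row.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Expected entropy after the split, weighted by partition size.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(
        (len(part) / total) * math.log2(len(part) / total) for part in partitions.values()
    )
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data, for illustration only.
rows = [
    {"income": "low", "owns_house": "no", "accepted": "yes"},
    {"income": "low", "owns_house": "yes", "accepted": "yes"},
    {"income": "high", "owns_house": "yes", "accepted": "no"},
    {"income": "high", "owns_house": "no", "accepted": "no"},
]
print(gain_ratio(rows, "income", "accepted"))      # → 1.0
print(gain_ratio(rows, "owns_house", "accepted"))  # → 0.0
```

Here `income` separates the classes perfectly, so C4.5 would split on it first; `owns_house` carries no information about the class.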
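The abstract does not state which oversampling and undersampling variants were used. As an assumption, the sketch below uses the simplest forms: random oversampling duplicates randomly chosen rows (with replacement) of each smaller class until it matches the largest class, and random undersampling keeps a random subset (without replacement) of each larger class so that it matches the smallest. The function names and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def _group_by_class(rows, target):
    """Group rows (dicts) by the value of their class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[target]].append(row)
    return groups


def random_oversample(rows, target, seed=0):
    """Duplicate random rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, target)
    size = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=size - len(group)))  # with replacement
    return balanced


def random_undersample(rows, target, seed=0):
    """Keep a random subset of each larger class so all classes match the smallest."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, target)
    size = min(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, size))  # without replacement
    return balanced


# Hypothetical imbalanced training set: 8 "yes" rows, 2 "no" rows.
train = [{"accepted": "yes"}] * 8 + [{"accepted": "no"}] * 2
print(len(random_oversample(train, "accepted")))   # → 16
print(len(random_undersample(train, "accepted")))  # → 4
```

Oversampling keeps every original row but repeats minority rows, while undersampling discards majority rows; a tree is then induced on each balanced set and evaluated on the untouched testing data.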
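Precision, recall, and accuracy are all derived from the confusion matrix of the testing data. The sketch below computes them for a binary problem; which label counts as the positive class is an assumption left to the caller. It also shows one possible way to obtain an AUC: if the tree outputs only hard class labels, the ROC curve has a single operating point, and the trapezoidal area under (0, 0), (FPR, TPR), (1, 1) equals (TPR + TNR) / 2. The abstract does not say how its AUC of 0.5539 was computed, so this is illustrative only; the example labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(actual, predicted, positive):
    """Precision, recall, accuracy, and single-threshold AUC for hard predictions."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (TPR)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate (TNR)
    # Trapezoidal area under the ROC points (0, 0), (FPR, TPR), (1, 1).
    auc = (recall + specificity) / 2
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "auc": auc}


# Hypothetical testing labels and predictions.
actual = ["yes", "yes", "yes", "no", "no"]
predicted = ["yes", "yes", "no", "yes", "no"]
print(evaluate(actual, predicted, positive="yes"))
```

This form of AUC shows why a classifier can combine high recall with an AUC close to 0.5: if most negative-class applicants are also predicted positive, specificity stays low and pulls the average down.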

Keywords

Bidikmisi, C4.5 algorithm, Decision Tree, Oversampling, Undersampling
