The Effect of Feature Selection Methods on SVM Model Accuracy in Customer Churn Classification at Telecommunications Companies

Mayke Andani Rohmaniar, Roni Habibi, Syafrial Fachri Pane

Abstract

Abstract (Indonesian):

This study analyzes the effect of feature selection methods on the accuracy of a Support Vector Machine model in predicting customer churn in the telecommunications industry. Four feature selection methods (Correlation Matrix, ANOVA, PCA, and GA) and four kernels (Linear, Polynomial, RBF, and Sigmoid) are compared on a telecommunications customer dataset from Kaggle with 7043 entries and 33 features. The study follows the CRISP-DM methodology, covering Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. The results show that Correlation Matrix feature selection combined with the Linear kernel gives the best performance. This model reaches the highest accuracy, 92.48%, with a precision of 0.93, a recall of 0.97, and an F1-score of 0.95. The other feature selection methods, such as PCA and GA, produce lower results than the Correlation Matrix. Once deployed, an accurate prediction model is expected to help telecommunications companies develop more effective customer retention strategies.
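The abstract does not specify how the Correlation Matrix method chooses features. One common reading is a two-step filter. First, keep the features whose Pearson correlation with the churn label is large enough. Second, drop one feature from any pair that is highly correlated with each other. The sketch below implements that reading in plain Python. The thresholds (`min_target_r=0.1`, `max_inter_r=0.9`), the function names and the feature names in the toy data are all hypothetical. The sketch also assumes numeric features and a 0/1 churn label, so categorical columns would need encoding first.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treat it as uninformative
    return cov / sqrt(var_x * var_y)


def select_features(features, target, min_target_r=0.1, max_inter_r=0.9):
    """Hypothetical correlation-matrix filter.

    features: dict mapping feature name -> list of numeric values
    target:   list of 0/1 churn labels, same length as every feature column
    Returns (kept feature names, dict of each feature's correlation with the target).
    """
    # Step 1 (relevance): correlation of every feature with the target.
    target_r = {name: pearson(col, target) for name, col in features.items()}
    candidates = sorted(
        (name for name, r in target_r.items() if abs(r) >= min_target_r),
        key=lambda name: abs(target_r[name]),
        reverse=True,
    )
    # Step 2 (redundancy): walk from most to least relevant and skip any feature
    # that is too strongly correlated with a feature that has already been kept.
    kept = []
    for name in candidates:
        if all(abs(pearson(features[name], features[other])) < max_inter_r for other in kept):
            kept.append(name)
    return kept, target_r


# Toy, made-up data purely for illustration (not from the paper's dataset).
churn = [0, 0, 1, 1, 0, 1]
columns = {
    "tenure": [24, 30, 2, 5, 40, 3],
    "tenure_years": [2.0, 2.5, 0.2, 0.4, 3.3, 0.25],  # near-duplicate of tenure
    "noise": [5, 1, 5, 1, 3, 3],  # uncorrelated with churn
}
kept, scores = select_features(columns, churn)
print(kept)
```

With these inputs, the relevance step rejects `noise`, and only one of `tenure` and `tenure_years` survives the redundancy step. On a real dataset, the full correlation matrix would usually come from a library call such as pandas `DataFrame.corr()` rather than hand-written loops.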

=================================================

Abstract:

This study examines the impact of feature selection methods on the accuracy of the Support Vector Machine (SVM) model in predicting customer churn in the telecommunications sector. Specifically, it compares four feature selection techniques: Correlation Matrix, Analysis of Variance (ANOVA), Principal Component Analysis (PCA), and Genetic Algorithm (GA). It also evaluates four SVM kernels: Linear, Polynomial, Radial Basis Function (RBF), and Sigmoid. The study uses a telecom customer dataset from Kaggle with 7043 entries and 33 features. It follows the CRISP-DM methodology, whose six phases are Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. The results show that Correlation Matrix feature selection paired with the Linear kernel performs best. This configuration achieves the highest accuracy, 92.48%, with a precision of 0.93, a recall of 0.97, and an F1-score of 0.95. The other feature selection methods, such as PCA and GA, yield lower performance. These findings indicate that combining Correlation Matrix feature selection with the Linear kernel improves the predictive accuracy of SVM models on this dataset.
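The reported precision, recall and F1-score can be cross-checked against each other. F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R). The abstract does not say whether these scores refer to the churn class or to an average over classes. This sketch therefore only confirms that the three rounded values are mutually consistent. `metrics_from_counts` is a hypothetical helper that shows how all four metrics follow from a binary confusion matrix.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, plus precision, recall and F1 for the positive class, from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Consistency check on the rounded precision and recall reported in the abstract.
reported_f1 = f1_score(0.93, 0.97)
print(round(reported_f1, 2))
```

Here 2 · 0.93 · 0.97 / (0.93 + 0.97) ≈ 0.9496, which rounds to the reported 0.95. Accuracy cannot be checked the same way, because it depends on the class balance in the test set, and the abstract does not report it.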

Keywords

Customer Churn; Support Vector Machine; Feature Selection; Correlation Matrix; ANOVA; PCA; Genetic Algorithm
