ONLINE NEWS CLASSIFICATION USING NAÏVE BAYES CLASSIFIER WITH MUTUAL INFORMATION FOR FEATURE SELECTION

Shafrian Adhi Karunia

Abstract

The number of online news documents can reach into the billions. Grouping these documents is therefore needed to help editorial staff input news and assign it to categories.
This paper aims to classify online news using a Naïve Bayes classifier with Mutual Information for feature selection, and to determine the accuracy of this combination of methods, so that online news documents can be grouped automatically and the classification model becomes more accurate. The data is divided into training and testing sets. Documents from August, September and October 2016 were used as training data. For testing, 65 documents from November were used. The best overall results were obtained by the Multivariate Bernoulli model without feature selection: 80% accuracy, 94.28% precision, 79.68% recall and 85.08% f-measure. With Mutual Information for feature selection, the best results were also achieved by the Multivariate Bernoulli model: 70% accuracy, 89.11% precision, 69.76% recall and 78.04% f-measure, with a word efficiency rate of up to 52% relative to the model without feature selection. The Multinomial Naïve Bayes model, on the other hand, achieved 41.67% accuracy, 75.68% precision, 41.90% recall and 48.13% f-measure without feature selection, and 10% accuracy, 33.33% precision, 9.40% recall and 14.35% f-measure with feature selection.
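The abstract does not give implementation details, so the following is only a minimal sketch of how Mutual Information feature selection might be combined with a Multivariate Bernoulli Naïve Bayes model. It is not the author's code. The function names, the toy documents and the labels are hypothetical. Scoring each term by its maximum Mutual Information over the classes, and using add-one (Laplace) smoothing, are assumptions made for illustration.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term, cls):
    """Mutual information (in bits) between presence of `term` and membership in `cls`."""
    n = len(docs)
    # Count documents for each (term present?, in class?) combination.
    counts = Counter((term in d, y == cls) for d, y in zip(docs, labels))
    mi = 0.0
    for et in (True, False):
        for ec in (True, False):
            n_tc = counts[(et, ec)]
            if n_tc == 0:
                continue  # the 0 * log(0) term is taken as 0
            n_t = counts[(et, True)] + counts[(et, False)]
            n_c = counts[(True, ec)] + counts[(False, ec)]
            mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi

def select_features(docs, labels, k):
    """Keep the k terms with the highest MI (assumption: maximum over classes)."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    score = {t: max(mutual_information(docs, labels, t, c) for c in classes)
             for t in vocab}
    return sorted(vocab, key=lambda t: -score[t])[:k]

def train_bernoulli(docs, labels, vocab):
    """Estimate class priors and per-class term presence probabilities."""
    prior, cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = len(class_docs) / len(docs)
        # Add-one smoothing over the two outcomes (present / absent).
        cond[c] = {t: (sum(t in d for d in class_docs) + 1) / (len(class_docs) + 2)
                   for t in vocab}
    return prior, cond

def predict_bernoulli(doc, prior, cond):
    """Return the class with the highest log posterior for a set of tokens."""
    best, best_score = None, -math.inf
    for c in prior:
        s = math.log(prior[c])
        for t, p in cond[c].items():
            # The Bernoulli model also scores the absence of each selected term.
            s += math.log(p) if t in doc else math.log(1.0 - p)
        if s > best_score:
            best, best_score = c, s
    return best

# Hypothetical toy corpus: each document is a set of tokens.
train_docs = [
    {"goal", "match", "team"},
    {"team", "coach", "match"},
    {"election", "vote", "party"},
    {"party", "minister", "vote"},
]
train_labels = ["sport", "sport", "politics", "politics"]

selected = select_features(train_docs, train_labels, k=4)
prior, cond = train_bernoulli(train_docs, train_labels, selected)
predicted = predict_bernoulli({"vote", "party", "speech"}, prior, cond)
print(selected, predicted)
```

In this sketch, setting `k` to the full vocabulary size corresponds to classification without feature selection, which is the comparison the abstract reports.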
