Online News Classification Using Multinomial Naive Bayes

Amelia Rahman, Wiranto Wiranto, Afrizal Doewes

Abstract

The wide availability of text in many forms is a valuable information resource that can be used for various purposes. Classification is one of the text mining methods used to analyze text documents. Text classification is the process of grouping and categorizing documents based on a trained model. This study aimed to categorize Indonesian news articles automatically using Multinomial Naive Bayes. To improve the results, feature selection using the Document Frequency Thresholding (DF-Thresholding) method and term weighting using Term Frequency-Inverse Document Frequency (TF-IDF) were applied. The experiment showed that Multinomial Naive Bayes with TF-IDF produced the highest average accuracy, 86.62%, compared with 86.28% for plain Multinomial Naive Bayes, 86.15% for Multinomial Naive Bayes with DF-Thresholding and TF-IDF, and 85.98% for Multinomial Naive Bayes with DF-Thresholding alone. Feature selection with Document Frequency Thresholding reduced the number of feature dimensions efficiently, as shown by the insignificant change in final accuracy relative to plain Multinomial Naive Bayes.
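The pipeline described in the abstract (DF-Thresholding, then TF-IDF weighting, then Multinomial Naive Bayes) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not state the threshold, the exact IDF formula, or the smoothing, so `min_df=2`, the `log(N/df) + 1` IDF variant, add-one smoothing, the function names, and the toy English documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()

def select_by_df(docs, min_df=2):
    """DF-Thresholding: keep terms that occur in at least min_df documents."""
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {term for term, n in df.items() if n >= min_df}
    return vocab, df

def compute_idf(df, vocab, n_docs):
    # Assumed IDF variant: log(N / df) + 1, so terms in every document keep weight 1.
    return {term: math.log(n_docs / df[term]) + 1.0 for term in vocab}

def train(docs, labels, vocab, idf):
    """Multinomial Naive Bayes on TF-IDF weights with add-one smoothing."""
    weights = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for term, tf in Counter(t for t in doc if t in vocab).items():
            weights[label][term] += tf * idf[term]
    model = {}
    for label, n_label in Counter(labels).items():
        total = sum(weights[label].values())
        log_prior = math.log(n_label / len(docs))
        log_like = {
            term: math.log((weights[label][term] + 1.0) / (total + len(vocab)))
            for term in vocab
        }
        model[label] = (log_prior, log_like)
    return model

def predict(model, doc, idf):
    """Return the label with the highest log-posterior score."""
    scores = {}
    for label, (log_prior, log_like) in model.items():
        # Summing idf * log-likelihood per token equals weighting by tf * idf per term.
        scores[label] = log_prior + sum(
            idf[t] * log_like[t] for t in doc if t in log_like
        )
    return max(scores, key=scores.get)

# Hypothetical toy corpus (not data from the study).
raw_docs = [
    "the team won the football match",
    "the striker scored in the football final",
    "the bank raised interest rates",
    "the market and the bank reacted to rates",
]
labels = ["sport", "sport", "economy", "economy"]

docs = [tokenize(d) for d in raw_docs]
vocab, df = select_by_df(docs, min_df=2)
idf = compute_idf(df, vocab, len(docs))
model = train(docs, labels, vocab, idf)

print(sorted(vocab))
print(predict(model, tokenize("football final tonight"), idf))
print(predict(model, tokenize("bank rates rise"), idf))
```

Raising `min_df` shrinks the vocabulary further, which is the dimensionality reduction the abstract refers to; the four variants compared in the study correspond to switching the thresholding and the IDF weighting on or off.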

Keywords

Classification; text mining; Multinomial Naive Bayes; TF-IDF; DF-Thresholding
