Classification of Tweets for Video Streaming Services’ Content Recommendation on Twitter

Kiki Ferawati, Sa'idah Zahrotul Jannah


Streaming services are popular platforms frequently visited by internet users. However, the abundance of content can overwhelm users, prompting them to seek recommendations from other people. Some users turn to Twitter to find content to enjoy. The search results, however, include irrelevant tweets whose text is entirely unrelated to the content on streaming platforms. This study classifies tweets as relevant or irrelevant to streaming services’ content recommendation using a random forest and a Convolutional Neural Network (CNN). The results show that the CNN performed better on the test set, reaching a higher accuracy of 94%, but took longer to run than the random forest. The two categories of tweets have distinctive characteristics. Based on the resulting classification, users can identify which words to use and which to avoid when searching on Twitter.

Keywords: text mining, streaming services, classification, random forest, CNN
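
The abstract compares the two classifiers by test-set accuracy. As a minimal sketch of how such a figure, and the related precision, recall, and F-measure, can be computed for a relevant/irrelevant labelling, consider the following. The function name `evaluate` and the toy labels are assumptions made for illustration; they are not the study's code or data, and the abstract itself reports accuracy only.

```python
from collections import Counter


def evaluate(y_true, y_pred, positive="relevant"):
    """Return accuracy, precision, recall and F1 for a binary labelling.

    Hypothetical helper for illustration; not taken from the study.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted relevant: true positive if correct, else false positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted irrelevant: false negative if wrong, else true negative.
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for this example only (not the study's data).
y_true = ["relevant", "relevant", "irrelevant",
          "irrelevant", "relevant", "irrelevant"]
y_pred = ["relevant", "irrelevant", "irrelevant",
          "relevant", "relevant", "irrelevant"]

scores = evaluate(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and recall alongside accuracy is a common choice when the relevant and irrelevant classes are not equally frequent, since accuracy alone can hide poor performance on the smaller class.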

