An Extreme Gradient Boosting Approach for Classification and Sentiment Analysis

Authors

  • Indah Yessi Kairupan, Department of Informatics Engineering, Universitas Katolik De La Salle Manado, Indonesia
  • Apriandy Angdresey, Department of Informatics Engineering, Universitas Katolik De La Salle Manado, Indonesia
  • Hamdani Arif, Department of Informatics Engineering, Politeknik Negeri Batam, Indonesia
  • Kenshin Geraldy Emor, Department of Informatics Engineering, Universitas Katolik De La Salle Manado, Indonesia

DOI:

https://doi.org/10.12695/ajtm.2023.16.3.5

Keywords:

Twitter, Classification, Sentiment Analysis, XGBoost, Ministry of Health of the Republic of Indonesia

Abstract

Since 2020, when the coronavirus epidemic was at its peak, the social media accounts of the Indonesian Ministry of Health have been followed by a large number of people. The Ministry's account is a valuable resource for social media users, particularly Twitter users. However, the account publishes a wide range of content in no particular order, so it is often difficult for Twitter users to determine what type of information a given post provides. Twitter users also frequently respond, both positively and negatively, to the material the account releases. Extreme gradient boosting (XGBoost) is a tree-based method that builds on the decision tree algorithm, and it has been successfully applied to classification with high performance. In this study it is applied to two tasks: categorizing tweets as general or essential information, and sentiment analysis with three classes (positive, neutral, and negative). Both tasks reached high accuracy. On 2243 tweets, classification achieved an accuracy of 89.35%, with a precision of 88.76% and a recall of 88.58%, when 80% of the data was used for training and 20% for testing. Sentiment analysis likewise reached its highest accuracy, 91.22%, with the same 80-20 partition. On 304 comments, accuracy was calculated to be 89.17% and recall 89.06%. The 80-20 split for training and testing consistently produced the best results for both the classification and sentiment analysis tasks.
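
The evaluation protocol named in the abstract (an 80-20 split followed by accuracy, precision, and recall) can be illustrated with a minimal sketch. This is not the authors' code and it does not train an XGBoost model; it only shows how the split sizes and the metrics could be computed. The function names, the random seed, and the use of macro averaging for precision and recall are assumptions, since the abstract does not say how the multi-class scores were averaged.

```python
import random
from collections import Counter


def split_80_20(items, seed=0):
    """Shuffle a copy of items and return (train, test) in an 80/20 ratio.

    The seed is a hypothetical choice made only for reproducibility.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision_recall(y_true, y_pred):
    """Per-class precision and recall, averaged with equal class weight.

    Macro averaging is an assumption; the abstract does not specify it.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    precisions = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in labels]
    recalls = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in labels]
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


if __name__ == "__main__":
    # Split sizes for the two dataset sizes reported in the abstract.
    for n in (2243, 304):
        train, test = split_80_20(range(n))
        print(n, len(train), len(test))

    # Toy three-class labels, invented purely for illustration.
    y_true = ["positive", "neutral", "negative", "positive", "negative", "neutral"]
    y_pred = ["positive", "neutral", "positive", "positive", "negative", "negative"]
    print(accuracy(y_true, y_pred))
    print(macro_precision_recall(y_true, y_pred))
```

With this rounding rule, 2243 tweets give 1794 training and 449 test items, and 304 comments give 243 and 61. In practice the predictions would come from a trained classifier rather than a hand-written list.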

Submitted

2023-09-14

Accepted

2024-04-01

Published

2023-12-30

How to Cite

Kairupan, I. Y., Angdresey, A., Arif, H., & Emor, K. G. (2023). An Extreme Gradient Boosting Approach for Classification and Sentiment Analysis. The Asian Journal of Technology Management (AJTM), 16(3), 211–225. https://doi.org/10.12695/ajtm.2023.16.3.5

Section

Articles