
A novel algorithm of data mining to predict future scenarios of COVID-19 pandemic

Muhammad Shaheen, PhD


COVID-19, the disease caused by a novel coronavirus, is an ongoing global pandemic that broke out recently and has spread to almost every part of the world. Several aspects of this pandemic are still unknown, which makes it difficult to prepare a strategic plan for coping with the disease effectively and securing the future. A large number of studies based on the publicly available datasets of this deadly pandemic are in progress or expected to start shortly. The data are available in multiple formats, including geospatial, medical, demographic, and time-series data. In this study, we propose a data mining method to classify and forecast the time-series pandemic data in an attempt to predict the expected end of the pandemic in a particular region. Based on COVID-19 data obtained from several countries around the world, a naïve Bayes classifier is built to classify the affected countries into one of four categories: critical, unsustainable, sustainable, and closed. The pandemic data collected from online sources are preprocessed, labeled, and classified using different data mining techniques. A new clustering technique is proposed to predict the expected end of the pandemic in different countries, together with a method to preprocess the data before the clustering technique is applied. The results of the naïve Bayes classification and the clustering technique are validated on accuracy, execution time, and other statistical measures.
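As a rough illustration of the classification step, the following is a minimal sketch of a Gaussian naïve Bayes classifier that assigns a country to one of the four categories named above. It is not the paper's implementation: the two features (a weekly growth rate of new cases and the share of cases still active), the training values, the variance floor, and the function names `fit` and `predict` are all assumptions made up for this example.

```python
import math
from collections import defaultdict

# Hypothetical training data, invented for illustration only (not from the paper).
# Each sample: ((weekly growth rate of new cases, share of cases still active), label)
TRAIN = [
    ((0.30, 0.90), "critical"),
    ((0.25, 0.85), "critical"),
    ((0.10, 0.70), "unsustainable"),
    ((0.12, 0.65), "unsustainable"),
    ((0.02, 0.30), "sustainable"),
    ((0.03, 0.35), "sustainable"),
    ((0.00, 0.02), "closed"),
    ((0.01, 0.05), "closed"),
]

def fit(samples, var_floor=1e-4):
    """Estimate a prior and per-feature Gaussian parameters for each class."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a near-constant feature cannot cause division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def predict(model, features):
    """Return the class with the highest log posterior under the naive independence assumption."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        log_likelihood = sum(
            log_gaussian(v, m, s) for v, m, s in zip(features, means, variances)
        )
        scores[label] = math.log(prior) + log_likelihood
    return max(scores, key=scores.get)

model = fit(TRAIN)
print(predict(model, (0.28, 0.88)))
```

The "naïve" part is the sum inside `predict`: each feature contributes its log likelihood independently given the class, so the model needs only a mean and a variance per feature and class.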



COVID-19, nCoV, classification, naïve Bayes, clustering, data mining, preprocessing, prediction, end-date








Copyright (c) 2023 Journal of Emergency Management