ABSTRACT

The novel coronavirus has been spreading across the world since the end of 2019. Within a short time, the infection reached several countries, leading to border closures, mass quarantines, and strict measures aimed at controlling the emergency. A key element for decision making in a rapidly changing situation like the present one is the collection of quality data. When the infection reached a country, that country began collecting and releasing information according to its own protocols. To date, most countries release daily files that add new and updated cases to their original files. The large case counts that some countries face, along with the prolonged duration of the pandemic, have led to multiple problems with data collection itself and with the organization, filing, and readiness of the data for public use. The availability of resources has shaped each country's experience with data collection, and the case count makes each country a unique scenario. The present chapter focuses on the Colombian case, where the protocols selected for data collection, filing, and release were found to produce constantly growing datasets that require extensive cleansing before the data can be used for analysis. The authors explored different methods to fix the released information. Using clustering methods, it was possible to correct the dataset and reduce the manual errors made at the health institutions. Using the normalized information, a transition matrix between states inside the health system is introduced. Finally, the authors discuss the importance of standardizing and releasing the data properly.