Introduction

Are we drowning in the massive volume of information coming from surveillance and public health-related research? This may sound counterintuitive because common wisdom in public health assumes that we need more data [1,2,3]. It is also at odds with the idea that data sciences and big data, notably through the developments of digital health, artificial intelligence, personalized medicine, and precision public health, will be the decisive transformative element of public health [3,4,5]. In this essay, we argue, however, that getting more data will not be sufficient to meet our information needs for public health decision-making. As the surveillance failures of the COVID-19 pandemic revealed, we need better data and well-defined information needs, within a population perspective. Data of dubious quality and reliability waste resources and create new data-genic public health damages [6]. We therefore call for a slow data public health to foster evidence-based and high-quality data-driven public health decision-making.

Failure of surveillance systems

The COVID-19 pandemic was a crash test, revealing the state of surveillance systems across the world and how research informs public health decisions [7]. In the early phase of the pandemic, many countries' surveillance and health information systems were not ready for such a threat [7]. Systems to capture timely and high-quality data useful for decisions were missing, and the lack of common standards hampered international coordination and comparisons. With time, the situation improved with the growing availability of multiple types of data and the development of more efficient surveillance systems. Nevertheless, data ingestion workflows remained problematic, and major inconsistencies across countries and settings continued, including, for example, for death outcomes, with intractable between-country differences in definitions, completeness, and over- or under-counting [8, 9].

Furthermore, on top of the pandemic, an epidemic of information, that is, an “infodemic”, took place [10]. Surveillance-related numbers flooded society through multiple media. Probably at an unprecedented scale, numerous researchers started to work on COVID-19 surveillance-related topics. The volume of surveillance and research data rapidly became overwhelming, also due to its massive echo through media and digital social platforms, often accompanied by gross distortion from conflicted stakeholders and conspiracy theories. The convergence of high volume and low quality became a major problem for policymakers and the public, with information needs only partly fulfilled.

Confusion between surveillance and research

Behind the failure of surveillance and information systems are three major problems that must be fixed if we want to move toward evidence-based and rigorous data-driven public health decision-making (Table 1). The first problem is the confusion between surveillance and research [1, 11]. Many research findings have been used as information for decision-making. Research provides knowledge and helps bound the uncertainties around this knowledge, which can be useful to guide decisions. However, it is not designed explicitly to inform policy-making and support public health decisions. It is the role of surveillance to help decision-making through the systematic collection, analysis, and interpretation of data, integrated with the timely dissemination of the resulting information to those responsible for preventing and controlling disease [1].

Table 1 Three problems magnified by the pandemic and hampering the application of evidence-based and rigorous data-driven health decision-making

With the pandemic, many researchers and data scientists were for the first time involved in surveillance-related research, counseling, and policymaking, while experiencing intense media exposure. On the one hand, this was useful in some settings, resulting in greater democratization of surveillance (done not only through governmental agencies), supporting diverse analytic approaches, and external testing. On the other hand, much noise—if not misinformation—emerged from this exposure, notably because researchers are not trained for public health counseling and communication activities; they often lack a public health surveillance culture, and they tend to be overconfident about how they understand others (citizens, decision-makers) and how others understand them [12]. Concurrently, many decision-makers, while struggling with their weak health and data literacy, had to deal for the first time with researchers and the convoluted scientific processes of knowledge production. The pandemic made this process visible in real time, and the presence of scientific discourse in the public sphere was stronger than ever [1].

Fooled by big data

The second major problem revealed by the pandemic is that big data do not speak for themselves. Big data refer to the massive amount of data accessible through the digitalization of all aspects of life, including health and healthcare [4]. They are characterized by their variety, volume, and velocity—the 3 Vs—but also often by their poor (or undocumented) quality [1]. To better understand the issue of the poor quality of these data, it is helpful to make a distinction between “designed” and “organic” data [13].

Surveillance activities are traditionally based on what is called “designed” data coming from classical surveillance tools such as surveys or registries and ideally gathered using well-defined epidemiologic methods to capture population-level data. The validity and reliability of these data can be documented, and they can be tailored to address specific public health problems. Conversely, a large share of big data can be characterized as “organic” because they are a byproduct of another activity, e.g., health care provision [13]. While their secondary use makes them potentially informative for surveillance, their validity and reliability are often only partially documented. A growing share of research is also conducted with organic data, opening new avenues, notably in the field of health services research and “precision” public health. Data-driven analyses (such as data mining or applications of artificial intelligence) can turn these data into information useful for decision-making. However, some features of these methods (e.g., flexible data analysis, multiplicity of options, and lack of prespecified hypotheses) increase the probability of false findings [14].

Beyond their intrinsic capacity to capture the information of interest, organic data are typically exposed to strong “selectivity bias”, a term coined to highlight that the source population of these data is difficult to identify and is not stable across time and settings (Table 2) [1]. The population perspective is blurred. Completeness and representativeness cannot be ensured due to the non-probabilistic nature of these data and the selectivity of the people from whom data are recorded [15]. Because the source population and its sampling circumstances change, these data are constantly evolving, making them problematic for surveillance. Information derived from these data is not easily transportable to a target population (Box 1).

Table 2 Selected practices to improve the quality of research related to surveillance (adapted from Ioannidis 2014 [14]) and their relevance for a slow data public health

Another issue with big data is that their size gives a false sense of security [19]. Indeed, increasing data size shrinks confidence intervals around estimates but does nothing to reduce survey bias. This has been called the “Big Data Paradox”: the more data we have, the more we can be fooled by “precisely inaccurate” estimates [20]. Hence, surveys on Facebook, with about 250,000 responders per week, could estimate first-dose COVID-19 vaccine uptake in US adults with minuscule margins of error, yet overestimated it by as much as 17 percentage points compared with CDC estimates [19].
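The arithmetic behind the Big Data Paradox can be sketched in a few lines: the 95% margin of error of a survey proportion shrinks with the square root of the sample size, while a fixed selection bias stays constant. The figures below (a true uptake of 53% and an observed uptake of 70%) are hypothetical, chosen only to echo the order of magnitude of the Facebook example.

```python
import math

def margin_of_error(p, n):
    """95% margin of error for an estimated proportion p from n responses."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Hypothetical figures for illustration only: suppose true first-dose
# uptake is 53%, but a self-selected online survey observes 70%
# regardless of how many people respond.
true_uptake = 0.53
observed_uptake = 0.70

for n in (1_000, 250_000, 25_000_000):
    moe = margin_of_error(observed_uptake, n)
    bias = observed_uptake - true_uptake
    print(f"n={n:>10,}  margin of error={moe:.4f}  bias={bias:.2f}")
```

At n = 250,000 the margin of error is well under a fifth of a percentage point, so the estimate looks extremely precise, yet the 17-point bias is untouched: precision and accuracy have come apart.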

Infodemic management

The third major problem magnified by the pandemic is the “infodemic”, a term coined to describe the overabundance of information, including misinformation, disseminated on a large scale via multiple (mostly non-scientific) channels [21]. The spreading of data and information is accelerated through direct communication and content production on social media platforms, without the mediation of relevant experts [10]. In addition to the growing sources of data, there is a multiplication of information producers, a phenomenon favored by emerging artificial intelligence tools [6]. Moreover, the echo chamber mechanism reinforces shared narratives and fosters individual polarization [10]. Decision-making under these circumstances becomes challenging, as policymakers and citizens try to navigate the mounting pressure from the infodemic that affects public opinion, perceptions, and expectations. The pandemic has therefore highlighted how critical it is to identify reliable experts and to define what can reasonably be expected from them (Box 2).

Box 1 Data from populations or populations from data?

In response to the huge volume of research output and its reverberation through multiple venues, evidence syntheses are needed to offer a counterweight to the untamed infodemic. Hence, multiple systematic or rapid reviews have been produced to summarize COVID-19-related research, but too many were of low quality [25]. Further, many COVID-19 study findings were the subject of exaggerated information and major excitement followed by severe disappointment, as well as by rapidly alternating extreme and opposite research claims [26]. For example, interventions that were eventually shown to be ineffective or harmful had early studies claiming extremely promising results and receiving high media attention; they were also highly cited in the scientific literature and held a prominent place in reviews and scientific opinion pieces [26]. All that noise blurred evidence-based decision-making.

One consequence of the infodemic is that the efficiency of public health information systems, defined as the ratio of useful information to the resources allocated to gather it, diminishes enormously. Low-cost data end up causing huge costs by misguiding the surveillance and research enterprises as well as decision-making (Box 3).

Box 2 Who are the experts? And where are they?

The waste of resources, debates, and the recycling of wasteful information can slow the identification and implementation of effective evidence-based policies. Further, notably through the spread of misinformation and fake news eroding trust in institutions, communities may eventually reject sound expert advice and evidence-based policymaking, as they become difficult to differentiate from the surrounding waste. New developments in artificial intelligence could accentuate these trends as they can act as multipliers of the infodemic [6].

Toward a slow data public health

In response to these challenges, we propose to move from big data public health to slow data public health [6]. The pandemic has revealed that a massive amount of research and surveillance data is not sufficient to fulfill our information needs. Slow data means that what matters more than data collection and analysis is, first, the careful and purpose-driven identification of specific public health information needs and, second, the efficient and purpose-driven dissemination of this information (Fig. 1). It also highlights the importance of collecting fewer, but higher quality, data designed for these purposes.

Fig. 1

Within a surveillance framework [1], the process of public health information production goes from the identification of information needs, at a population level, to decision-making. In a slow data public health approach, it is critical to be clear and purposeful regarding the information needs and the dissemination strategy

Identifying public health information needs requires surveillance experts, data scientists, healthcare providers, patient representatives, researchers, policymakers, and citizens to work together. Policymakers and citizens are the legitimate stakeholders to define the needs and decide which health outcomes should be valued. They also determine what resources are allocated to surveillance and research activities [6]. Surveillance experts, working in independent and scientifically credible institutions [24], are key players in linking policymakers and health data researchers. Once the needs are identified, these experts can design adequate surveillance systems. Anticipation is also a key feature of slow data public health, and potential information needs must be defined early on, not only during a crisis [29].

A purpose-driven information dissemination strategy is the other central element of a slow data public health. Transforming evidence into useful information for decision-making requires putting communication, dissemination, and implementation sciences at the heart of surveillance [30]. The infodemic has also revealed that the cognitive satisfaction provided by information, more than its quality, favors its diffusion [31]. People understand people and stories better than data; they need narratives around the data. We therefore have to develop a shared culture of public health surveillance between researchers, data scientists, and decision-makers. In an era of infodemic and misinformation, and because the truth does not defend itself in a deregulated information and cognitive market [31], maintaining trust in public health expertise and in science is essential to this effort. Scientists and decision-makers must be better trained in surveillance and epidemiology. At a societal level, it will also be necessary to improve health data literacy and critical thinking.

To improve the quality of surveillance-related research, we know the measures we should apply (Table 2), and several of these measures are integral to a slow data approach. Efforts need to be coordinated at national and international levels, with standardization of definitions and practices to strengthen the comparability of the data eventually collected across different places and over time. Institutions with expertise in surveillance and population health sciences should lead initiatives for the integration of surveillance data streams and for the design of foundational surveillance data. The impact of poor standardization should be fully acknowledged, and more weight should be given to metrology [18]. A reproducibility culture is also very helpful, especially to prevent the dissemination of exciting claims that rapidly gain numerous believers, as happened with many proposed treatments for COVID-19 [26]. The whole effort should also give priority to trust, and uncertainty should be communicated in its full breadth. Overpromising certainty to justify public health measures is likely to backfire. Finally, the mediatization of surveillance and study results can create sensationalism and should be handled cautiously.

Box 3 Surveillance bias

To strengthen surveillance outputs, the information production machinery should be conceptualized within the ecosystem of health decision-making [32], in a population perspective. Within a metrology framework [18], supporting primary data providers in improving the structure and semantics of the data they collect is critical to producing meaningful information from them [6]. Further, while data from healthcare providers constitute the basic layer of surveillance, the core surveillance activity should be designed at a population level, using population-based tools and information systems designed to avoid surveillance bias. Finally, slowing information production does not imply poor timeliness, because once we know which information is needed and how to disseminate it, the production process is more efficient.

Conclusion

In the age of infodemic, providing useful information for decision-making requires more than getting more data. Data of dubious quality and reliability waste resources and create data-genic public health damages. A slow data public health means, first, prioritizing the identification of specific information needs and, second, disseminating information in a way that informs decision-making, rather than devoting massive resources to data collection and analysis. It requires better data, ideally population-based, rather than more data, and aims to be timely rather than deceptively fast. Applied by independent institutions with expertise in epidemiology and surveillance methods, it allows a thoughtful and timely response, based on high-quality data fostering trustworthiness.