Introduction

Are we drowning in the massive volume of information coming from surveillance and public health-related research? This may sound counterintuitive because common wisdom in public health assumes that we need more data [1,2,3]. It is also at odds with the idea that data sciences and big data, notably through the developments of digital health, artificial intelligence, personalized medicine, and precision public health, will be the decisive transformative element of public health [3,4,5]. In this essay, we argue, however, that getting more data will not be sufficient to meet our information needs for public health decision-making. As the surveillance failures of the COVID-19 pandemic revealed, we need better data and well-defined information needs, within a population perspective. Data of dubious quality and reliability waste resources and create new data-genic public health damages [6]. We therefore call for a slow data public health to foster evidence-based and high-quality data-driven public health decision-making.

Failure of surveillance systems

The COVID-19 pandemic was a crash test, revealing the state of surveillance systems across the world and how research informs public health decisions [7]. In the early phase of the pandemic, many countries' surveillance and health information systems were not ready for such a threat [7]. Systems to capture timely and high-quality data useful for decisions were missing, and the lack of common standards hampered international coordination and comparisons. With time, the situation improved with the growing availability of multiple types of data and the development of more efficient surveillance systems. Nevertheless, data ingestion workflows remained problematic, and major inconsistencies across countries and settings continued, including, for example, for death outcomes, with intractable between-country differences in definitions, completeness, and over- or under-counting [8, 9].

Furthermore, on top of the pandemic, an epidemic of information, that is, an “infodemic”, took place [10]. Surveillance-related numbers flooded society through multiple media. Probably at an unprecedented scale, numerous researchers started to work on COVID-19 surveillance-related topics. The volume of surveillance and research data rapidly became overwhelming, also due to its massive echo through media and digital social platforms, often accompanied by gross distortion from conflicted stakeholders and conspiracy theories. The convergence of high volume and low quality became a major problem for policymakers and the public, with information needs only partly fulfilled.

Confusion between surveillance and research

Behind the failure of surveillance and information systems are three major problems that must be fixed if we want to move toward evidence-based and rigorous data-driven public health decision-making (Table 1). The first problem is the confusion between surveillance and research [1, 11]. Many research findings have been used as information for decision-making. Research provides knowledge and helps bound the uncertainties around this knowledge, which can be useful to guide decisions. However, it is not designed explicitly to inform policy-making and support public health decisions. It is the role of surveillance to help decision-making through the systematic collection, analysis, and interpretation of data, integrated with the timely dissemination of the resulting information to those responsible for preventing and controlling disease [1].

Table 1 Three problems magnified by the pandemic and hampering the application of evidence-based and rigorous data-driven health decision-making

With the pandemic, many researchers and data scientists were for the first time involved in surveillance-related research, counseling, and policymaking, while experiencing intense media exposure. On the one hand, this was useful in some settings, resulting in greater democratization of surveillance (done not only through governmental agencies), supporting diverse analytic approaches, and external testing. On the other hand, much noise—if not misinformation—emerged from this exposure, notably because researchers are not trained for public health counseling and communication activities; they often lack a public health surveillance culture, and they tend to be overconfident about how they understand others (citizens, decision-makers) and how others understand them [12]. Concurrently, many decision-makers, while struggling with their weak health and data literacy, had to deal for the first time with researchers and the convoluted scientific processes of knowledge production. The pandemic made this process visible in real time, and the presence of scientific discourse in the public sphere was stronger than ever [1].

Fooled by big data

The second major problem revealed by the pandemic is that big data do not speak for themselves. Big data refer to the massive amount of data accessible through the digitalization of all aspects of life, including health and healthcare [4]. They are characterized by their variety, volume, and velocity—the 3 Vs—but also often by their poor (or undocumented) quality [1]. To better understand the issue of the poor quality of these data, it is helpful to make a distinction between “designed” and “organic” data [13].

Surveillance activities are traditionally based on what is called “designed” data coming from classical surveillance tools such as surveys or registries and ideally gathered using well-defined epidemiologic methods to capture population-level data. The validity and reliability of these data can be documented, and they can be tailored to address specific public health problems. Conversely, a large share of big data can be characterized as “organic” because they are a byproduct of another activity, e.g., health care provision [13]. While their secondary use makes them potentially informative for surveillance, their validity and reliability are often only partially documented. A growing share of research is also conducted with organic data, opening new avenues, notably in the field of health services research and “precision” public health. Data-driven analyses (such as data mining or applications of artificial intelligence) can turn these data into information useful for decision-making. However, some features of these methods (e.g., flexible data analysis, multiplicity of options, and lack of prespecified hypotheses) increase the probability of false findings [14].

Beyond their intrinsic capacity to capture the information of interest, organic data are typically exposed to strong “selectivity bias”, a term coined to highlight that the source population of these data is difficult to identify and is not stable across time and settings (Table 2) [1]. The population perspective is blurred. Completeness and representativeness cannot be ensured due to the non-probabilistic nature of these data and the selectivity of the people from whom data are recorded [15]. Because the source population and its sampling circumstances change, these data are constantly evolving, making them problematic for surveillance. Information derived from these data is not easily transportable to a target population (Box 1).

Table 2 Selected practices to improve the quality of research related to surveillance (adapted from Ioannidis 2014 [14]) and their relevance for a slow data public health

Another issue with big data is that their size gives a false sense of security [19]. Indeed, increasing data size shrinks confidence intervals around estimates but does nothing to reduce survey bias. This has been called the “Big Data Paradox”: the more data we have, the more we can be fooled by “precisely inaccurate” estimates [20]. Hence, surveys on Facebook, with about 250,000 responders per week, could estimate first-dose COVID-19 vaccine uptake in US adults with minuscule margins of error, yet overestimated it by as much as 17 percentage points compared with CDC estimates [19].
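The arithmetic behind the Big Data Paradox can be sketched in a few lines: the 95% margin of error of a survey proportion shrinks with the square root of the sample size, while a fixed selection bias stays constant. The figures below (a true uptake of 53% and an observed uptake of 70%) are hypothetical, chosen only to echo the order of magnitude of the Facebook example.

```python
import math

def margin_of_error(p, n):
    """95% margin of error for an estimated proportion p from n responses."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Hypothetical figures for illustration only: suppose true first-dose
# uptake is 53%, but a self-selected online survey observes 70%
# regardless of how many people respond.
true_uptake = 0.53
observed_uptake = 0.70

for n in (1_000, 250_000, 25_000_000):
    moe = margin_of_error(observed_uptake, n)
    bias = observed_uptake - true_uptake
    print(f"n={n:>10,}  margin of error={moe:.4f}  bias={bias:.2f}")
```

At n = 250,000 the margin of error is well under a fifth of a percentage point, so the estimate looks extremely precise, yet the 17-point bias is untouched: precision and accuracy have come apart.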

Infodemic management

The third major problem magnified by the pandemic is the “infodemic”, a term coined to describe the overabundance of information, including misinformation, disseminated on a large scale via multiple (mostly non-scientific) channels [21]. The spreading of data and information is accelerated through direct communication and content production on social media platforms, without the mediation of relevant experts [10]. In addition to the growing sources of data, there is a multiplication of information producers, a phenomenon favored by emerging artificial intelligence tools [6]. Moreover, the echo chamber mechanism reinforces shared narratives and fosters individual polarization [10]. Decision-making under these circumstances becomes challenging, as policymakers and citizens try to navigate the mounting pressure from the infodemic that affects public opinion, perceptions, and expectations. The pandemic has therefore highlighted how critical it is to identify reliable experts and to define what can reasonably be expected from them (Box 2).

Box 1 Data from populations or populations from data?

In response to the huge volume of research output and its reverberation through multiple venues, evidence syntheses are needed to offer a counterweight to the untamed infodemic. Hence, multiple systematic or rapid reviews have been produced to summarize COVID-19-related research, but too many were of low quality [25]. Further, many COVID-19 study findings were the subject of exaggerated information and major excitement followed by severe disappointment, as well as by rapidly alternating extreme and opposite research claims [26]. For example, interventions that were eventually shown to be ineffective or harmful had early studies claiming extremely promising results and receiving high media attention; they were also highly cited in the scientific literature and held a prominent place in reviews and scientific opinion pieces [26]. All that noise blurred evidence-based decision-making.

One consequence of the infodemic is that the efficiency of public health information systems, defined as the ratio of useful information to the resources allocated to gather it, diminishes enormously. Low-cost data end up causing huge costs by misguiding the surveillance and research enterprises as well as decision-making (Box 3).

Box 2 Who are the experts? And where are they?

The waste of resources, debates, and the recycling of wasteful information can slow the identification and implementation of effective evidence-based policies. Further, notably through the spread of misinformation and fake news eroding trust in institutions, communities may eventually reject sound expert advice and evidence-based policymaking, as they become difficult to differentiate from the surrounding waste. New developments in artificial intelligence could accentuate these trends as they can act as multipliers of the infodemic [6].

Toward a slow data public health

In response to these challenges, we propose to move from big data public health to slow data public health [6]. The pandemic has revealed that a massive amount of research and surveillance data is not sufficient to fulfill our information needs. Slow data means that what matters more than data collection and analysis is, first, the careful and purpose-driven identification of specific public health information needs and, second, the efficient and purpose-driven dissemination of this information (Fig. 1). It also highlights the importance of collecting fewer, but higher quality, data designed for these purposes.

Fig. 1

Within a surveillance framework [1], the process of public health information production goes from the identification of information needs, at a population level, to decision-making. In a slow data public health approach, it is critical to be clear and purposeful regarding the information needs and the dissemination strategy

Identifying public health information needs requires surveillance experts, data scientists, healthcare providers, patient representatives, researchers, policymakers, and citizens to work together. Policymakers and citizens are the legitimate stakeholders to define the needs and decide which health outcomes should be valued. They also determine what resources are allocated to surveillance and research activities [6]. Surveillance experts, working in independent and scientifically credible institutions [24], are key players in linking policymakers and health data researchers. Once the needs are identified, these experts can design adequate surveillance systems. Anticipation is also a key feature of slow data public health, and potential information needs must be defined early on, not only during a crisis [29].

A purpose-driven information dissemination strategy is the other central element of a slow data public health. Transforming evidence into useful information for decision-making requires putting communication, dissemination, and implementation sciences at the heart of surveillance [30]. The infodemic has also revealed that the cognitive satisfaction provided by information, more than its quality, favors its diffusion [31]. People understand people and stories better than data; they need narratives around the data. We therefore have to develop a shared culture of public health surveillance between researchers, data scientists, and decision-makers. In an era of infodemic and misinformation, and because the truth does not defend itself in a deregulated information and cognitive market [31], maintaining trust in public health expertise and in science is essential to this effort. Scientists and decision-makers must be better trained in surveillance and epidemiology. At a societal level, it will also be necessary to improve health data literacy and critical thinking.

To improve the quality of surveillance-related research, we know the measures we should apply (Table 2), and several of these measures are integral to a slow data approach. Efforts need to be coordinated at national and international levels, with standardization of definitions and practices to strengthen the comparability of the data eventually collected across different places and over time. Institutions with expertise in surveillance and population health sciences should lead initiatives for the integration of surveillance data streams and for the design of foundational surveillance data. The impact of poor standardization should be fully acknowledged, and more weight should be given to metrology [18]. A reproducibility culture is also very helpful, especially to prevent the dissemination of exciting claims that rapidly gain numerous believers, as happened with many proposed treatments for COVID-19 [26]. The whole effort should also give priority to trust, and uncertainty should be communicated in its full breadth. Overpromising certainty to justify public health measures is likely to backfire. Finally, the mediatization of surveillance and study results can create sensationalism and should be handled cautiously.

Box 3 Surveillance bias

To strengthen surveillance outputs, the information production machinery should be conceptualized within the ecosystem of health decision-making [32], in a population perspective. Within a metrology framework [18], supporting primary data providers in improving the structure and semantics of the data they collect is critical to producing meaningful information from them [6]. Further, while data from healthcare providers constitute the basic layer of surveillance, the core surveillance activity should be designed at a population level, using population-based tools and information systems designed to avoid surveillance bias. Finally, slowing information production does not imply poor timeliness, because once we know which information is needed and how to disseminate it, the production process is more efficient.

Conclusion

In the age of infodemic, providing useful information for decision-making requires more than getting more data. Data of dubious quality and reliability waste resources and create data-genic public health damages. A slow data public health means, first, prioritizing the identification of specific information needs and, second, disseminating information in a way that informs decision-making, rather than devoting massive resources to data collection and analysis. It requires better data, ideally population-based, rather than more data, and aims to be timely rather than deceptively fast. Applied by independent institutions with expertise in epidemiology and surveillance methods, it allows a thoughtful and timely response, based on high-quality data fostering trustworthiness.