Introduction

Depression accounts for a large share of the global disease burden, with approximately 264 million people globally estimated to suffer from the condition [1]. It is broadly defined as a mental health (MH) disorder that is associated with “depressed mood, loss of interest or pleasure, decreased energy, feelings of guilt or low self-worth, disturbed sleep or appetite, and poor concentration.” [2] Depression is a significant public health issue because of (a) the number of people that are affected (i.e., the disease burden), (b) the social and economic costs associated with the disease burden (i.e., the cost burden), and (c) the nature of the condition in that it is relatively “easy to detect, highly treatable, and a candidate for prevention efforts.” [1, 3, 4]

Despite being one of the most common kinds of MH disorders, much about depression remains unknown, and it attracts less attention than it should within the public health community. National budgets for mental health, on average, account for less than one percent of the total budget allocated to the health sector in low- and middle-income countries (LMICs) [5]. Mental health does not receive as much priority in higher-income countries in the Americas either [6]. Smaller budgets translate into a limited workforce and limited other resources (such as diagnostic tools and medicines) that are essential in successfully diagnosing, monitoring, and treating MH disorders, including depression.

With the onset of the COVID-19 pandemic, depression has become particularly salient, and awareness of the issue has reached largely unprecedented levels. The COVID-19 pandemic has not only laid bare the weaknesses of health systems, including their ability to address depression, but has also revealed the pervasive impact of depression. Social distancing policies aimed at controlling the pandemic are affecting not only those who were already suffering from depression but also those who were otherwise healthy [7, 8]. The pandemic has overburdened both general health services and those specifically related to mental health, including depression. WHO’s recent survey of 105 countries, for example, found that the treatment for MH disorders was disrupted in 61% of the countries during the pandemic [9]. The pandemic has limited the provision of necessary services for both previously diagnosed and undiagnosed people. More people have been pushed into poverty, placing a strain on countries’ social protection systems on the one hand and making individuals vulnerable to MH problems such as depression on the other [10, 11].

One of the central challenges in addressing the problems occasioned by depression is that the etiopathogenesis of depression is complex and that no single approach is likely to be sufficient to mitigate the burden of depression. Depression is affected by a number of social, economic, and other risk factors, the so-called “determinants of health.” [2] These include education, housing, and income. The effects of these determinants on depression may manifest in a number of ways. First, these determinants may directly affect depression. Second, the determinants may influence each other to exacerbate exclusion and forms of oppression, which might, in turn, affect depression [12]. Third, these determinants may also interact with other factors, including racism, gender inequities, and pre-existing conditions such as chronic illnesses and family history of genetic disorders [13].

Compounding the problem, there are limited data about the causes and consequences of depression. This arises, in part, from insufficient resource allocation for MH disorders. When MH disorders do not receive funding commensurate with the disease burden that they collectively constitute, only limited infrastructure and human resources are dedicated towards addressing those disorders. This directly becomes a data issue. It is then impossible to collect accurate and comprehensive data on the prevalence and impact of depression because of inadequate infrastructure and human resources. Therefore, relatively limited data on depression, including its wider social determinants, has contributed to a shortage of evidence (in terms of both the number and quality of analyses) that can adequately inform decision making to mitigate depression.

Globally, the most widely available country-level estimates of depression, anxiety, and other MH disorders are available through Global Burden of Disease (GBD) estimates. These data have been the basis for global and regional comparisons, and, for many countries, a source for setting national MH priorities. In addition to these, countries use other tools and resources that are localized and better suited to their contexts. Some countries in Europe, for example, use the National Health and Wellness Survey (NHWS) and European Health Interview Survey, among others. In the United States, the relevant resources on mental health include Behavioral Risk Factor Surveillance System (BRFSS), National Health and Nutrition Examination Survey (NHANES), and National Health Interview Survey (NHIS) [14]. In many countries in Asia and Sub-Saharan Africa, Demographic and Health Surveys (DHS) and Health and Demographic Surveillance System (HDSS) coupled with other regional-, national-, and subnational-level surveys have been the sources for data on depression and other MH disorders.

Big data and the data revolution present new opportunities for informing our understanding and management of depression. Big data has opened possibilities for the use of machine learning (ML) and artificial intelligence (AI) in the diagnosis and treatment of depression. Big data-enabled tools and techniques have allowed clinicians to treat depression through virtual counseling and have enhanced their patient monitoring and precision therapy capabilities [15]. Social media data, in particular, have received a lot of attention for how they are used to study or predict depression [16, 17].

This paper highlights data sources that are currently being used in the study of depression and its wider social determinants in India and Brazil. Focusing mainly on the new data sources, the study aims to illuminate both the possibilities and the limitations of using data to address depression. We chose India and Brazil because both countries have demonstrated evidence of recent developments in data collection using new sources (including government programs and legislation) and are comparable by some broad standards (e.g., World Bank country income classification, type of government, and population size). The two countries also have key differences, including geographic location and sociocultural contexts. In focusing on data sources for the two countries, we also briefly examined the broader policy background on data and depression.

Methods

We identified data sources using a combination of targeted PubMed search, Google search, snowball sampling of the additional relevant literature, and expert consultations. Our search focused on data sources for a subset of social determinants of health in the context of India and Brazil. Our methods are broadly similar to those followed by Stephenson and Stengel [18]. See table A3 of the supplemental file for details of the search strategy.

We chose a subset of determinants based on the extent to which they are (1) generally accepted to directly impact health (to ensure data are available across countries), (2) commonly found across countries (to ensure saturation of findings), and (3) reasonably well-documented (to ensure that ample literature examining these determinants using various data sources exists). Building on the existing literature, we focus on the following seven determinants of health: healthcare (H), education (E), access to healthy choices (A), labor/employment (L), transportation (T), housing (H), and income (Y) [19, 20, 21]. Access to healthy choices can potentially include a range of things (including environment and climate change), but for the purposes of this paper, we limit the definition to food security and physical activity. These seven determinants are collectively termed “HEALTHY” determinants.

We examined the extent to which both traditional and new sources of data covered the seven HEALTHY determinants. While we focused our search on the new sources of data, we also included traditional sources of data. The new sources of data included (1) electronic medical records (EMRs) and electronic health records (EHRs), (2) social network data, (3) mobile phone data (call data records), (4) GIS data, (5) satellite imagery, (6) economic/market/commerce/consumer data (e.g., retail scanner data, consumer purchase data, and patient payments), (7) remote tracking and sensing (e.g., sensor data from digital devices and wearable technology), (8) internet/media content (e.g., web scraping and text mining), and (9) crowd-sourced and citizen-generated data. The traditional data sources included (i) survey data (e.g., household and facility), (ii) census data, (iii) administrative data (e.g., claims files), (iv) medical records, and (v) vital records (including birth records). Our search focused on publications that used data sources in the context of depression as well as one or more of the HEALTHY determinants. Studies that examined only depression prevalence (for instance, epidemiological studies on depression) without discussing any of the HEALTHY determinants were not included.
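The inclusion rule and the source-by-determinant mapping described above can be sketched in code. The following is a minimal illustration, not the procedure actually used in the study: the `Study` record, the short labels for source types, and the functions `is_eligible` and `coverage_matrix` are hypothetical names introduced here. It assumes each reviewed publication has already been hand-coded for whether it addresses depression, which HEALTHY determinants it discusses, and which data source types it uses.

```python
from dataclasses import dataclass

# The seven HEALTHY determinants named in the text.
HEALTHY = (
    "healthcare", "education", "access to healthy choices",
    "labor/employment", "transportation", "housing", "income",
)

# Short labels (an assumption) for the nine new and five traditional source types.
NEW_SOURCES = (
    "EMR/EHR", "social network", "mobile phone", "GIS", "satellite imagery",
    "economic/market/consumer", "remote tracking and sensing",
    "internet/media content", "crowd-sourced",
)
TRADITIONAL_SOURCES = (
    "survey", "census", "administrative", "medical records", "vital records",
)


@dataclass(frozen=True)
class Study:
    """Hypothetical hand-coded record for one reviewed publication."""
    title: str
    covers_depression: bool
    determinants: frozenset
    sources: frozenset


def is_eligible(study):
    """Include a study only if it addresses depression AND at least one
    HEALTHY determinant; prevalence-only studies are excluded."""
    return study.covers_depression and any(
        d in HEALTHY for d in study.determinants
    )


def coverage_matrix(studies):
    """Return {source type: {determinant: bool}}, where True means at least
    one eligible study used that source type for that determinant."""
    matrix = {
        source: {det: False for det in HEALTHY}
        for source in NEW_SOURCES + TRADITIONAL_SOURCES
    }
    for study in studies:
        if not is_eligible(study):
            continue
        for source in study.sources:
            if source not in matrix:
                continue  # ignore labels outside the taxonomy
            for det in study.determinants:
                if det in matrix[source]:
                    matrix[source][det] = True
    return matrix
```

A `True` cell in the returned mapping plays the role of a shaded circle in a source-by-determinant grid; any records passed in would need to come from the actual review, not from placeholder values.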

We limited our search to publications between October 2010 and September 2020 and data sources in either English (India and Brazil) or Portuguese (Brazil). For our initial search, we examined the following search terms: Data Collection/Data Sources, Depression, Social Determinants of Health, Healthcare, Education, Access to Healthy Choices, Labor/Employment, Transportation, Housing, and Income. We created and implemented a search strategy with PubMed MeSH term(s) that included these concepts and synonymous key words/phrases, including colloquialisms and alternative spellings, for these concepts (see supplemental file). We widened our search using the Boston University Library, Google Scholar, and Google to include peer-reviewed literature as well as gray literature. Portuguese keywords were used for searches in specific journals, Scielo, Google Scholar, and Google. We subsequently identified more country-specific data sources and data environments.
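To make the structure of such a search concrete, the sketch below combines the concepts listed above into a generic Boolean string and checks whether a publication date falls inside the search window. It is an illustration under stated assumptions rather than the actual strategy, which is documented in the supplemental file: the grouping of terms into three concept blocks, the function names, and the inclusive window endpoints are assumptions, and database-specific field tags (such as MeSH qualifiers) and synonyms are deliberately omitted.

```python
from datetime import date

# Concept blocks built from the search terms listed in the text.
# The grouping into three blocks is an assumption made for illustration.
CONCEPTS = {
    "data": ["Data Collection", "Data Sources"],
    "condition": ["Depression"],
    "determinants": [
        "Social Determinants of Health", "Healthcare", "Education",
        "Access to Healthy Choices", "Labor", "Employment",
        "Transportation", "Housing", "Income",
    ],
}

# Search window from the text (October 2010 to September 2020),
# assuming both endpoints are inclusive.
WINDOW_START = date(2010, 10, 1)
WINDOW_END = date(2020, 9, 30)


def or_group(terms):
    """Join the terms of one concept with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(concepts):
    """AND together one OR-group per concept block."""
    return " AND ".join(or_group(terms) for terms in concepts.values())


def in_window(published):
    """True if a publication date falls inside the search window."""
    return WINDOW_START <= published <= WINDOW_END


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```

Requiring every concept block to match (AND) while accepting any synonym within a block (OR) mirrors the inclusion logic of the review, in which a publication had to address data sources, depression, and at least one determinant.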

We identified additional literature and subject specialists through snowball sampling based on publication authorship and publication references. We contacted subject matter experts at the WHO, Institutes of Public Health, IBM, and experts from India and Brazil, who provided additional publications in English or Portuguese. The gray literature included publications from United Nations agencies, the World Bank, the Asian Development Bank, WHO, national statistical agencies and ministries of health, non-governmental organizations, and news reports from leading national dailies of India and Brazil. In order to be intentional about searching for new sources of data, we spent relatively more effort looking for new sources of data on depression.

Policy Background

Brazil

Brazil is home to nearly 10% of the global population with depression [22] and ranks fifth in depression rates [23]. In 2017, the prevalence of depressive disorders in Brazil was 3.3% [24]. Despite the widespread prevalence, treatment for depression is inadequate and is characterized by inequities [25]. Part of the reason for the treatment gap has to do with limited resources dedicated to depression and other mental health disorders at the local level [26]. Improving data collection and analysis at the local level could help address this. For instance, active surveillance on depression using health care provision records is implemented in only 418 of the 1,643 municipalities [27].

Home to one of the largest numbers of social media users in the world [28], Brazil has been making considerable progress in its data technology and related policies. The government is the fifth-biggest investor in big data, and the country has started using big data analytics [29], including in the evaluation of primary education [30]. Using European Union standards and based on the General Data Protection Regulation (GDPR), the country approved the General Personal Data Protection Act (LGPD) in 2018 [31]. The LGPD principles speak to a number of key issues related to data including purpose, adequacy, need, free access, data quality, transparency, security, prevention, non-discrimination, and accountability.

India

Depression, along with anxiety, is one of the key disorders responsible for the massive increase in the prevalence of mental health disorders that India has witnessed in recent times. Close to 200 million Indians are now suffering from a mental health disorder, and of these, about 50 million suffer from depression [32]. Despite the enormity of the problem, mental health continues to receive low priority in India. The country allocates only about 0.05 percent of its annual health budget to mental health. This is much smaller than the share of other low-income countries that, on average, spend nearly 0.5 percent of their health budgets on mental health [33].
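The scale of the gap implied by these figures can be checked with simple arithmetic. The short sketch below uses only the approximate values quoted above ("about", "nearly", "close to"), so the results are rough orders of magnitude rather than precise estimates; the variable names are introduced here for illustration.

```python
from fractions import Fraction

# Approximate shares of the health budget spent on mental health (percent),
# as quoted in the text.
india_share = Fraction("0.05")
comparison_share = Fraction("0.5")

# How many times larger the comparison share is than India's share.
budget_ratio = comparison_share / india_share

# Approximate counts (millions) quoted in the text.
with_mh_disorder = 200
with_depression = 50

# Share of people with a mental health disorder who have depression.
depression_share = Fraction(with_depression, with_mh_disorder)

print(f"Comparison share is {float(budget_ratio):.0f}x India's share")
print(f"Depression accounts for about {float(depression_share):.0%} of cases")
```

Exact fractions are used instead of floating-point division so that the ratio of the two quoted percentages is computed without rounding error.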

In terms of the policy landscape on data, the last ten years in India have been characterized not only by initiatives aimed at making data transparent and accessible but also by those that are focused on the need to harness the power of big data. In 2012, India adopted the National Data Sharing and Accessibility Policy and launched “data.gov.in” the same year as part of its open data initiative [34]. Similarly, in 2016, the ministry of science and technology introduced the “Big Data initiatives” program [35], and in 2019, the department of biotechnology introduced the Affordable and Accessible Healthcare–Big Data and Genomics initiative [36]. The introduction of these policies, coupled with India’s technological advantage, has translated into increases in data production. As many as 400,000 health-related mobile applications track data on key markers such as blood pressure, calorie intake, physical activity, and blood glucose [37]. Ayushman Bharat Yojna, a government-funded cashless health insurance scheme with a target to provide insurance to more than 100 million Indians, has a digital platform with a large volume of information that has the potential to allow for meaningful big data analytics [38].

Results

Data Sources

Brazil

Figure 1 below summarizes our findings on the use of traditional and new sources of data in relation to the HEALTHY determinants in Brazil; see supplemental file table A1 for additional details.

Fig. 1. Traditional and new sources of data on depression and its HEALTHY determinants for Brazil. This figure summarizes the use of new and traditional data sources related to depression. Only the data sources related to HEALTHY determinants have been considered. A shaded circle indicates the availability of a particular data source type for a given determinant.

India

Figure 2 below summarizes our findings on the use of traditional and new sources of data in relation to the HEALTHY determinants in India; see supplemental file table A2 for additional details.

Fig. 2. Traditional and new sources of data on depression and its HEALTHY determinants for India. This figure summarizes the use of new and traditional data sources related to depression. Only the data sources related to HEALTHY determinants have been considered. A shaded circle indicates the availability of a particular data source type for a given determinant.

Cross-country analysis

As illustrated in Figs. 1 and 2, and summarized in Fig. 3, the following new sources have been used to study depression in the context of the HEALTHY determinants: social media, GIS/GPS, mobile phone, and satellite imagery. Among the new sources of data, publicly available Twitter data is the most widely used to study depression. Twitter data has been used for predictive analytics and sentiment analysis. In both India and Brazil, new sources of data are being used to study depression and its wider determinants in slum populations. Given that slum dwellers are less likely to access healthcare and to be captured in government data systems, new data sources hold great promise for studying not just depression but health in general as well as the social determinants of health [39].

Fig. 3. Data sources on depression and its HEALTHY determinants in Brazil and India. This figure summarizes data sources for HEALTHY determinants for Brazil and India. The y axis indicates the total number of HEALTHY determinants (for new and traditional data sources combined) for each country. The x axis indicates the type of journals. The “other” category of journals includes interdisciplinary journals.

Discussion

Our study examined two countries to explore data sources on social determinants associated with depression and to document traditional and new sources of data. Our key finding is that, overall, there is scant use of new sources of data to study depression and its HEALTHY determinants. This is true for both India and Brazil.

Most of the existing literature used traditional data, with surveys being the leading source. These surveys usually take the form of household surveys or facility-based surveys. In the case of India, household surveys that capture/track depression data include National Family Health Survey (NFHS) and National Mental Health Survey (NMHS). These surveys not only capture information on depression but also on the HEALTHY determinants. In the case of Brazil, surveys include National Household Sampling Survey, National Survey on Health, and the São Paulo Megacity Mental Health Survey. The survey-based data look at the prevalence of depression as well as its relationship with the HEALTHY determinants. Depression-specific studies have looked at the role of gender, poverty, nutritional status, geographic location, family size, marital status, educational background, age, employment, and social status [40].

Most studies were published in public health journals, followed by mental health, psychology, and psychiatry journals. Additional journals were interdisciplinary, including those related to mental health and/or technology. Most studies combined two or more of the HEALTHY determinants, with two being the most common number of determinants. When studies used new data sources, they typically complemented them with traditional sources of data. In Brazil, for example, the limited publications using new data sources combined health survey data with social media and mobile phone data, and survey-based housing data with GIS/GPS data. India has a greater variety of data sources and, therefore, a higher number of publications using new data sources.

Our findings on India resonate with findings from other studies conducted in recent years. For example, in a study focused on mental health (including depression, psychosis, bipolar disorder, epilepsy, dementia, and alcohol and drug use disorders), Patel and colleagues note the overall lack of data on mental health disorders in India [41]. They further note the absence of any systematic efforts to incorporate mental health indicators in the health information system. Similarly, Global Burden of Disease (GBD) collaborators, in their analysis of the state-level burden of mental disorders in India, used population-based surveys including the World Health Survey 2003 for India and the National Mental Health Survey 2015–16 as key sources of information [32].

Part of the reason for the relative scarcity of evidence on the use of new data sources could be attributed to the nature of the public–private makeup of the healthcare industry in India. The private sector in India accounts for a significant share of general outpatient and general inpatient care (approximately 70% and 60%, respectively) [41]. Even though the private sector is at the core of the big data movement in India (and arguably in many parts of the world), it may not have incentives to share the data that it generates beyond what is strictly required by law. Moreover, data privacy and confidentiality issues prevent the private sector from being forthcoming about data sharing. The overall increase in data production in India masks weaknesses around data collection, storage, and use. For example, only 400 out of 62,000 hospitals collect useful patient data, and they do so only because they are mandated by the National Accreditation Board for Hospitals (NABH) [42]. A significant amount of data that is produced, such as health insurance claims data, remains largely inaccessible [43]. The open data initiative, despite its good intentions, has not been as helpful for evidence-based decision making because of the absence of district-wise disaggregated data [44].

In Brazil, one of the reasons for the limited use of new data sources to study depression and its HEALTHY determinants could be the robustness of the Brazilian public health system. Because the public health system provides health care to 190 million people (or nearly 80% of the total population) and, therefore, has in place the necessary data systems to do so, there is a much less pressing need for the country to use new data sources. The limited use of new data sources is not necessarily tantamount to a lack of data sources altogether. In fact, our findings suggest that cohort-based or survey-based studies of depression are quite common in Brazil. Another reason for the lack of new data sources on depression and its HEALTHY determinants could be linked to the fact that the government largely coordinates healthcare delivery and controls the data system, limiting the kind of innovation and experimentation that may be expected from the private sector. Even when such experimentation exists, the data may be unused and/or inaccessible, especially since the new data sources are not purposely collected by the government system.

Despite the digital and technological advancement, the use of new data sources in Brazil and India is limited. Our study corroborates previous findings that the use of new data sources to study mental health disorders may be largely limited to advanced economies such as the United States, Canada, and Western Europe [45].

Limitations

Our study has limitations. First, our assessment is based on a non-exhaustive search of the publicly available peer-reviewed publications and gray literature on depression and HEALTHY determinants in India and Brazil. This means that studies and reports that have not been disseminated are not included, which is likely to be the case with the private sector for reasons outlined earlier. As a result, what this study captures in terms of the actual use of new data sources may not be representative. Second, although the mapping exercise entailed first scanning many different studies before examining them in closer detail and choosing those for final analysis, we did not count or otherwise keep track of the studies at various stages of reading the literature. Third, although the majority of the literature we looked at was peer-reviewed journal articles, we did not examine the quality of the publications.

Conclusions

Our results suggest a number of data-related challenges for decision-making on depression, as they relate to the HEALTHY social determinants described herein. First, there is a continuing gap in the collection and analysis of data on depression, as a possible consequence and driver of the limited priority accorded to mental health. When data are collected on depression and its HEALTHY determinants, the collection is often not purposeful; the data are merely gathered as a component of broader mental health data. In the case of Brazil, data technology developments have increased purposeful collection of data on depression from health facilities, but only in a fraction of municipalities. Since segments of the population with low resources and/or living in areas with low resources also have less access to health care (for example, the slum populations), data may be lacking where they are needed the most.

Second, the relatively limited use of data to inform our understanding of the HEALTHY determinants of depression suggests a substantial need for support of independent research using new data sources and the creation of conditions for collecting and validating official data using new sources. This has been tried in some instances, with good success. In India, for example, having accredited social health activists (ASHAs) collect sociodemographic data on depression related to stressors and personal and family history shows that technology is not the only agent of the data revolution on which governments need to focus [46, 47].

Third, our finding that there is limited use of data sources on depression and its HEALTHY determinants relates directly to the nature of the problem, in particular, to the fact that depression is a very specific health condition. By definition, specific health conditions require specific and purposeful approaches, ones that are characterized by granularity, to collecting data. Handling such data is more resource intensive.

Fourth, the links between stressor events and depression and between depression and suicide call for more interest and rapid investment in expanding mental health care provision in low-resourced areas. An example of the opportunities that multi-disciplinary research can provide was the use of satellite imagery, where satellite images helped volunteers in identifying health and safety needs of flood victims.

Fifth, the global health community should modify the frameworks that are being used to track progress on universal health coverage (UHC) to include indicators that capture MH disorders such as depression. Currently, there are two main indices that are used to track progress on UHC: (a) the UHC index and (b) the effective UHC coverage index. Neither of them incorporates indicators on MH disorders. While “paucity of routinely collected data on mental health” may be the reason for their exclusion, such exclusion postpones addressing data challenges associated with MH disorders [48]. The absence of these indicators in the UHC indices is problematic, especially given the high disease burden associated with depression and other MH disorders.

Finally, it is worth noting that data, by highlighting health system gaps and needs, will justify increased pressure for more infrastructure and personnel to provide adequate health services. At the same time, additional data will improve the ability to prevent depression, not only through the health system but also by addressing the social determinants that influence depression. This suggests the imperative of a comprehensive investment in data to understand the determinants of depression, alongside investments in systems and infrastructure. Pushing a data agenda forward may ultimately lead to improving living conditions as one of the potential solutions to alleviate the burden of depression.