Introduction

Reliable estimates of childhood mortality rates are crucial to evaluating progress in achieving the United Nations (UN) Sustainable Development Goals (SDGs). However, for a large number of countries without well-functioning vital registration systems, limited data availability and data quality issues remain challenges for estimating childhood mortality rates (Alkema & New, 2014). Since the mid-1980s, the Demographic and Health Surveys (DHS) Program has conducted hundreds of national cross-sectional surveys that measure health and demographic indicators. Among others, DHS surveys collect data on retrospective birth histories to calculate childhood mortality rates, such as the neonatal mortality rate (NNMR), the infant mortality rate (IMR), and the under-5 mortality rate (U5MR) and release the rates in the final report for the respective survey. Although the calculation of DHS childhood mortality rates is based on a defined reference period, which usually covers 5 years preceding the interview date, every interviewed woman reports data about all her live-born births, including month and year of birth, current survival status, the current age of surviving children, and age at death of each deceased child (Croft et al., 2018). Given the overlap in birth histories across successive surveys, different surveys collect data about birth histories of the same calendar years. Unfortunately, such overlap is not fully utilized. For example, in Malawi DHS 2015–2016, the published rates in the DHS final report, such as the childhood mortality rates based on 10–14 years before the survey, make use of birth histories of 2003 collected in that survey but make no use of birth histories of the same year collected in Malawi DHS 2010. Moreover, data about birth histories are collected in other reliable surveys, such as the UNICEF’S Multiple Cluster Indicator Surveys (MICS), and used to produce survey-level indicators. For example, data about births and deceased children before 2014 are collected in both Malawi MICS 2013–2014 and Malawi DHS 2015–2016 (National Statistical Office, 2015; National Statistical Office & ICF, 2017; National Statistical Office and ICF Macro, 2011). Such overlap in birth histories across surveys, within DHS surveys, or between DHS and MICS surveys, can be exploited for the benefit of constructing childhood mortality indicators and allows for calculating childhood mortality rates based on calendar years and reference periods beyond the period in the DHS final reports (Pedersen & Liu, 2012).

We address this issue by providing a design-based framework that guides the process of accumulating birth histories from cross-sectional survey data and demonstrating that the resulting cumulated dataset can be used for different purposes such as producing (1) national and sub-national mortality rates with better precision, i.e., lower standard errors, (2) mortality rates based on shorter reference periods, and (3) smoother trends for mortality rates. This, in turn, has the potential to benefit model-based approaches like the Bayesian B-spline Bias-reduction model of the United Nations Inter-agency Group for Child Mortality Estimation (UN IGME). The UN IGME aims to address limited data availability and data quality issues by collecting childhood mortality rates from various sources, including estimates from single DHS and MICS surveys. They use collected rates as input data and smooth over space and time to re-estimate national childhood mortality rates with more reliable time trends for 194 countries (UN IGME, 2012; Alkema & New, 2014).Footnote 1 Hence, our design-based approach can benefit model-based approaches (1) by producing mortality rates with lower standard errors; such rates will be reliable inputs for the model-based approaches, and (2) by avoiding the risk of over-smoothing which might happen due to smoothing the trend over space and time leading to obscure deviations from the underlying trend of a mortality rate. Alkema and New (2014) demonstrate that the former UN IGME approach, a Loess regression model, oversmoothes time series of mortality rates for some countries. They use this to motivate the current UN IGME approach, a Bayesian B-spline Bias-reduction model, which results in time trends that seem to be more appropriate. Nevertheless, the appropriateness of smoothing can never be assessed with certainty. By generating inherently smoothed trends for mortality rates, our approach reduces uncertainty with respect to the right smoothing parameters and the risk of over-smoothing.

Section 2 is a brief background and literature review about calculating childhood mortality rates based on separate survey data and model-based approaches that aim to improve estimates for trends in childhood mortality. In Sect. 3, we describe two different data cumulation techniques and outline a framework for accumulating data of birth history using data from DHS and MICS surveys. After describing the accumulation framework, we illustrate and experiment with several surveys in Sect. 4. Finally, we conclude the paper with a discussion in Sect. 5.

Background and Literature Review

Calculating Childhood Mortality Indicators from Survey Data

Birth histories collected in household surveys are a key component in calculating childhood mortality indicators, especially in countries without well-functioning vital registration (VR) systems. The complexity of the collected data varies according to the survey size and objectives. Data about live births may be collected in full, truncated, or summary birth histories. The data may also be collected as part of other data about all pregnancies in a full pregnancy history (Hill, 2013). The DHS Program and the MICS Program have conducted many national surveys since the mid-1980s (DHS)/mid-1990s (MICS) and are two of the largest household survey programs on children and women (Croft et al., 2018; Khan & Hancioglu, 2019). In most of the DHS and some of the MICS surveys, a full birth history is collected from all women aged 15–49 years.Footnote 2 In the DHS birth histories, each interviewed woman reports data about all her live-born births, starting from the first to the most recent birth. For each birth, the woman reports the sex, month, and year of birth, current survival status (whether the child is still alive), and current age of surviving children in completed years. For deceased children, the mother reports the age at death, detailed in (1) days if the child died before 28 days, (2) months if the child died from 28 days to 23 months, and (3) years if the child died after 23 months (Croft et al., 2018). Unlike full birth histories, there are partial birth histories, such as the truncated birth history and the summary birth history. In the truncated histories, data are collected on births up to a particular date before the survey, or an upper limit for the number of births for each woman. This reduces fieldwork costs and interviewee fatigue. However, the reduction in the collected data might result in a sample of births with characteristics that are drastically different from the ones collected with a full birth history. The DHS Program collects truncated birth histories in many of the Malaria Indicator Surveys. These surveys are not used for estimating childhood mortality indicators (Croft et al., 2018). Similarly in the summary histories, only aggregate numbers of children ever-born and still alive are collected. The summary birth histories are often collected in census forms. Unlike the other birth histories, the summary birth history does not allow for direct calculation of standard mortality indicators, such as occurrence-exposure rates. Indirect estimation methods can be used instead (Hill, 2013; United Nations, 2011).

Collecting retrospective full birth histories in household surveys allows for a direct estimation of childhood mortality indicators. The direct estimation method is commonly used in the DHS and some of the MICS surveys,Footnote 3 where the childhood mortality is calculated using a synthetic cohort life table approach, in which mortality probabilities for small age segments are combined into more common age segments. In DHS surveys, five childhood mortality rates are calculated as probabilities of dying during specific age: (1) the neonatal mortality rate (NNMR): the probability of dying between birth and exact age of 1 month; (2) the post neonatal mortality rate (PNMR): the probability of dying between exact ages 1 month and 1 year, which is usually calculated as the difference between the infant mortality rate (IMR) and NNMR; (3) the infant mortality rate (IMR): the probability of dying between birth and exact age of 1 year; (4) the child mortality rate (CMR): the probability of dying between exact ages 1 and 5 years; and (5) the under-5 mortality rate (U5MR): the probability of dying between birth and the exact age of 5 years (Croft et al., 2018; Elkasabi, 2019; United Nations, 2011). See Appendix 1 for a detailed description of the underlying calculations.

The DHS tradition has been to report the mortality rates based on a window of time that ends with the date of the interview. The justification for this practice is that the number of births in the calendar year of the interview can be small and statistically unstable if the interviews are done early in the year. Similarly, to avoid statistically unstable single-year rates, the DHS reports the national and sub-national mortality rates based on a reference period of 5 (0–4 years) and 10 years (0–9 years) before the survey respectively (Pullum & Assaf, 2016). For example, “the past five years” or “0–4 [completed] years before the survey” means that the window is the 60 months before the month of the interview. For a survey where interviews took place in June 2019, the time interval for national mortality rates will be June 2014 through May 2019, inclusive, and the time interval for sub-national mortality rates will be June 2009 through May 2019, inclusive. The longer reference period of the sub-national rates makes up for the reduction in sample size and allows for longer birth history data to contribute to the regional rates, which results in lower standard errors (SE) and narrower confidence intervals around the rates.Footnote 4

Figure 1 reveals that, albeit long reference periods, there are still substantial differences in estimated mortality rates with respect to survey yearFootnote 5 and survey type.Footnote 6 This problem is not limited to mortality rates calculated using DHS and MICS survey data but applies to a lot of mortality rates calculated using data sources other than well-functioning VR systems (Alkema & Alexander, 2018; Alkema & New, 2012, 2014; Alkema et al., 2012). Such differences in estimated mortality rates with respect to the survey year and survey type can be explained by the retrospective nature of survey data. That is, an interviewed woman could face difficulties in recalling birth histories of children born long ago. In this respect, birth history data may suffer from different variants of the recall bias such as omissions of life births and deaths and misreporting of deaths (Hill & Choi, 2006; Neal, 2012). For instance, omission of deaths may occur for early neonatal deaths being misclassified as stillbirths (Neal, 2012). In addition, the preference for reporting deaths at a particular date may bias estimated mortality rates. Lyon-Amos and Stones (2017) analyze age heaping over time in 34 countries in Sub-Saharan Africa between 1987 and 2015. They find that while there was overall no significant reduction in age heaping over time, the proportion of age reported which are heaped is relatively low and should not be a major concern. This is in line with Prieto et al. (2021) who find that the age heaping at 12 months induced bias in DHS-based IMR estimates is smaller than previously thought. They admit, however that their model cannot be used to produce empirically supported IMR adjustments for Sub-Saharan Africa and South Asia and call for future research that allows understanding the distinct age patterns of under-five mortality in Sub-Sharan African and South Asian countries. Apart from the recall bias, other non-sampling errors, such as nonresponse or non-coverage errors, may cause differences in estimated mortality rates. However, we are not aware of any evidence that these errors affected the DHS and MICS surveys used in our study more than other surveys.

Fig. 1
figure 1

U5MR, IMR, and NNMR for Senegal and Malawi calculated for a 5-year reference period based on birth history data from individual DHS and MICS surveys

Model-Based Approaches to Estimate Mortality Rates

The shortcomings of mortality rates calculated from single data sources motivate existing model-based approaches that incorporate various data sources to improve estimates for national and sub-national mortality rates. The Institute of Health Metrics and Evaluation (IHME) relies on a compiled database of 16,174 measurements of under-5 mortality and uses Gaussian process regression modeling to estimate the U5MR for 187 countries from 1970 to 2009 (Rajaratnam et al., 2010; Wang et al., 2012). Similar to the IHME, the United Nations Inter-agency Group for Child Mortality Estimation (UN IGME) incorporates various measurements of children mortality per country from VR and non VR systems in order to estimate mortality rates (UN IGME, 2020a, b). The annual U5MR per country is estimated using a Bayesian B-splines bias adjusted model, which is described in Alkema and New (2014). To account for under-reporting of neonatal deaths in countries without high-quality VR data, the majority of UN IGME IMRs are derived from the respective U5MR incorporating known regularities in age patterns of child mortality (UN IGME, 2020a, b). Finally, the UN IGME NNMRs are estimated using a penalized splines regression model within a Bayesian hierarchical framework as described in Alexander and Alkema (2018).

In addition, the UN IGME publishes estimates for sub-national U5MRs. In contrast to the UN IGME national rates, which incorporate DHS released mortality rates and mortality rates from other sources, UN IGME sub-national rates incorporate DHS data and no other data sources. Those estimates are obtained using a Beta Binomial sampling model that includes cluster-level modeling, space–time smoothing, and benchmarking to UN IGME estimates among others (UN IGME, 2021). Given the concerns about the small sample sizes underlying estimates for sub-national mortality rates, there exist several other model-based approaches to obtain sub-national estimates for U5MRs. Dwyer-Lindgren et al. (2014) exploit multiple data sources and explore various approaches of combining small area models and birth history methods to estimate sub-national U5MRs. In addition, they use simulation to validate their different approaches and apply the best performing method to birth history data in Zambia. Mercer et al. (2015) address issues of not incorporating area-specific sampling variability when estimating sub-national U5MRs by constructing sub-national estimates for the U5MR in Tanzania, accounting for survey weighting and smoothing over space and time. They incorporate estimates of under-5 mortality from five DHS surveys, one Malaria Indicator survey, and two health and demographic surveillance system sites in Tanzania. Li et al. (2019) combine direct estimates of the U5MR for sub-national areas from 122 DHS surveys in 35 countries in Africa into a single estimate accounting for the sample design of each survey and smoothing each region-time-specific estimate using a Bayesian space–time model. To sum things up, model-based approaches generally combine existing mortality rates estimates and smooth over space and time to re-estimate a final mortality rate. Hence, they can be improved by increasing the precision of mortality rates that enter their estimation. For that reason, we introduce a framework that guides the process of cumulating cross-sectional survey data. Our method can be applied to DHS and MICS data and we demonstrate that our design-based approach helps to improve data precision by producing mortality rates with narrower confidence intervals and smoother time trends.

Methods and Data

Techniques for Cumulating Periodic Surveys

During the past 35 years, many countries collected more than 5 DHS surveys, which were typically 5 years apart. This includes, but is not limited to, Bangladesh, Colombia, Dominican Republic, Egypt, Ghana, Indonesia, Jordan, Kenya, Mali, Nigeria, Philippines, Senegal, Tanzania, Uganda, Zambia, and Zimbabwe (https://dhsprogram.com/). Such a long series of surveys allow researchers to study the change in survey estimates across years. Traditionally, the DHS surveys have been designed and used for measuring periodic changes across surveys in addition to “current” estimates. The development of DHS surveys concentrated on tracking survey estimates across surveys within and between countries and has encouraged the development of a harmonized set of survey tools and procedures for all survey stages. The harmonized toolkit includes standard sampling designs, questionnaires, protocols for anthropometric measurements and blood testing, as well as algorithms for survey estimates and tabulation plans. Most of the tools available on the DHS Program website (https://dhsprogram.com/) have been used extensively to collect surveys other than the DHS surveys. For example, the DHS program produced hundreds of comparative reports in which many indicators were tracked across years or between countries. Such harmonization allows for the combination of DHS surveys with other national surveys that collect similar data, such as some of the UNICEF’s Multiple Indicator Cluster Survey (MICS).

Cumulating data across several surveys achieves many goals, such as (1) increasing sample size, which produces reliable estimates with reduced estimated variance on the national level, or on small geographic and non-geographic domains; (2) estimating change across different surveys (Kalton, 2009; Robert & Binder, 2009; Thomas & Wannell, 2009). The cumulating technique has also been used to combine different samples in multiple frame designs to overcome frame non-coverage problems (Elkasabi, 2015; Elkasabi et al., 2015; Lohr, 2011). Furthermore, it has been used to combine rolling samples of the same survey. For example, the Senegal DHS 2012–2014 is based on a cumulated sample from the Senegal DHS 2012–2013 and the Senegal DHS 2014 (Agence Nationale de la Statistique et de la Démographie and ICF International 2015).

Kish (1999) defined “cumulating periodic surveys” as the process of cumulating estimates across periodic surveys. Unlike the traditional use of periodic surveys to produce “current” estimates or to measure periodic changes, surveys are cumulated and combined to produce cumulated estimates. Two broad approaches can be used to cumulate data across surveys: (1) The separate approach, in which the cumulating is done at the estimates level, where periodic estimates are cumulated across surveysFootnote 7; (2) The pooled approach, in which microdata are cumulated across surveys. With the separate approach, periodic estimates are produced separately from each survey and then combined in a composite estimator. With the pooled approach, microdata from different surveys are cumulated in one “pooled” dataset, and estimates are calculated based on that pooled dataset (Thomas & Wannell, 2009).

The Separate Approach

In the separate approach, cumulated estimates can be calculated as a simple or weighted average of periodic estimates. Let Y denote a population parameter and \({\widehat{y}}_{1}, {\widehat{y}}_{2},\dots , {\widehat{y}}_{I}\) denote unbiased estimates of Y from I periodic surveys. A cumulated estimate \({\widehat{y}}_{c}\) can be calculated as:

$${\widehat{y}}_{c}=\sum_{i=1}^{I}{\widehat{\theta }}_{i}{\widehat{y}}_{i},$$
(1)

where \({\widehat{\theta }}_{i}\) is a cumulating composite factor and \(\sum_{i=1}^{I}{\widehat{\theta }}_{i}=1\). The cumulated estimate in (1) can also be calculated as a simple average \({\widehat{y}}_{c}=\sum_{i=1}^{I}{\widehat{y}}_{i}/I\) where \({\widehat{\theta }}_{i}=1/I\). Where samples of periodic surveys are independent, the variance of the cumulated estimate can be calculated as:

$$var\left({\widehat{y}}_{c}\right)=\sum_{i=1}^{I}{\widehat{\theta }}_{i}^{2}var\left({\widehat{y}}_{i}\right).$$
(2)

The main advantage of the separate approach is that it can be easily used to calculate cumulated estimates for simple indicators, such as proportions, means, and totals, without having to involve the survey microdata. The approach might be appealing to users who rely on published tables to calculate cumulated estimates. However, the lack of published estimators or their confidence intervals might be a challenge in producing cumulated estimates with estimated confidence intervals. In addition, as explained in Sect. 2.2 estimating before cumulating introduces additional uncertainty to the final estimate, which may result in wider confidence intervals associated with the final estimate.

The Pooled Approach

Unlike the separate approach, in the pooled approach, individual records, and not estimates, are cumulated across different surveys. Pooled/cumulated estimates are produced by using the same techniques appropriate for single samples. Let yij denote the individual record of y for individual j of survey i and let wij denote the survey weight for the same individual. A cumulated estimate \({\widehat{y}}_{c}\) can be calculated as follows:

$${\widehat{y}}_{c}=\sum_{i}\sum_{j\in i}{\widehat{\theta }}_{i}{w}_{ij}{y}_{ij},$$
(3)

where \({\widehat{\theta }}_{i}\) is a cumulating adjustment factor that is fixed across all individual records of survey i, with \(\sum_{i=1}^{I}{\widehat{\theta }}_{i}=1\). In (3), the individual survey weight \({w}_{ij}\) accounts for differences in the probability of selection and interview between individuals within surveys, whereas the cumulating adjustment factor \({\widehat{\theta }}_{i}\) adjusts the weight to account for multiplicity across surveys, i.e., target women have multiple opportunities to be selected as sample element across surveys. Different methods can be used to choose the values of the composite factor \({\widehat{\theta }}_{i}\), such as to minimize the variance of the cumulated estimate in (2) or to account for the effective sample size for each of the periodic samples (Chu et al., 1999; Korn & Graubard, 2011; Roberts & Binder, 2009). These methods are motivated by estimation methods developed for surveys with multiple frame designs (Elkasabi et al., 2015; Lohr, 2011; Skinner & Rao, 1996). We chose to use one of the common multiple frame estimators: the multiplicity estimator. That is, a fixed composite factor, such as \({\widehat{\theta }}_{i}=1/I\),Footnote 8 is assigned across different samples/surveys (Mecatti, 2007; Singh & Mecatti, 2011). Other methods for finding the composite factors, such as the Hartley Estimator, the Pseudo-Maximum Likelihood Estimator, or the Pseudo-empirical Likelihood Estimator, may have better theoretical properties. However, these methods are very complex, and using them may cause methodological and practical problems, especially for accumulating multiple surveys. Therefore, we use the multiplicity estimator which is heavily used in practice and can be readily calculated in our application (Lohr, 2011).

With data cumulated at the microdata level, the pooled/cumulated dataset can be analyzed as a single sample from a population, where all single-sample techniques of data analysis are relevant. This allows for the estimation of complicated indicators other than proportions, means, or totals, such as childhood mortality rates. In addition, it requires access to the microdata of the separate surveys by data users who can manipulate datasets to properly cumulate several surveys. Appendix 2 provides more illustrations on the differences between the separate and pooled approaches with application on the NNMR.

Data and Application

Two countries were selected for the analysis in this paper: Senegal and Malawi. In Senegal, we focused on DHS surveys collected after 2010 due to the large number of DHS surveys collected during the past 10 years. As indicated in Table 1, six DHS surveys were completed in Senegal between 2010 and 2017, and four DHS surveys and two MICS surveys were completed in Malawi between 2000 and 2015. For each DHS survey, the births dataset (BR) was used in the data analysis, whereas for the Malawi MICS survey, the birth history (bh) dataset was used in the data analysis. All analysis datasets can be downloaded from the DHS and MICS webpages (https://dhsprogram.com/; https://mics.unicef.org/surveys).

Table 1 The survey and number of women age 15–49 (n) during the study interval (2000–2017 for Senegal; 2000–2015 for Malawi)

The birth history data from the individual survey datasets overlap as indicated in Fig. 2 for Senegal. All six surveys collected data about children’s deaths and exposure from 2000 through 2011. More specifically, as indicated in Fig. 3, all surveys share births and deaths that occurred between 1982 and 2011. For example, childhood mortality rates are typically calculated from the DHS 2010–2011 based on a 5-year reference period, highlighted in yellow in Fig. 2. With birth history data from other surveys, highlighted in yellow, separate childhood mortality rates can be calculated for 2010 with respect to survey, or birth history data can be cumulated and used to produce one cumulated rate from the cumulated dataset.

Fig. 2
figure 2

Overlap for 5-year reference periods by DHS surveys in Senegal

Fig. 3
figure 3

Ranges of dates of birth and death by DHS surveys in Senegal

In each country, one cumulated dataset was formed by pooling the separate datasets as follows:

  1. 1.

    All separate datasets were pooled in one cumulated dataset.

  2. 2.

    A survey ID variable was constructed to identify each survey within the cumulated dataset.

  3. 3.

    Unique cluster and strata codes were constructed so that clusters and strata from each survey are unique across surveys within the cumulated dataset.

  4. 4.

    A modified survey weight was created by multiplying the survey weight (v005/1,000,000)Footnote 9 from each survey by an un-normalization factorFootnote 10 that was calculated as the approximated number of women age 15–49 in the population (https://data.worldbank.org/), during the time of the survey, divided by the total number of women age 15–49 interviewed in the survey.

    $${f}_{i}=\frac{{N}_{i}^{w}}{{n}_{i}^{w}}$$
    (4)

    with fi being the un-normalization factor for survey \(i\), \({N}_{i}^{w}\) being the population of women age 15–49 years in the country at the time of survey \(i\), and \({n}_{i}^{w}\) being the number of interviewed women age 15–49 years in survey \(i\).Footnote 11

  5. 5.

    A fixed adjustment factor (\(1/I\)) was assigned to survey weight so that a modified survey weight for survey i is

    $${w}_{i}=\frac{{v005}_{i}}{{1{,}000{,}000}}\frac{{N}_{i}^{w}}{{n}_{i}^{w}}\frac{1}{I}.$$
    (5)
  6. 6.

    For each age segment k (see Table 2 for the definition of the segments), a component death probability pk was calculated with the modified weight (5) yielding

    $${p}_{k}=\frac{{\sum }_{i=1}^{I}({D}_{k2,i}+0.5({D}_{k1,i}+{D}_{k3,i}))}{{\sum }_{i=1}^{I}({E}_{k2,i}+0.5({E}_{k1,i}+{E}_{k3,i}))},$$
    (6)

    where Dk1,i denotes the weighted number of deaths for children in age segment \(k\) of cohort 1 in survey i, and Ek1,i denotes the weighted number of survivors for children at the beginning (\({{\varvec{a}}}_{{\varvec{l}}}\)) of age segment \(k\) of cohort 1 in survey i. Similar definitions apply in case of Dk2,i, Dk3,i, Ek2,i and Ek3,i for cohorts 2 and 3. (See Appendix 1 for definitions of cohorts). All weighted estimates were produced using the modified weight in (5). The mortality rates were then calculated as follows:

    $${\rm NNMR}={p}_{1}\times 1000,$$
    (7)
    $${\rm IMR}=\left(1-\prod_{k\in \left\{{1,2},{3,4}\right\}}\left(1-{p}_{k}\right)\right)\times 1000,$$
    (8)
    $${\rm U}5{\rm MR}=\left(1-\prod_{k\in \left\{{1,2},{3,4},{5,6},{7,8}\right\}}\left(1-{p}_{k}\right)\right)\times 1000.$$
    (9)
  7. 7.

    The unique cluster, strata, and modified weight were used for the analysis in this paper.

Table 2 The bounds of the age segments from \({{\varvec{a}}}_{{\varvec{l}}}\) up to but not including \({{\varvec{a}}}_{{\varvec{u}}}\) for the component death probabilities

Unlike the rates published in the DHS reports, rates are calculated in this analysis based on calendar years. For example, 5-year mortality rates for 2016 are based on deaths and exposure from January 2012 through December 2016 inclusive. In addition to the mortality rates based on cumulated survey data, we do also calculate mortality rates for every individual survey following the DHS standard procedures as described in Croft et al. (2018) and Elkasabi (2019). Therefore, the chmort function from the DHS.rates R package was used to calculate all cumulated and separate rates and their standard errors in this analysis. The function follows the same calculation approach used in the DHS surveys, which is outlined in Appendix 1 (Elkasabi, 2019, 2020).

Results

Childhood Mortality Rates: National Level

Figure 4 reports the national NNMR, IMR, and U5MR of Senegal calculated with separate DHS surveys and cumulated survey data. The separate rates, in the upper panel of the figure, are the rates released in the DHS survey reports of DHS 2010–2011, 2012–2013, 2014, 2015, 2016, and 2017. From the cumulated data, annual rates based on reference periods of 5 calendar years and one calendar year were calculated between 2000 and 2016 and presented in the bottom two panels of the figure. In comparison to the DHS released rates, cumulated birth histories across surveys allow for longer time series of childhood mortality trends, such as 2000–2016 or an even longer period due to the complete birth histories collected in the DHS surveys, and trends of annual rates based on reference periods of 5 and 1 calendar years. For example, the DHS trends indicate a decrease in childhood mortality rates between 2013 and 2014 and an increase between 2014 and 2015. The cumulated trend based on a 5-year reference period flattens such fluctuations and allowes for narrower confidence intervals around the trend curve. As Table 3 indicates, cumulated rates are accompanied by standard errors that are an estimated 60% less than their counterparts from separate surveys. In addition, Fig. 4 reveals that cumulating birth histories across surveys allows for the calculation of mortality rates based on shorter reference periods, such as one calendar year, with acceptable confidence intervals, especially for the U5MR.

Fig. 4
figure 4

Trends of childhood mortality rates in Senegal based on separate surveys and cumulated data

Table 3 Average standard errors of mortality rates by survey in Senegal

Figure 5 reports trends of annual childhood mortality rates calculated on a reference period of 5 calendar years. Trends were calculated based on separate survey datasets and a cumulated dataset. Cumulated data yielded longer, smoother trends with more narrow confidence intervals as opposed to the trends in the separate datasets. Unlike the trends from the separate datasets where the fluctuating rates might not be sufficient to infer a consistent pattern, trends from the cumulated dataset were more robust with a definite pattern that implies a decrease in mortality rates over time. As shown in Fig. 5, trends calculated from the separate datasets might contradict each other. For example, the NNMR and IMR calculated with the DHS 2015 survey data are subject to an upward sloping trend between 2000 and 2004. In contrast, calculating the NNMR and IMR with DHS 2014 survey data shows a decreasing trend between 2000 and 2004. Such inconsistencies might not suggest any significant differences across surveys, especially with the wide confidence intervals around the trends. Similar patterns are obvious in Figs. 11 and 12 in appendix where birth history data from Malawi DHS surveys(DHS surveys from 2000, 2004, 2010, and 2015) are cumulated with birth history data from Malawi MICS surveys (MICS surveys from 2006 and 2013).

Fig. 5
figure 5

Trends of childhood mortality rates for a 5-year reference period in Senegal based on separate surveys and cumulated data

In addition, Fig. 6 compares childhood mortality rates to their UN IGME counterparts in Senegal (downloaded from UN IGME, 2020b). For the U5MR the cumulated time trend is consistently characterized by a narrower confidence interval and yields similar results to the model-based approach of the UN IGME for the years 2000 and 2012–2015. For the years 2001–2011 the cumulated U5MR is slightly above the UN IGME U5MR, which might be partly driven by the fact that the cumulated U5MR is based on a 5-year reference period, i.e., represents a 5-year moving average. The confidence intervals for the cumulated IMR and the cumulated NNMR are a bit wider at the beginning of the time series (around 2000) and a bit narrower at the end of the time series (around 2016) compared to their UN IGME counterparts. Furthermore, the difference between the cumulated IMR/NNMR and the UN IGME IMR/NNMR is larger than the difference between the U5MRs. This can be related to the increased difficulty of producing reliable estimates for the IMR and the NNMR. Furthermore, trends for the cumulated IMR and NNMR are not as smooth as the UN IGME trends. Whether the cumulated trends under-smooth or the UN IGME trends over-smooth the true trend remains an open question. Similar patterns are obvious in Fig. 13 where cumulated estimates for Malawi are compared to their UN IGME counterparts.

Fig. 6
figure 6

Trends of childhood mortality rates for a 5-year reference period in Senegal based on cumulated data compared to their UN IGME counterparts

Childhood Mortality Rates: Sub-national Level

Figure 7 reports trends for regional NNMR, IMR, and U5MR of the following three regions in Senegal: Kaolack, Kedougou, and Sedhiou. Rates are calculated based on separate survey datasets and a cumulated dataset. The separate survey rates, in the upper panel of the figure, are the rates calculated based on a reference period of 10 years before the survey and released in the DHS survey reports of DHS 2010–2011, 2012–2013, 2014, 2015, 2016, and 2017. The trends from the cumulated dataset are based on annual rates calculated based on two reference periods, 10 and 5 calendar years. Similar to the national trends, sub-national trends from the cumulated dataset are longer and suffer from less fluctuation than trends from the separate datasets released by the DHS. The cumulation also allowed for rates based on a shorter reference period of 5 years as opposed to the standard 10-year period for sub-national rates. The cumulated sub-national rates are accompanied by standard errors that are less than or comparable to their counterparts from separate surveys, in case of the 10-year and 5-year rates, respectively.

Fig. 7
figure 7

Trends of childhood mortality rates for selected regions in Senegal based on separate surveys and cumulated data

Figure 8 reports trends of annual childhood mortality rates in the Kolda Region of Senegal, which were calculated based on a reference period of 5 calendar years. The trends are for rates calculated from separate survey datasets and a cumulated survey dataset. As shown in Fig. 8, cumulated data yields longer, smoother trends with more narrow confidence intervals. Similar patterns are obvious in Figs. 14 and 15 where birth history data from Malawi DHS surveys (DHS surveys from 2000, 2004, 2010, and 2015) are cumulated with birth history data from Malawi MICS surveys (MICS surveys from 2006 and 2013).

Fig. 8
figure 8

Trends of childhood mortality rates for a 5-year reference period in Kolda, Senegal based on separate surveys and cumulated data

In addition, Fig. 9 compares sub-national U5MRs estimated based on the cumulated dataset to their UN IGME counterparts in Senegal (downloaded from UN IGME, 2020b). For all regions, the cumulated time trends are almost consistently characterized by a wider confidence interval than the UN IGME time trends. Similar patterns are obvious in Figs. 16 and 17 where cumulated sub-national estimates for Malawi are compared to their UN IGME counterparts. In addition, the cumulated U5MRs are larger than their UN IGME counterparts for nearly every time point and region, but almost all of the differences between the cumulated U5MRs and their countrparts from UN IGME are not statistically significant (note the overlapping confidence intervals). This is special to sub-national mortality rates in Senegal, as Figs. 16 and 17 for Malawi do not display any systematic differences between cumulated and UN IGME U5MRs. Furthermore, in most regions, trends for the U5MR calculated using cumulated data are not as smooth as their UN IGME counterparts. Whether the cumulated trends under-smooth or the UN IGME trends over-smooth the true trend remains an open question. However, we believe that Fig. 9, as well as Figs. 16 and 17, indicate at least some degree of over-smoothing for UN IGME sub-national rates.

Fig. 9
figure 9

Trends of sub-national U5MRs for a 5-year reference period and all regions in Senegal based on cumulated data compared to their UN IGME counterparts

Discussion

In this paper, we exploited the overlap in birth histories across different DHS/MICS surveys. We proposed a design-based approach to accumulate birth histories across surveys. This work aims at introducing a complement to many other model-based approaches for estimating childhood mortality indicators. We focus on three rates that are essential for monitoring progress in meeting the UN SDGs: the NNMR, IMR, and U5MR. Based on DHS survey data for Senegal and DHS and MICS survey data for Malawi, we demonstrate that national and sub-national trends based on mortality rates calculated with single survey data may suffer from fluctuations and might contradict each other when calculated from separate surveys. In addition, we argue that focusing on such rates instead of focusing on rates calculated using cumulated data adds avoidable imprecision to the model-based estimates of the UN IGME.

We showed that using the proposed framework for accumulating birth histories across different surveys decreases the confidence intervals around the estimated mortality rates. Estimated standard errors of cumulated rates are 50–60% less than their counterparts from separate surveys. In addition, accumulating birth histories smoothens fluctuations in time series for national and sub-national mortality rates and establishes more stable, reliable time trends. It is important to acknowledge that the reliability of the cumulated time trend varies between national and sub-national rates and with the reference period. Cumulated time trends for national rates based on a 5-year reference period and sub-national rates based on a 10-year reference period appear to be very reliable when compared with cumulated rates based on a 1-year reference period or a 5-year reference period for national or sub-national levels, respectively. This highlights the importance of treating the proposed framework as a complement and not as an alternative to existing model-based approaches, especially for NNMRs and sub-national mortality rates, which are generally more challenging to estimate. We believe that our framework can help improve existing estimates of childhood mortality rates, as it increases the precision of estimates that enter model-based approaches. Given the advantages, it is also important to keep in mind that our approach does not resolve data quality issues in survey data, such as non-sampling errors, e.g., recall bias that may affect the birth history data. Therefore, our method can be improved by checking the datasets for any potential non-sampling errors before cumulating data across different surveys. There is also room for improving the proposed framework by exploring other approaches for accumulating data and comparing different approaches to find an optimal value for the cumulating composite factor. Finally, improving the methods applied to estimate childhood mortality rates should only be a short-term goal while implementing well-functioning VR systems should be the ambitious long-term goal to gain more certainty about childhood mortality rates.