Main

Gains in child survival have long served as an important proxy measure for improvements in overall population health and development1,2. Global progress in reducing child deaths has been heralded as one of the greatest success stories of global health3. The annual global number of deaths of children under 5 years of age (under 5)4 has declined from 19.6 million in 1950 to 5.4 million in 2017. Nevertheless, these advances in child survival have been far from universally achieved, particularly in low- and middle-income countries (LMICs)4. Previous subnational child mortality assessments at the first (that is, states or provinces) or second (that is, districts or counties) administrative level indicate that extensive geographical inequalities persist5,6,7.

Progress in child survival also diverges across age groups4. Global reductions in mortality rates of children under 5—that is, the under-5 mortality rate (U5MR)—among post-neonatal age groups are greater than those for mortality of neonates (0–28 days)4,8. It is relatively unclear how these age patterns are shifting at a more local scale, posing challenges to ensuring child survival. To pursue the ambitious Sustainable Development Goal (SDG) of the United Nations9 to “end preventable deaths of newborns and children under 5” by 2030, it is vital for decision-makers at all levels to better understand where, and at what ages, child survival remains most tenuous.

Precision public health and child mortality

Country-level estimates facilitate international comparisons but mask important geographical heterogeneity. Previous assessments of mortality of children under 5 have noted significant within-country heterogeneity, particularly in sub-Saharan Africa5,7,10,11,12,13,14, as well as in Brazil15, Iran16 and China17. Understanding public health risks at more granular subpopulation levels is central to the emerging concept of precision public health18, which uses “the best available data to target more effectively and efficiently interventions…to those most in need”18. Efforts to produce high-resolution estimates of mortality of children under 5, determinants at scales that cover the multiple countries are emerging, including for vaccine coverage19,20, malaria21, diarrhoea22 and child growth failure23,24. In a previous study, we produced comprehensive estimates of African child mortality rates at a 5 × 5-km scale for 5-year intervals5. For areas outside of Africa, in which 72% of the world’s children live and 46% of global child deaths occurred in 20174, subnational heterogeneity remains mostly undescribed25.

Here we produce estimates of death counts and mortality rates of children under 5, infants (under 1 years of age) and neonates (0–28 days) in 99 countries at policy-relevant subnational scales (first and second administrative levels) for each year from 2000 to 2017. We fit a geostatistical discrete hazards model to a large dataset that is composed of 467 geo-referenced household surveys and censuses, representing approximately 15.9 million births and 1.1 million deaths of children from 2000 to 2017. Our model includes socioeconomic, environmental and health-related spatial covariates with known associations to child mortality and uses a Gaussian process random effect to exploit the correlation between data points near each other across dimensions of space, time and age group, which helps to mitigate the limitations associated with data sparsity in our estimations. For this study, we report U5MR as the expected number of deaths per 1,000 live births, reflecting the probability of dying before the age of 5 for a given location and year.

Unequal rates of child mortality

The risk of a newborn dying before their fifth birthday varies tremendously based on where in the world, and within their country, they are born. Across the 99 countries in this study, we estimate that U5MR varied as much as 24-fold at the national level in 2017, with the highest rate in the Central African Republic of 123.9 deaths (95% uncertainty interval, 104.9–148.2) per 1,000 live births, and the lowest rate in Cuba of 5.1 deaths (4.4–6.0)4. We observed large subnational variation within countries in which overall U5MR was either high or comparatively low. For example, in Vietnam, rates across second administrative units (henceforth referred to as ‘units’) varied 5.7-fold, from 6.9 (4.6–9.8) in the Tenth District in Hồ Chí Minh City to 39.7 (28.1–55.6) in Mường Tè District in the Northwest region (Figs. 1b, 2).

Fig. 1: U5MR estimates in 99 LMICs.
figure 1

a, U5MR at the second administrative level in 2000. b, U5MR at the second administrative level in 2017. c, Modelled posterior exceedance probability that a given second administrative unit had achieved the SDG 3.2 target of 25 deaths per 1,000 live births for children under 5 in 2017. d, Proportion of mortality of children under 5 occurring in the neonatal (0–28 days) group at the second administrative level in 2017.

Fig. 2: Geographical inequality in U5MR across 99 countries for 2000 and 2017.
figure 2

a, Absolute inequalities. Range of U5MR estimates in second administrative-level units across 99 LMICs. b, Relative inequalities. Range of ratios of U5MR estimates in second administrative-level units relative to country means. Each dot represents a second administrative-level unit. The lower bound of each bar represents the second administrative-level unit with the lowest U5MR in each country. The upper end of each bar represents the second administrative-level unit with the highest U5MR in each country. Thus, each bar represents the extent of geographical inequality in U5MRs estimated for each country. Bars indicating the range in 2017 are coloured according to their Global Burden of Disease super-region. Grey bars indicate the range in U5MR in 2000. The diamond in each bar represents the median U5MR estimated across second administrative-level units in each country and year. A coloured bar that is shorter than its grey counterpart indicates that geographical inequality has narrowed.

Decreases in U5MR between 2000 and 2017 were evident to some extent throughout all units (Figs. 1a, b, 2). No unit showed a significant increase in U5MR in this period, and in most units U5MR decreased greatly, even in units in which the mortality risk was the highest. Out of 17,554 units, 60.3% (10,585 units) showed a significant (defined as 95% uncertainty intervals that did not overlap) decrease in U5MR between 2000 and 2017. Across units in 2000, U5MR ranged from 7.5 (5.0–10.6) in Santa Clara district, Villa Clara province, Cuba, to 308.4 (274.9–348.4) in the Sabon Birni Local Government Area of Sokoto State, Nigeria. By 2017, the unit with the highest estimated U5MR across all 99 countries was Garki Local Government Area, Jigawa state, Nigeria, at 195.1 (158.6–230.9). Overall, the total percentage of units with a U5MR higher than 80 deaths per 1,000 live births decreased from 28.9% (5,070) of units in 2000 to 7.0% (1,236) in 2017. Furthermore, 32% of units, representing 11.9% of the under-5 population in the 99 countries, had already met SDG 3.2 for U5MR with a 90% certainty threshold (Fig. 1c). For neonatal mortality, 34% of units met the target of ≤12 deaths per 1,000 live births (Extended Data Fig. 1). Within countries, successes were mixed in some cases. For example, Colombia, Guatemala, Libya, Panama, Peru and Vietnam had all achieved SDG 3.2 for U5MR at the national level by 2017, but each country had units that did not achieve the goal with 90% certainty (Fig. 1c).

Successful reductions in child mortality were also observed throughout entire countries. For example, in 43 LMICs across several world regions, the worst-performing unit in 2017 had a U5MR that was lower than the best-performing unit in 2000 (Fig. 2). Nearly half of these countries were in sub-Saharan Africa. Rwanda showed notable progress during the study period, reducing mortality from 144.0 (130.0–161.6) in its best-achieving district in 2000 (Rubavu) to 57.2 (47.4–72.1) in its worst-achieving district in 2017 (Kayonza). These broad reductions in U5MR have also led to a convergence of absolute subnational geographical inequalities, although relative subnational inequalities appear to be mostly unchanged between 2000 and 2017 (Fig. 2 and Supplementary Fig. 6.12). Despite this success, the highest U5MRs in 2017 were still largely concentrated in areas in which rates were highest in 2000 (Fig. 1a, b). We observed estimated U5MR ≥ 80 across large geographical areas in Western and Central sub-Saharan Africa, and within Afghanistan, Cambodia, Haiti, Laos and Myanmar (Fig. 1b).

Deaths of neonates (0–28 days of age) and post-neonates (28–364 days of age) have come to encompass a larger fraction of overall mortality of children under 5 in recent years. By 2017 (Fig. 1d), neonatal mortality increased as a proportion of total deaths of children under 5 in 91% (90) of countries and for 83% (14,656) of units compared to 2000. In almost all places where U5MR decreased, the share of the mortality burden increased in the groups of children with younger ages. Similarly, the mortality of infants (<1 year) has increased relative to the mortality for children who are 1–4 years of age in many areas. For example, in the Diourbel Region, Senegal, infant mortality constituted 54.4% (52.4–56.6) of total mortality of children under 5 in 2000; by 2017, the relative contribution of infant mortality was 73.2% (70.3–75.8). This shift towards mortality predominantly affecting neonates and infants was not as evident in all locations; mortality for children aged 1–4 years was responsible for more than 30% of overall under-5 deaths in 13% (2,226) of units, mostly within high-mortality areas in sub-Saharan Africa.

Distribution of under-5 deaths may not follow rates

The goal of mortality-reduction efforts is ultimately to prevent premature deaths, and not just to reduce mortality rates. Across the countries studied here, there were 3.5 million (41%) fewer deaths of children under 5 in 2017 than in 2000 (5.0 million compared to 8.5 million). At the national level, the largest number of child deaths in 2017 occurred in India (1.04 (0.98–1.10) million), Nigeria (0.79 (0.65–0.96) million), Pakistan (0.34 (0.27–0.41) million) and the Democratic Republic of the Congo (0.25 (0.21–0.31) million) (Fig. 3a). Within these countries, the geographical concentration of the deaths of the children varied. In Pakistan, over 50% of child deaths in 2017 occurred in Punjab province, which had a U5MR of 63.3 (54.1–76.0) deaths per 1,000 live births (Fig. 3b). By contrast, 50% of child deaths in the Democratic Republic of the Congo in 2017 occurred across 9 out of 26 provinces. Such findings are in a large part artefacts of how borders are drawn around various at-risk populations (the provinces above account for 53% and 63%, respectively, of the under-5 population that is at risk in these two countries), but can have a real impact at the level at which planning occurs. Some concentrated areas with apparent high absolute numbers of deaths highlighted by local-level estimates become less noticeable when reporting at aggregated administrative levels; for example, areas across Guatemala, Honduras and El Salvador are visually striking hotspots in Fig. 3d, but less so in Fig. 3b, c.

Fig. 3: Estimated number of children under 5 who died within 99 countries in 2017.
figure 3

a, Number of deaths of children under 5 in each country. b, Number of deaths in each first administrative-level unit. c, Number of deaths in each second administrative-level unit. d, Number of deaths of children under 5 in each 5 × 5-km grid cell. Note that scales vary for each aggregation unit.

Our estimates indicate that targeting areas with a ‘high’ U5MR of 80 will have a lower overall effect than in previous years owing to the reductions in mortality rates. In 2000, 23.7% of child deaths—representing 2.0 (1.7–2.4) million deaths—occurred in regions in which U5MR was less than 80 that year (Fig. 4). By comparison, in 2017, 69.5% of child deaths occurred in areas in which U5MR was below 80. A growing proportion of deaths of children under 5 are occurring in ‘low’-mortality areas; 7.3% (5.1–10.2) of all deaths of children under 5 in 2017 occurred in locations in which the U5MR was below the SDG 3.2 target rate of 25, compared to 1.2% (0.9–1.6) in 2000. For instance, Lima, Peru, has a U5MR in the 8th percentile of units in this study, yet it ranks in the 96th percentile of highest number of deaths of children under 5.

Fig. 4: Number of deaths of children under 5, distributed across level of U5MR, in 2000 and in 2017, across 99 countries.
figure 4

Bar heights represent the total number of deaths of children under 5 within all second administrative-level units with corresponding U5MR. Bins are a width of 5 deaths per 1,000 live births. The colour of each bar represents the global region as defined by the subset legend map. As such, the sum of heights of all bars represents the total number of deaths across the 99 countries. a, Deaths of children under 5 in 2000. b, Deaths of children under 5 in 2017. The dotted line in the 2000 plot is the shape of the distribution in 2017, and the dotted line in the 2017 plot represents the distribution in 2000.

Despite population growth, child deaths have declined due to the outpaced decline in U5MR. For example, there were a total of 8.5 (7.2–10.0) million deaths of children under 5 in the countries in this study in 2000; had the 2017 under-5 population been exposed to the same U5MRs that were observed in 2000, there would have been 10.6 (9.0–12.5) million deaths in 2017. Instead, we observed 5.0 (3.8–6.6) million deaths in 2017 (Extended Data Fig. 5).

Finally, we combine estimates of subnational variation in mortality rates and populations to gain a better understanding of the impact of geographical inequality. Overall, 2.7 (2.5–2.9) million deaths, or 54% of the total number of deaths of children under 5, would have been averted in 2017 had all units had a U5MR that matched the best-performing unit in each respective country (Extended Data Fig. 2). Over the 2000–2017 period, this number is 71.8 (68.5–74.9) million deaths, or 58% (55–61) of the total number of deaths of children under 5. Total deaths attributable to inequality in this scenario ranged from 13 (6–24) deaths in Belize to 0.84 (0.72–0.99) million deaths in India. Furthermore, had all units met the SDG 3.2 target of 25 deaths per 1,000, an estimated 2.6 (2.3–2.8) million deaths of children under 5 would have been averted in 2017.

Discussion

This study offers a comprehensive, geospatially resolved resource for national and subnational estimates of child deaths and mortality rates for 99 LMICs, where 93% of the world’s child deaths4 occurred in 2017. Gains in child survival varied substantially within the vast majority of countries from 2000 to 2017. Countries such as Vietnam, for example, showed more than fivefold variation in mortality rates across second administrative-level units. The inconsistency of successes, even at subnational levels, indicates how differences in health policy, financial resources, access to and use of health services, infrastructure, and economic development ultimately contribute to millions of lives cut short25,26,27. By providing detailed maps that show precisely where these deaths are estimated to have occurred, we provide an important evidence base for looking both to the past, for examples of success, and towards the future, in order to identify where precision public-health initiatives could save the most lives.

The epidemiological toll of child mortality should be considered both in terms of total deaths and as rates of mortality. Focusing only on mortality rates can effectively mask areas in which rates are comparatively low but child deaths are high owing to large population sizes. The number of deaths that occur in high-risk areas has declined, and most under-5 deaths in recent years have occurred in lower-risk areas. This ‘prevention paradox’28 could indicate that whole-population interventions could have a larger overall impact than targeting high-risk areas29. At the same time, strategies that target resources to those locations that have the highest number of child deaths risk leaving behind some of the world’s most marginalized communities: remote, more-sparsely populated places in which, relative to the number of children born each year, a large number of children die before their fifth birthday. Instead, by considering subnational measures of both counts and rates of deaths of children under 5, decision-makers can better tailor child health programs to align with local contexts, norms and needs. Rural communities with high rates but low counts may benefit from ‘last-mile’ initiatives to provide effective health services to populations who lack adequate access to care. By contrast, locations with low rates but high counts may require programs that focus on alleviating the cost of care, unsafe environmental exposures or health risks that are uniquely associated with urban slums30. The SDGs have pointed the global development agenda towards progress in child survival. Our analysis indicates that reaching the SDG 3.2 targets of 25 child deaths per 1,000 live births and 12 neonatal deaths per 1,000 live births will require only modest improvements or have already been achieved by some units; however, these targets are ambitious for other units in which child mortality remains high. It is worth noting that many countries contain areas that fit both of these profiles. For example, 11 countries had at least 1 unit that had already met SDG 3.2 with high certainty, and at least 1 unit that had not. Subnational estimates can empower countries to benchmark gains in child survival against their own subnational exemplars as well as advances that have been achieved by their peers. Through our counterfactual analysis we showed that even if all units had met the SDG 3.2 goal in 2017, there would still have been 2.4 million deaths of children under 5, indicating that ‘ending preventable child deaths’ is more complex than simply meeting a target threshold. Future research efforts must address the causes of child mortality in local areas and more precisely identify causes of child deaths that are amenable to intervention. To that end, new and innovative data-collection efforts, such as the ongoing Child Health and Mortality Prevention Surveillance network, offer promising prospects by applying high-validity, pathology-based methods alongside verbal autopsies to determine the cause of death31.

This study offers a unique platform to support the identification of local success stories that could be replicated elsewhere. In Rwanda, for example, the highest U5MR at the district level in 2017 was 60.2% (52.0–67.8%) lower than the lowest U5MR at the district level in 2000. Such gains have been partially credited to focused investments in the country’s poorest populations, expanding the Mutuelles de santé insurance program, and developing a strong workforce of community health workers who provide evidence-based treatment and health promotion32,33. Nepal and Cambodia are among the exemplars for considerably decreasing subnational inequalities in child survival since 2000. In an era when narrowing disparities within countries is as important as reducing national-level gaps, these results provide the evidence base to inform best practices and stimulate national conversations about related social determinants.

Neonatal mortality rates have also declined but failed to keep pace with reductions in mortality rates of older children, leading to a higher proportion of deaths of children under 5 occurring within the first four weeks of life: from 37.4% (37.1–37.7) in 2000 to 43.7% (43.1–44.3%) in 2017. This trend is probably related to the increase in scale of routine programs and improved infrastructure (for example, vaccination34, and water and sanitation35) and the introduction of effective interventions to target communicable diseases (for example, malaria control36 and prevention of mother-to-child transmission of HIV37). These interventions have tended to target amenable causes of mortality that are more common in older children under 5 rather than dominant causes of neonatal mortality, such as prematurity and congenital anomalies38. Notably, irrespective of income level or location, some causes of neonatal death (for example, chromosomal anomalies and severe preterm birth complications) remain difficult to prevent completely with current medical technologies. Ultimately, large gains in neonatal mortality will require serious investment in health system strengthening39. Affordable approaches to preventing the majority of neonatal deaths in LMICs exist and there are success stories with lessons learned to apply40,41,42,43,44, but decisions about which approaches to take must be based on the local epidemiological and health system context. In the absence of spatially detailed cause of death data, subnational neonatal mortality estimates can indicate dominant causes and thus serve as a useful proxy to guide prioritization of interventions45.

The accuracy and precision of our estimates were primarily determined by the timeliness, quantity and quality of available data. In Sri Lanka, for example, there were no available surveys, and the wide uncertainty intervals surrounding estimates reflect the dearth of available evidence in that country (Extended Data Figs. 3, 4). In certain areas, this decreased the confidence that we had in claiming that a specific subnational area met the SDG 3.2 target (Fig. 1c). This issue is most concerning in cases in which estimated mortality rates are high, thus helping to identify locations in which it would be most useful to focus future data-collection efforts. High mortality rates with large uncertainty intervals were estimated across much of Eastern and Central sub-Saharan Africa, and in Cambodia, Laos, Myanmar and Papua New Guinea (Extended Data Figs. 3, 4). Furthermore, ongoing conflict in countries such as Syria, Yemen and Iraq pose substantial challenges to collecting more contemporaneous data, and our estimates may not fully capture the effects of prolonged civil unrest or war46,47. Further methodological and data limitations are discussed in the Methods.

The accurate estimation of mortality is also a matter of equity; highly refined health surveillance is common in high-income countries, whereas in LMICs, in which rates of child mortality are the highest, surveillance that helps to guide investments in health towards the areas with the greatest need is less routine48. Ideally, all countries would have high-quality, continuous, and complete civil and vital registration systems that capture all of the births, deaths and causes of death at the appropriate geographical resolution49. In the meantime, analyses such as this serve to bridge the information gap that exists between low-mortality countries with strong information systems and countries that face a dual challenge of weaker information systems and higher disease burden.

By harnessing the unprecedented availability of geo-referenced data and developing robust statistical methods, we provide a high-resolution atlas of child death counts and rates since 2000, covering countries that account for 93% of child deaths. We bring attention to subnational geographical inequalities in the distribution, rates and absolute counts of child deaths by age. These high-resolution estimates can help decision-makers to structure policy and program implementation and facilitate pathways to end preventable child deaths50 by 2030.

Methods

Overview

We fitted a discrete hazards geostatistical model51,52 with correlated space–time–age errors and made predictions to generate joint estimates—with uncertainty—of the probability of death (the number of deaths per live births) and the number of deaths for children aged 0–28 days (neonates), children under 1 year old (infants) and children under 5 years old at the subnational level for 99 LMICs for each year from 2000 to 2017. The analytical process is summarized in the flowchart in Extended Data Fig. 6. We made estimates at a grid-cell resolution of approximately 5 × 5-km and then produced spatially aggregated estimates at the first (that is, states or provinces) and second (that is, districts or counties) administrative levels, as well as the country level.

Countries were selected for inclusion in this study based on their socio-demographic index (SDI) published in the Global Burden of Disease study (GBD)53. The SDI is a measure of development based on income per capita, educational attainment and fertility rates among women under 25 years old. We primarily aimed to include all countries in the middle, low–middle or low SDI quintiles, with several exceptions. Brazil and Mexico were excluded despite middle SDI status owing to the availability of high-quality vital registration data in these countries, which have served as the basis for existing subnational estimates of child mortality. Because this study did not incorporate vital registration data sources (see ‘Limitations’), Brazil and Mexico were not estimated directly; instead, state-level estimates from the GBD 2017 study were directly substituted in figures where appropriate4. Albania and Moldova were excluded despite middle SDI status owing to geographical discontinuity with other included countries and lack of available survey data. North Korea was excluded despite low–middle SDI status owing to geographical discontinuity and insufficient data. As countries with high–middle SDI status in 2017, China and Malaysia were excluded from this analysis. Libya was included despite high–middle SDI status to create better geographical continuity. Island nations with populations under 1 million were excluded because they typically lacked sufficient survey data or geographical continuity for a geospatial analytic approach to be advantageous over a national approach. Supplementary Figure 3.1 shows a map of the countries included in this study and Supplementary Table 3.1 lists the countries.

Data

We extracted individual records from 555 household sample survey and census sources. Records came in the form of either summary birth histories (SBHs) or complete birth histories (CBHs). All input data were subject to quality checks, which resulted in the exclusion of 82 surveys and censuses owing to quality concerns (see Supplementary Information section 3.2 for more details). Data on life and mortality experiences from CBH sources can be tabulated directly into discrete period and age bins, thus allowing for period-specific mortality estimations, known as the synthetic cohort method54,55,56. For SBH data, we used indirect estimation57 to estimate age-specific mortality probabilities and sample sizes and assign them to specific time periods. Complete details are available in Supplementary Information section 3.3.2.

In all cases, after pre-processing, each data point provided a number of deaths and a sample size for an age bin in a specific year and location. We referenced all data points to GPS coordinates (latitude and longitude) wherever possible. In cases in which GPS data were unavailable, we matched data points to the smallest possible areal unit (also referred to as ‘polygons’). All polygon data were spatially resampled into multiple GPS coordinates and weighted based on the population distribution following a previously described procedure5,22,23,58 and described in Supplementary Information section 3.4. Our combined global dataset contained approximately 15.9 million births and 1.1 million child deaths. A complete list of data sources is provided in Supplementary Table 8.1.

In addition to data on child mortality, we used a number of spatial data sources for this analysis. These included a suite of geospatial covariates, population estimates and administrative boundaries68. These sources and processing procedures are described in Supplementary Information section 4.

Spatial covariates

We extracted values from each of 10 geospatial covariates at each data point location. Geospatial covariates are spatial data represented at the 5 × 5-km grid-cell resolution. The covariates were travel time to the nearest city, educational attainment of maternal-aged women, the ratio of population of children under 5 to women of reproductive age (ages 15–49 years old), the mass per cubic meter of air of particles with a diameter less than 2.5 μm, total population, a binary indicator of urbanicity, intensity of lights at night, the proportion of children aged 12–23 months who had received the third dose of diphtheria–pertussis–tetanus vaccine, incidence rate of Plasmodium falciparum-associated malaria in children under 5 and prevalence of stunting in children under 5 (see Supplementary Information). All covariate values were centred on their means and scaled by their standard deviations. Covariates typically had global spatial coverage and values that vary by year. More details of the spatial covariates can be found in Supplementary Information section 4.

Analysis

Geostatistical model

To synthesize information across various sources, and to make consistent estimates across space and time, we fitted discrete hazards51,52 geostatistical models59 to our data. The models were discrete in the sense that ages were represented in seven mutually exclusive bins (0, 1–5, 6–11, 12–23, 24–35, 36–47 and 48–59 months), each with its own assumed constant mortality probability. The model explicitly accounted for variation across age bin, year and space through inclusion of both fixed and random effects. Indicator variables for each age bin were included to form a discrete baseline mortality hazard function, representing the risk of mortality in discrete bins from birth to 59 months of age with covariates set at their means. Baseline hazard functions were allowed to vary in space and time in response to changing covariate values, as well as in response to linear effect on year. To model this relationship, we estimated the effect of each covariate value on the risk of mortality. These estimated effects were then applied to the gridded surface of covariate values to make predictions across the entire study geography. We also included a Gaussian random effect across countries to account for larger-scale variations due to political or institutional effects, as well as a Gaussian random effect for each data source to account for source-specific biases. Finally, we included a Gaussian process random effect with a covariance matrix structured to account for remaining correlation across age, time and physical space. As such, estimates at a specific age, time or place benefitted from drawing predictive strength from data points nearby in all of these dimensions.

For each modelling region, we fitted one such discrete hazards model with a binomial data likelihood. All data were prepared such that we counted or estimated the number of children entering into (n) and dying within (Y) each period–age bin from each GPS-point location (s) in each survey (k) within each country (c).

The number of deaths for children in age band (a) in year (t) at location (s) was assumed to follow a binomial distribution:

$${Y}_{a,s,t} \sim {\rm{binomial}}\left({n}_{a,s,t},\;{P}_{a,s,t}\right)$$

where Pa,s,t is the probability of death in age bin a, conditional on survival to that age bin for a particular space–time location. Using a generalized linear regression modelling framework, a logit link function is used to relate P to a linear combination of effects:

$${\rm{logit}}\left({P}_{a,s,t}\right)={\beta }^{0}+\mathop{\sum }\limits_{a=2}^{7}{I}_{a}{\beta }_{a}^{1}+{\beta }^{2}{X}_{s,t}+{\beta }^{3}t+{\nu }_{c\left[s\right]}+{\nu }_{k\left[s\right]}+{Z}_{a,s,t}$$

The first term, β0, is an intercept, representing the mean for the first age band when all covariates are equal to zero, whereas \({\beta }_{a}^{1}\) are fixed effects for each age band, representing the mean overall hazard deviation for each age band from the intercept, when all other covariates are equal to zero. β2 are the effects of geospatial covariates (Xs,t), which we describe in detail in Supplementary Information section 4. β3 is an overall linear temporal effect to account for overall temporal trends within the region. All geospatial covariates were centred and scaled by subtracting their mean and dividing by their standard deviations. Each v term represents uncorrelated Gaussian random effects: \({\nu }_{c\left[s\right]} \sim {\rm{normal}}\left(0,{\sigma }_{c}^{2}\right)\) is a country-level random effect applied to all locations (s) within a country (c); \({\nu }_{k\left[s\right]} \sim {\rm{normal}}\left(0,{\sigma }_{k}^{2}\right)\) is a data source-level random effect for the survey (k) from which the data at location s were observed. Data source-level random effects were used to account for systematic variation or biases across data sources and were included in model fitting but not in prediction from fitted models. The term Za,s,t ~ Gaussian process(0, K) is a correlated random effect across age, space and time, and is modelled as a four-dimensional mean zero Gaussian process with covariance matrix K. This term accounts for structured residual correlation across these spatial–age–temporal dimensions that are not accounted for by any of the model’s other fixed or random effects. This structure was chosen, because the hazard probability for each age group is expected to vary in space and time, and such spatiotemporal correlations are likely to be similar across ages. K is constructed as a separable process across age, space and time \(\left(K={\Sigma }_{a}\otimes {\Sigma }_{t}\otimes {\Sigma }_{s}\right)\). The continuous spatial component is modelled with a stationary isotropic Matérn covariance function, and the age and temporal effects were each assumed to be discrete auto-regressive order 1. We provide further details on model fitting and specification in Supplementary Information section 5.1.

We assigned priors to all model parameters and performed maximum a posteriori inference using Template Model Builder60 software in R version 3.4. We fitted the model separately for each of 11 world regions (see Supplementary Fig. 3.1), owing to memory constraints and to allow model parameters to vary across epidemiologically distinct regions.

Post-estimation

Using the joint precision matrix and point estimates, we generated 1,000 draws from all model parameters using a multivariate-normal approximation. These model parameter draws were used to predict corresponding draws of mortality probabilities across all age groups for each grid cell in each year. In other words, for each age bin in each year we estimated 1,000 gridded surfaces of mortality probability estimates, each surface corresponding to one draw from the posterior parameter estimates61. All subsequent post-estimation procedures were carried out across draws to propagate model uncertainty. We used these estimated spatiotemporal gridded surfaces of age-specific mortality probabilities to produce various final resulting data products.

From the fitted model parameters, we produced posterior mortality probability estimates for each age group for each 5 × 5-km grid cell for each year from 2000 to 2017. We combined gridded age group estimates to obtain infant (under 1) and child (under 5) mortality estimates at each gridded location. Using a conversion from mortality probability to mortality rates, and using a gridded surface of population, we also estimated the number of deaths that occurred in each age group at each location in each year. For both mortality probabilities and counts, we multiplied out corresponding gridded estimates by a constant to ensure that at the national—and in two countries, the first administrative-level unit—aggregated estimates for each age group and year were calibrated such that they equalled estimates in the GBD study4. This calibration allowed us to take advantage of national data sources, such as vital registration, that could not be used in this study. We also aggregated grid-cell-level estimates to first and second administrative-level units using gridded population surface to weight estimates. These steps are described in Supplementary Information section 5.2.

Model validation

We used fivefold cross-validation to assess and compare model performance with respect to estimating local trends of age-specific mortality. Each fold was created by combining complete surveys into subsets of approximately 20% of data sources from the input data. Holding out entire surveys at a time served as a comparable approximation to the type of missingness in our data, essentially helping us check how well our model estimates of mortality probabilities compared to empirical estimates of mortality probability from an unobserved data source that did not inform the model.

For each posterior draw, we aggregated to administrative units. Using data aggregated to the administrative unit and aggregated estimate pairs, we calculated the difference between out-of-sample empirical data estimates and modelled estimates, and we report the following summary metrics: mean error, which serves as a measure of bias, the square root of mean errors, which serves as a measure of the total variation in the errors, the correlation and 95% coverage. At the second administrative-level unit for under-5 mortality, our out-of-sample 95% coverage was 93%, correlation was 0.78, mean absolute error was 0.015 and mean error was −0.0011. These results indicate a good overall fit, with minimal bias. This procedure and the full validation results are discussed in Supplementary Information section 5.3.

Limitations

This work should be assessed in full acknowledgement of several data and methodological limitations. We exclusively used CBH and SBH data from household survey and census data sources. Ideally, estimates of child mortality should incorporate all available data, including data from administrative vital registration systems. Vital registration systems are commonly present in many middle-income and all high-income countries. There are known data-quality issues with vital registration sources in many middle-income countries48,62 that add complications to their inclusion in our modelling procedure. For example, systems may not capture all deaths, and this level of ‘underreporting’ probably varies in space, time and age. In addition, underreporting is probably negatively correlated with mortality, and could contribute substantial bias to estimates. Statistical methods must be developed to jointly estimate—and adjust for—underreporting in vital registration data before such data can be used in geospatial models of child mortality. Promising work has begun in this domain in specific countries63, but further advancement will be necessary to improve estimates across a time series and across many countries at once.

We assume that SBH and CBH data were retrospectively representative in the locations in which they were collected. As such, we assume that survey respondents did not migrate. High-spatial-resolution migration estimates with which to adjust estimates do not yet exist, and many of the data sources that we use do not collect information on migration. We conducted a focused sensitivity analysis (Supplementary Information section 5.4.4) for migration in six countries, and found that although our results were generally robust, there was variation by country. Furthermore, despite providing high-quality retrospective data from representative samples of households, birth history data can suffer from certain non-sampling issues, such as survival/selection biases64 and misplacement of births65. We did not attempt to make corrections to data, and they were used as-is. Furthermore, retrospective birth history data will—by design—have a changing composition of maternal ages depending on the time since the survey. This was minimized by limiting retrospective trends to up to 17 years.

Although we collated a large geo-referenced database of survey data on child mortality, these data represented about 1% (1.1 million) of total deaths of children under 5 in study areas over the period. Where data do not exist or are not available in certain locations, mean estimates are informed from smoothing to nearby estimates and covariates. As such, there could be additional small-scale heterogeneity that is not picked up by our model. Wider uncertainty intervals in areas with no data account for these potential unknowns, and our 95% coverage estimates in out-of-sample predictive tests appear to be well-calibrated at the second administrative unit level. Furthermore, discrete localized mortality shock events could be missing in our analysis due to the lack of data and selection biases in surveys and censuses, and spatiotemporal smoothing. Fatal discontinuities are explicitly accounted for at the national or province level by calibration to GBD estimates. In all, 0.35% (0.4 million) of the 123 million deaths over this period were attributed to fatal discontinuities.

On the modelling side, we integrated point and areal data into a continuous model by constructing pseudo-points from areal data. Modelling approaches that integrate point and areal data as part of a joint model likelihood function are in development66 but are currently computationally infeasible at the large geographical scales at which we currently model. Furthermore, we divided our models into 11 regional fits (see Supplementary Fig. 3.1), as a full model that encompasses all 99 countries would be computationally infeasible due to memory constraints. Splitting up modelling in this way had the benefit of enabling parameters to vary across epidemiologically distinct world regions. A preferred model, however, would be fitted to all data simultaneously with parameters that are spatially variable.

The separable model used for age–space–time correlations is a common parsimonious assumption afforded in applying spatiotemporal geostatistical models due to efficient computation and inference; however, it yields the assumption of fully symmetric covariance. The symmetry implicit in the separable model dictates, for example, that (holding age constant for simplicity) the covariance between the observations at (location 1, time 1) and (location 2, time 2) is the same as the covariance between (location 1, time 2) and (location 2, time 1). Given our available data density in space–age–time, we believe that attempting to parameterize a more complex non-separable model would be challenging both computationally and inferentially, and it is not clear whether there would be much to benefit from the extra complications.

There are several limitations to address with respect to the use of covariates in the model. Most of the geospatial covariates that we used in the geostatistical model were themselves estimates produced from various geospatial models. Some of those estimated surfaces used covariates that were also included in our model in their estimation process. As such, we emphasize that our model is meant to be predictive, and that drawing inference from fitted coefficients across these highly correlated covariates is problematic and not recommended. Furthermore, we assumed no measurement error in the covariate values and assumed that the functional form between mortality and all covariates was linear in logit space. In certain locations, we used covariate values for prediction that were outside the observed range of the training data. As we explore in Supplementary Information section 5.4.2, however, these areas represent a relatively small proportion of the population.

Finally, we used a method for indirect estimation of SBHs that was recently developed and validated57. As such, indirect estimation was carried out as a pre-processing step before fitting the geostatistical model. We attempted to propagate various forms of uncertainty that could be introduced in this step, which resulted in halving the total effective sample size across all SBH data. In future, we aim to fully integrate such processing into the statistical model; such methods are in development67, but are not yet computationally feasible at scale.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.