INTRODUCTION

Forty percent of adults across the world are overweight or obese.1 Obesity-related conditions, including heart disease, type-2 diabetes, and many cancers, contribute to substantial morbidity and mortality.2 The standard approach to obesity management in primary care has been mainly individual-focused behavioral lifestyle programs, such as referrals to nutrition or commercial programs like Weight Watchers.3 Our lack of progress in slowing the rise of obesity is due in part to obesity having multiple complex causes, including the obesogenic environment and the food system.4 Population-level approaches to obesity, which can be evaluated in “natural experiment” studies, are necessary for addressing these complex causes.

Natural experiments are events, interventions, or policies that are not controlled by researchers, but can be studied to determine their impact.5 Examples include assessing the effects of needle exchange programs on HIV prevention6 and the impact of indoor smoking bans on asthma exacerbations.7 Because randomized, controlled trials are not always practical or feasible, there is growing interest in using natural experiment studies to understand the effect of population-level policies and programs on chronic diseases like obesity. Examples of obesity policies include the New York City law requiring chain restaurants to post calorie information on menus,8 the Danish fat tax (2011–2013) which taxed foods exceeding a certain saturated fat content,9, 10 and the ban in schools of sugar-sweetened beverages in several states.11 Examples of built-environment changes amenable to natural experiment studies include the opening of new supermarkets in areas with lower access to healthy foods12,13,14 and installation of new light rail lines to improve public transport infrastructure.15,16,17

The purpose of this systematic review is to identify programs, policies, and built-environment changes targeting adult obesity that were evaluated in natural experiment studies and to assess their effectiveness in reducing obesity and related outcomes.

METHODS

This analysis was part of a larger systematic review entitled, “Methods for Evaluating Natural Experiments in Obesity: Systematic Evidence Review”18, 19 and expands upon one of the key questions addressed in the report, regarding the 17 natural experiment studies which studied adult obesity outcomes. For a description of detailed methods, see the full report.18

Data Sources and Searches

We searched PubMed, CINAHL, PsycINFO, and EconLit (1 January 2000 to 28 March 2018) to identify all English-language US and non-US studies of programs, policies, and built-environment changes targeting obesity prevention and control in people of all ages and in any setting (Appendix). We also conducted a gray literature search of websites of key organizations involved in obesity prevention and evaluations of policies to identify white papers and unpublished evaluations. We also searched the reference lists of included articles for missed or unpublished reports.18

Study Selection

Two reviewers independently screened abstracts and articles to identify natural experiment studies evaluating a program or policy aimed at combating adult obesity that reported body weight/BMI outcomes.18 Studies were defined as natural experiments based on the UK’s Medical Research Council guidelines, in which exposure to the event or intervention of interest has not been manipulated by a researcher.5 Disagreements between reviewers were discussed and, if they could not be resolved, were adjudicated by a third-party reviewer.

Data Extraction and Quality Assessment

Two reviewers serially extracted data on study, population and intervention characteristics, magnitude of effect size and p value for weight (BMI or weight change in adults), and physical activity and dietary outcomes, including intake of total daily calories, fruits, and vegetables, sugar-sweetened beverages, fiber, and fast food.

Two reviewers independently assessed the risk of bias for each study using the Effective Public Health Practice Project (EPHPP) tool to rate studies on 6 domains.18 Each study received a global risk of bias rating: low risk of bias if no domains were rated as weak, moderate risk of bias if one domain was rated as weak, or high risk of bias if two or more domains were rated as weak.
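The global rating rule above can be written as a short function. The following is a minimal illustrative sketch, not code used in the review: the domain key names and the example ratings are assumptions made for illustration, and only the weak-domain counting rule comes from the text.

```python
# Illustrative sketch only; not code from the review.
# Domain key names are assumed labels for the 6 EPHPP domains.
EPHPP_DOMAINS = (
    "selection_bias",
    "study_design",
    "confounders",
    "blinding",
    "data_collection",
    "withdrawals_dropouts",
)


def global_risk_of_bias(domain_ratings):
    """Return 'low', 'moderate', or 'high' from per-domain ratings.

    Rule from the text: no weak domains -> low risk of bias,
    one weak domain -> moderate, two or more weak domains -> high.
    """
    missing = [d for d in EPHPP_DOMAINS if d not in domain_ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")

    weak_count = sum(1 for d in EPHPP_DOMAINS if domain_ratings[d] == "weak")
    if weak_count == 0:
        return "low"
    if weak_count == 1:
        return "moderate"
    return "high"


# Hypothetical study with two weak domains (ratings invented for illustration).
example_study = {
    "selection_bias": "moderate",
    "study_design": "moderate",
    "confounders": "strong",
    "blinding": "weak",
    "data_collection": "strong",
    "withdrawals_dropouts": "weak",
}
print(global_risk_of_bias(example_study))  # high
```

Requiring all 6 domains before rating is a design choice of this sketch; it prevents an unrated domain from being silently counted as not weak.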

Data Synthesis and Analysis

For each study, we categorized the target of the program, policy, or built-environment change based on a classification in the 2012 National Academy of Medicine report, “Accelerating Progress in Obesity Prevention: Solving the Weight of the Nation”20: (1) physical activity and physical and built environments (e.g., building a new light rail line), (2) food and beverage environment (e.g., opening of supermarkets), (3) messaging environment (e.g., calorie counts on menus). Studies with multiple primary intervention targets were categorized as (4) “multiple targets.” We described the overall effect of these programs, policies, or built-environment changes on each outcome of interest (weight, diet, and physical activity behaviors), compared the magnitude of effect sizes for mean BMI change and between-group difference in BMI using a forest plot, and summarized dietary and physical activity behaviors using an evidence map. We were unable to perform a meta-analysis because of study heterogeneity and an insufficient number of articles with both the same outcome and intervention. We classified outcomes as favorable if a statistically significant difference was reported in the hypothesized direction, unfavorable if a statistically significant difference was reported in the opposite direction, and no difference if no statistically significant difference was reported.
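The three-way outcome classification described above can be sketched as follows. This is an illustration under stated assumptions, not the review's own procedure: the significance threshold of 0.05, the function and parameter names, and the example values are hypothetical, since the text does not specify them.

```python
# Illustrative sketch only; alpha = 0.05 is an assumed threshold.
def classify_outcome(effect, p_value, hypothesized_sign, alpha=0.05):
    """Classify a reported result as favorable, unfavorable, or no difference.

    effect            -- reported effect estimate (e.g., change in BMI)
    p_value           -- reported p value for that estimate
    hypothesized_sign -- -1 if a decrease is the hypothesized direction
                         (e.g., BMI), +1 if an increase is (e.g.,
                         fruit and vegetable intake)
    """
    if hypothesized_sign not in (-1, 1):
        raise ValueError("hypothesized_sign must be -1 or +1")

    # Not statistically significant (or a zero estimate): no difference.
    if p_value >= alpha or effect == 0:
        return "no difference"

    # Significant: compare the sign of the effect with the hypothesis.
    same_direction = (effect > 0) == (hypothesized_sign > 0)
    return "favorable" if same_direction else "unfavorable"


# Hypothetical values, invented for illustration.
print(classify_outcome(-0.3, 0.01, hypothesized_sign=-1))  # favorable
print(classify_outcome(0.4, 0.01, hypothesized_sign=-1))   # unfavorable
print(classify_outcome(-0.3, 0.20, hypothesized_sign=-1))  # no difference
```

Because the rule depends only on statistical significance and direction, a small but significant effect is labeled favorable regardless of its clinical importance.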

Strength of Evidence for Weight and BMI Outcomes

We used the Agency for Healthcare Research and Quality (AHRQ) guide’s strength of evidence grading schema21. Two reviewers independently graded the evidence based on the intervention target assessing studies’ limitations, consistency, directness, precision, and potential reporting bias for the evidence on weight/BMI outcomes: “High” strength of evidence indicated that the evidence likely reflected the true effect; “moderate” strength indicated that further research could change the result; “low” strength indicated low confidence that the evidence reflects the true effect; “insufficient” indicated no confidence in the estimate of effect.

Role of the Funding Source

The NIH Office of Disease Prevention funded the larger systematic review through an interagency agreement with AHRQ. A working group convened by the NIH assisted in developing the scope of the review and its key questions. Neither organization had a role in study selection, quality assessment, or synthesis. The investigators are solely responsible for the content. The original review is registered with PROSPERO (#CRD42017055750).

RESULTS

Of the 158 natural experiment studies in the larger systematic review,18, 19 17 reported weight/BMI outcomes in adults (Fig. 1, PRISMA diagram). Table 1 displays the individual study and population characteristics, main results for weight/BMI outcomes, and risk of bias. Study durations varied from 1 to 20 years. The mean age of the study population ranged from 38 to 80+ years. One study was an outlier for study duration and mean age because it followed children to adulthood for 20 years.24 The mean baseline BMI ranged from 17 to 30 kg/m2. Only 7 studies reported on race/ethnicity. Most natural experiment studies had a high risk of bias (n = 9) (Table 3), particularly in terms of handling withdrawals and dropouts, and study design.

Figure 1 Evidence search and selection.

Table 1 Study and Population Characteristics, Main Results for Weight/BMI Outcomes, and Overall Rating for Quality of Study for Included Natural Experiment Studies (n = 17), by Intervention Target

Nine studies targeted the physical activity and built environment, including the building of new light rail systems or extensions,15, 16 free bus pass eligibility,25, 26 and participation in the Housing Choice Voucher Program in New York City.27 Five studies targeted food and beverage environments, including the building of grocery stores12, 13, 28 and the Los Angeles fast food ban, which restricted opening/expanding stand-alone fast-food restaurants.29 One study targeted an obesity-related messaging environment, the calorie labeling law in New York requiring chain restaurants to post calorie counts on menus.30 Two studies targeted multiple environments (Table 1), one focusing on healthy eating and physical activity programs in the workplace23 and the other examining the effect of the English national strategy to reduce health inequalities with 3 comparison countries.22

Effectiveness of Policies, Programs, and Built-Environment Changes on Obesity, Weight, or BMI

Table 1 shows weight/BMI outcomes. Figure 2a displays mean pre-post BMI change within each group for 4 studies, and Fig. 2b displays the between-group difference in BMI in 4 other studies. All 17 studies reported on BMI. For the 9 studies focused on the physical activity/built environment, strength of evidence for the outcome of weight/BMI was low due to an overall high risk of bias (4 of 9 studies) and inconsistency in the direction of effect (Tables 2 and 3). Of these 9 studies, 4 showed weight/BMI reduction, 3 had inconsistent findings by subgroups (i.e., some subgroups showed favorable outcomes and others did not), and 2 showed no difference. Three of the 4 studies that showed weight/BMI reduction focused on transit use.15, 16, 25 For example, users of a new light rail transit system in Charlotte, NC, had a reduced BMI compared to non-users, though this study had a high risk of bias.16 The fourth study examined physical activity intervention clusters (e.g., community health education campaigns, individual health behavior change), finding a reduction in obesity in 3 of the 4 clusters.31 Three of the 9 studies showed inconsistent results by subgroup26, 32, 33 (Table 1). Two studies targeting the physical activity and built environment, one examining different categories of compulsory school physical activity in Australia and the other evaluating the Housing Choice Voucher Program in New York City, showed no changes in weight/BMI, although both studies had a high risk of bias.24, 27

Figure 2 Forest plot displaying mean pre/post BMI change within each group (a) and between groups (b).

Table 2 Summary of the Strength of Evidence for BMI/Weight Outcomes
Table 3 Risk of Bias for Each Study Assessed Using the EPHPP Tool

None of the 5 studies targeting the food and beverage environment showed a weight/BMI reduction (i.e., one study showed an increase in weight/BMI and 4 showed no difference) (Table 1, Fig. 2). Strength of evidence was low due to high risk of bias (3 of 5 studies) and inconsistency in the direction of effect (Tables 2 and 3). For example, Rigdon et al. showed that participation in the federal Supplemental Nutrition Assistance Program had no effect on BMI vs. non-participation after 1 year, but this study also had a high risk of bias.34

One study by Restrepo et al. focused on the messaging environment.30 Calorie labeling was associated with a BMI reduction in 11 counties in New York that implemented a law. However, strength of evidence was insufficient because only one study was included (Table 2).30

Among the 2 studies targeting multiple interventions, one study by Bolton et al. compared changes in BMI in Australia after implementing community and workplace programs promoting healthy eating and physical activity23 (Table 1). The study showed no difference in BMI between the control and intervention communities after 2 years and was rated as having a high risk of bias due to lack of information on blinding and withdrawals and dropout rates23 (Table 3). The second study by Hu et al. examined whether changes in trends in health inequalities in England after implementation of its broad national program (e.g., family support policies, tax-reduction) were more favorable vs. other countries without such a program, and found inconsistent direction of effects (England: no difference; Finland: unfavorable trend; Italy: favorable trend).22 The strength of evidence was rated insufficient due to a high risk of bias (1 of 2 studies), indirectness of evidence (i.e., self-reported height and weight data), and inconsistency in the direction of effect (Tables 2 and 3).

Effectiveness of Policies, Programs, and Built-Environment Changes on Dietary Behaviors

Figure 3 displays a summary of results for dietary outcomes, stratified by intervention target. Each outcome symbol represents an individual study and the main result for the outcome. Reported dietary behaviors include intake of fruit and vegetables (n = 6), sugar-sweetened beverages (n = 1), total daily calories (n = 2), and fast food (n = 1). Studies reporting on fruit and vegetable intake generally found a small (0.1–0.3 servings/day) increase, or no difference. One study by Dubowitz et al., rated as low risk of bias, investigated the impact of a new supermarket in Pittsburgh, PA, and showed no difference in fruit/vegetable intake between the supermarket and control neighborhoods (between-group difference, −0.14 servings/day, p value not reported), but demonstrated a decrease in caloric intake (between-group difference, −178 kcal/day, p < 0.005).12 The largest included study, by Restrepo et al., evaluated the calorie labeling law in New York and found no difference in fruit and vegetable intake between counties that implemented the law and those that did not.30 Only one study reported on sugar-sweetened beverage intake and found no difference.23

Figure 3 Summary of results for secondary outcomes of dietary and physical activity behaviors, by intervention target. Each outcome symbol represents an individual study and its main result (favors, does not favor, or no difference) for the particular outcome. Studies reporting inconsistent results are described in footnotes.*

Two studies reported on caloric intake.12, 32 One, rated as moderate risk of bias, showed an increase in caloric intake for women and men after implementation of a Chinese policy providing a subsidy for select home appliances in rural communities.32

Only one study reported on fast food intake and demonstrated no difference among residents living in public housing using a Voucher Program vs. unassisted housing units, but had a high risk of bias.27

Effectiveness of Policies, Programs, and Built Environment Changes on Physical Activity Behaviors

Eight of the 17 studies reported on physical activity behaviors. Figure 3 displays a summary of physical activity outcomes, stratified by intervention target. Of the 9 studies targeting the physical activity and built environment, 2 showed an increase in physical activity,15, 26 2 showed no difference,16, 24 and one large study found that physical leisure activity declined among women and did not change among men.32 One study, rated as high risk of bias, showed that after the building of a light rail extension in Salt Lake City, riders increased physical activity compared with non-riders.15 Another study, rated as moderate risk of bias, found that bus-pass holders in England engaged in greater moderate or vigorous physical activity compared with non-bus pass holders (OR, 1.43, 95% CI, 1.12–1.84).26 A large single study on the calorie labeling law in New York found no difference in exercise comparing counties that implemented the law with those that did not.30 A study focused on multiple environments found that community programs aimed at healthy eating and physical activity had no effect on physical activity compared to control communities.23

DISCUSSION

Because it is not always feasible to conduct controlled trials focused on adult obesity prevention and control, natural experiment studies offer a valuable opportunity to evaluate population-level programs, policies, and built-environment changes. Of the 17 studies we identified, most evaluated programs, policies, or built-environment changes promoting physical activity, followed by those focused on efforts to improve the food and beverage environment. Overall, we found no evidence that policies promoting physical activity and healthy eating had beneficial effects on the weight of adults, and most studies had a high risk of bias. Few studies showed a reduction in weight or BMI, and those that did generally had small (≤ 0.5 kg/m2 BMI) effect sizes, which would not be considered a clinically important difference. While the studies evaluating population-based policies and programs were generally of short duration, when there was no difference in weight/BMI reported, we did not identify early effects on diet or physical activity that would suggest that a longer follow-up would yield more robust changes in weight. In fact, overall, policies and programs targeting obesity resulted in small changes in dietary intake and physical activity, if any. Our results demonstrate a need for more and better evidence from natural experiment studies conducted with rigorous research methods.

Despite disappointing findings to date, novel approaches like natural experiment studies are needed to effectively assess the impact of population-level health policies and programs on obesity. The NIH has made it a priority to evaluate existing programs and policies via natural experiment studies. Recently funded NIH grants have included an evaluation of the impact on physical activity of a new light rail line in Houston,35 the impact on consumer behavior of a New York City sugar-sweetened beverage policy limiting the sale of large drinks,36 and the impact of a New York City park redesign and renovation initiative on physical activity.37 These promising studies have added to the small but growing body of natural experiment studies examining adult obesity prevention and control through population-based solutions. A research database of effective programs, policies, and built-environment changes in obesity prevention and control does not currently exist, but would provide invaluable information to researchers, policy makers and funders.

We identified some limitations in natural experiment studies, including heterogeneity in the reporting of effect sizes. While all studies reported on weight outcomes using BMI, these data were analyzed and reported in different ways (e.g., odds of obesity, change in BMI). Physical activity was measured by different methods (e.g., accelerometers, questionnaires) using various measurement units (e.g., counts/minute, MET-hours/week), making it difficult to compare effect sizes between studies. Clinical trials of intensive individual behavioral weight loss interventions often produce between-group differences of 5 kg.3 However, the expected effect size for the types of population-level programs, policies, or built-environment changes we reviewed is not clear, especially since these studies’ primary focus is on preventing weight gain, not weight loss. It is also unclear what constitutes an adequate sample size for non-experimental studies; any such assessment must account for numerous factors, such as clustering, different time points, and correlation of outcomes over time.

Another limitation was that most studies were 1–2 years in length, and longer studies may be needed to fully capture the impact on slow-to-appear obesity-related outcomes like population-level weight. Finally, most studies had a high risk of bias due to high rates of loss to follow-up, lower rigor of study design (e.g., one-time surveys or simple pre-post designs), and a high risk of confounding.

We searched PubMed for other relevant English-language systematic reviews within the past 10 years reporting on obesity-related outcomes in natural experiment studies. In one systematic review, Mayne et al. identified 37 studies that were natural- or quasi-experiments.38 Only 3 of these studies reported on weight/BMI, and only 1 of the 3 showed a reduction in self-reported BMI (with use of a new light rail system).16, 38 In contrast, our review required all studies to report on BMI/weight outcomes, and we found a favorable reduction in less than half of the studies. In addition, Mayne et al. found that studies reporting positive impacts on physical activity or diet tended to have longer follow-up times than negative studies.38 Our results for diet and physical activity outcomes showed no consistent pattern by sample size or follow-up time. Another systematic review examined policy interventions aimed at improving population nutrition,39 and the included studies reported on outcomes not included in our review (e.g., change in calorie value of purchases, consumer knowledge).39 A third systematic review examined the associations between physical activity and the built environment and assessed outcomes not included in our review: street and pedestrian connectivity, neighborhood parks, and green space.40

Our review itself has several limitations. First, we excluded studies not reporting on adult BMI/weight outcomes, so we may have excluded studies that showed a favorable effect on intermediate outcomes like dietary or physical activity behaviors but did not report weight or BMI. Second, the number of included studies was small, and BMI and weight data were reported heterogeneously across studies, making it difficult to summarize the data quantitatively. Third, publication bias may have led to underreporting of policies, programs, or built-environment changes that showed no difference. Fourth, we described outcome results based on statistical significance, which may not correspond to clinically significant effects. As mentioned earlier, it is not clear what effect sizes are clinically significant for the types of population-level programs, policies, and built-environment changes we evaluated. Fifth, we used the AHRQ guide to evaluate the strength of evidence for weight/BMI outcomes and the EPHPP tool to assess risk of bias, but recognize that there is no standard method for assessing strength of evidence and risk of bias in natural experiment studies. A strength of natural experiment studies is enhanced generalizability and external validity, but the EPHPP tool was designed to evaluate internal validity, not external validity.18

In conclusion, we identified few natural experiment studies reporting on the effectiveness of policies/programs/built-environment changes on adult obesity, and overall, these studies demonstrated inconsistent effects on weight/BMI, with low or insufficient strength of evidence and high risk of bias.

In primary care, obesity is commonly managed through individual behavioral interventions, but population-level approaches can also play an important role. While natural experiment studies allow us to evaluate these approaches, this type of study has clear limitations, including a lack of rigorous research methods and of standards for data reporting. More high-quality research, including natural experiment studies, is needed to inform the population-level effectiveness of obesity prevention and control initiatives. Better understanding of these effects has the potential to augment the efforts of primary care providers in combatting the continuing global obesity epidemic.