Introduction

Evidence is growing that gastrointestinal dysfunction is highly prevalent in children with autism spectrum disorder (ASD) (McElhanon et al. 2014), is associated with problem behaviors (Maenner et al. 2012; Mannion et al. 2013; Mazefsky et al. 2014; Mannion and Leader 2016; Marler et al. 2017), and has potential neurobiological significance (Margolis et al. 2016; Margolis 2017). However, clinicians and researchers lack a brief parent-report screen to help them identify children with ASD who likely have a gastrointestinal disorder (GID). To be useful, such a screen must take into account the fact that children with ASD, regardless of spoken language level, may not communicate or localize pain in typical ways due to their social communication and sensory processing impairments (Oberlander and Zeltzer 2014).

This report describes the development of a brief parent-report screen for common, often painful GIDs in children with ASD. The screen is derived from a longer parent-report questionnaire developed by pediatric gastroenterologists participating in the Autism Speaks-Autism Treatment Network (AS-ATN). The original ATN questionnaire was designed to assess signs and symptoms of three common GIDs—functional constipation, functional diarrhea and gastroesophageal reflux disease (GERD)—selected by gastroenterologists as common and impairing in children with ASD (2005). The original ATN questionnaire was novel in three respects: it predominantly assessed GI signs (observable manifestations) rather than symptoms (subjective experiences); it included manifestations of GERD; and it asked about specific subtle recurring motor acts (e.g. arching back, stiffening or squeezing the buttocks, applying pressure to the abdomen, or gagging during meals) observed by pediatric gastroenterologists in children with ASD presenting with GID (Buie, Campbell et al. 2010). To date, several studies have reported significant correlations between selected items in the ATN questionnaire and problematic behaviors in children with ASD (Mannion et al. 2013; Mazurek et al. 2013; Mazefsky et al. 2014; Mannion and Leader 2016).

This report is the first to describe the derivation from items in the original ATN questionnaire of a screening measure for functional constipation, functional diarrhea, and GERD in children with ASD. It is based upon a two-stage study conducted at two of the ATN registry sites. In the first stage, caretakers completed the ATN questionnaire. In the second stage, pediatric gastroenterologists, unaware of parental questionnaire responses, evaluated each child for the clinical diagnosis of functional constipation, functional diarrhea and/or GERD. Using data from both stages, this study identifies a smaller set of maximally predictive items as a screen for these three common GIDs in children with ASD.

Methods

This study was approved by the Institutional Review Boards of The Massachusetts General Hospital and Columbia University Medical Center. Legal guardians signed consent forms for all participants. Children and adolescents signed assent forms when appropriate.

Recruitment

Potential participants were all ATN Registry enrollees coming sequentially to the two ATN sites for clinical care during defined time intervals (at MGH between 09/05/2008 and 07/16/2010 and at CUMC between 05/08/2009 and 04/26/2010). All ATN enrollees met the criteria shown in E-Table 1. The study was described to parents as having two stages, the first being completion of a questionnaire by the parent and the second being a free consultation with a pediatric gastroenterologist who would be unaware of parental responses on the questionnaire.

Of all the AS-ATN Registry enrollees coming to the two sites for clinical care during these defined time intervals, parents of 131/229 (57.2%) consented to their child’s participation and completed the pediatric gastroenterology consultation (see E-Table 2).

Procedures and Forms

Stage 1: Parent Questionnaire

The 35-item Gastrointestinal Symptom Inventory (ATN GI Symptom Inventory 2005) was the basis for screen development. An additional 42 follow-up items were asked only if the parent endorsed certain of the core 35 items; these 42 items were not included in screen development. From the core 35 items, nine items were removed: one was gender-specific (menstruation in girls); two asked about a GI condition (e.g. constipation) rather than about signs or symptoms; four asked parents to make comparisons about bowel movements involving “as usual” without being anchored to specific frequencies; one asked about weight gain or loss; and one asked the parent to make a global assessment of their confidence level in assessing the child’s pain. The remaining 26 core items assessing GI signs and symptoms were included in screen development. For these 26 items, the time frame was “in the last 3 months” for 23 items, “in the last year” for one, and “ever” for two. Two of the 26 items assessed bowel movement frequency and consistency, respectively, and offered five parental response options, including “unsure”. The remaining 24 items offered three parental response options: “yes”, “no”, or “unsure”. The “unsure” option was included to allow for parental uncertainty based on the child’s self-report or lack of opportunity to observe, as might occur when a child toilets independently. Importantly, only 5 of the 26 items referenced subjective experiences (symptoms); these included pain (three items), nausea (one item), and bloating (one item). The other 21 items are observable manifestations (signs) of GI problems.

Stage 2: Expert Clinical Diagnosis

For each child seen in consultation, the gastroenterologist (TB, KGM or HW) recorded his/her impression about the presence or absence of functional constipation, functional diarrhea, and/or GERD using published criteria adapted by the authors for children with ASD (Table 1). Recommendations for follow-up (for any reason) were recorded and shared with the family. After completion of any follow-up and a chart review, the consulting gastroenterologist reached a final impression as to the presence/absence in each participant of functional constipation, functional diarrhea, and/or GERD. Additionally, each GID was categorized as (1) previously recognized but unresolved despite ongoing treatment at the time of the study consultation visit or (2) newly recognized as a result of the study consultation.

Table 1 Study definitional criteria for gastrointestinal disorders (GID)

Examination of Non-participants, Site and Examiner Differences

Those ATN enrollees whose parents declined participation in the study did not differ from those whose parents agreed with respect to gender, age, race, ethnicity or spoken language level. Parents of nonparticipants, however, were less likely to have a college degree (see E-Table 2). As there were no significant site differences among participants in child and family characteristics (with the exception of more Hispanic families at the CU site) and no site or study doctor differences in GID rates (data not shown), the combined sample was used for all analyses. Two of the 131 children who participated in the GI consultation were missing data on the parent questionnaire. Thus, these two cases are only included in the description of the diagnosed GI conditions and are not included in the development of the screen.

Analyses

Initially, the frequency distributions of parent responses to the 26 core items were compared. The rates of “unsure” responses for GI signs were compared to the rates of “unsure” responses for GI symptoms. Rates of “unsure” were also compared across age, gender and whether the child was verbal. The remaining analyses were conducted in four stages:

1. Exploratory factor analysis to identify separate dimensions within the core 26 items in the ATN questionnaire. At this stage, we also compared the means of the scales based on these dimensions for children with and without each of the three clinician-diagnosed GIDs using t-tests;
2. Estimation of two-parameter Item Response Theory (IRT) models (Embretson and Reise 2000) to identify items that were highly discriminatory for these dimensions. The extent to which these items can predict GIDs was subsequently assessed;
3. Subjecting the remaining scale items to ROC analyses to determine the optimal cut point for identifying additional cases of GID; and
4. Combining the results of stages 2 and 3 to develop screening algorithms for each of the three GIDs, as well as a screen for having any one of the three GIDs. The sensitivity, specificity, and positive predictive value (PPV) of each algorithm are reported.

The rationale for beginning with factor analysis was to obtain sets of internally consistent items to subject to ROC analysis. In ROC analysis, however, items are examined as equivalent contributors to the dimension. Thus the IRT analysis was conducted to find items that may be particularly useful for identifying cases of GID.

Missing data on individual items in the parent questionnaire was rare. Scales were constructed under the assumption that missing data on a particular item represented a “no” response.
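The scoring rule described here can be sketched in a few lines. This is a minimal illustration rather than the study's own code: the item names and the `scale_score` helper are hypothetical, responses are assumed to be stored as the strings "yes", "no", or "unsure" (with `None` for a missing answer), and only the three-option items are covered. In line with the treatment of missing data above and the later collapsing of “no” and “unsure”, anything other than "yes" contributes zero to the scale.

```python
from typing import Mapping, Optional, Sequence


def scale_score(responses: Mapping[str, Optional[str]], items: Sequence[str]) -> int:
    """Count endorsed items; "no", "unsure", and missing answers all score 0."""
    return sum(1 for item in items if responses.get(item) == "yes")


# Hypothetical item names, for illustration only.
example_items = ["bm_two_or_fewer_per_week", "pain_with_bm", "missed_activities_bm"]

# One parent's answers: one endorsement, one "unsure", one missing item.
example_parent = {
    "bm_two_or_fewer_per_week": "yes",
    "pain_with_bm": "unsure",
    "missed_activities_bm": None,
}

print(scale_score(example_parent, example_items))  # 1
```

Treating missing answers as non-endorsement keeps every child in the analysis, at the cost of possibly understating scale scores, a point the limitations section returns to.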

Results

Sample Description

Table 2 shows the demographic characteristics of the sample; 43.4% were non-verbal. GIDs were highly prevalent. Of the 131 children in the sample, 76 had at least one GID diagnosed at the study consultation, most commonly functional constipation (35.1%), followed by GERD (29.8%) and functional diarrhea (5.3%). Twenty-seven children (20.6%) had ≥ 1 newly recognized GID. Children with at least one GID did not differ significantly from those without a GID on any demographic variables or in terms of spoken language level (E-Table 3). There also were no significant differences between newly and previously recognized GID cases on these variables (E-Table 4).

Table 2 Demographic and developmental characteristics of the study sample and prevalence of expert diagnoses of GID (N = 131)

Item Response Frequency Distribution

As seen in Table 3, readily observable GI signs, including motoric acts, had very low rates of “unsure” responses, while parents were much more likely to be unsure about their child’s subjective experiences (symptoms) such as pain, nausea, or bloating. The rates of “unsure” responses, however, did not vary by age, gender or level of spoken language (E-Table 5). Subsequent analyses collapse “no” and “unsure” responses.

Table 3 ATN GI Signs and Symptoms Inventory-26 question frequency distribution

Factor Analysis

An exploratory factor analysis was conducted on 26 items from the original ATN questionnaire. This resulted in four distinct factors, each making a substantial independent contribution to explained variance in the items (based on the scree plot): (a) “Retentive”, (b) “Expulsive”, (c) “Gassy”, and (d) “Motoric” (factor loadings are provided in E-Table 6). Table 4 displays the items belonging to each factor. Items with similar-sized loadings on multiple factors were included in both scales created to represent those dimensions. Items in Table 4 that are not in bold are those removed because their inclusion reduced the internal reliability (Cronbach’s alpha) of the summed scale.
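For readers unfamiliar with the reliability statistic used to prune items, the following is a minimal sketch of Cronbach's alpha for a summed scale of 0/1 items. The `cronbach_alpha` helper and the six-respondent data set are invented for illustration and are not study data; the formula itself is the standard one, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score).

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of 0/1 item scores per respondent, all in the same item order."""
    k = len(rows[0])
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented responses for a three-item scale (not study data).
toy_rows = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
print(round(cronbach_alpha(toy_rows), 3))  # 0.842
```

Recomputing alpha with one column dropped at a time would show whether a given item lowers internal reliability, which is the criterion described above for excluding items from a scale.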

Table 4 Items comprising the four dimensions from the ATN-GI Signs and Symptoms Inventory-26

Relation Between Scales and GID

Table 5 compares mean scores on the four scales across children with and without a GID. Functional constipation is strongly associated with the Retentive scale and has weaker associations with the Motoric and Gassy scales. GERD is strongly associated with the Motoric scale; it is less strongly and non-significantly related to the Gassy and Expulsive scales. The diagnosis of functional diarrhea has a strong relation with the Expulsive scale, but falls short of statistical significance, likely because of its low prevalence. Functional diarrhea is unrelated to any of the other scales. Finally, the Retentive scale and the Motoric scale are both significantly associated with the likelihood of having any one of the three GIDs.

Table 5 Differences in ATN-GI Signs and Symptoms Inventory-26 Scale Scores across GID

Item Analysis

Two-parameter IRT models were estimated for each of the four scales. Figure 1 displays the item characteristic curves for all of the items in each of the scales as well as the item discriminations. The highly discriminatory items in each scale were selected for special attention in predicting the three clinical GIDs. These items were not selected based on a fixed discrimination value but rather on their discrimination (steepness of the item characteristic curve) relative to other items in the scale.

Fig. 1 Item characteristic curves and discriminations for the four ATN GI symptoms inventory dimensions
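To make "steepness of the item characteristic curve" concrete, here is a small sketch of the standard two-parameter logistic curve. The function name `icc_2pl` and the parameter values are illustrative assumptions, not estimates from this study: `a` is the discrimination, `b` is the location (difficulty), and `theta` is the latent trait level.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Two hypothetical items with the same location but different discriminations.
for a in (0.8, 2.5):
    below = icc_2pl(-1.0, a, 0.0)  # one unit below the item's location
    above = icc_2pl(1.0, a, 0.0)   # one unit above the item's location
    print(f"a={a}: P(endorse) rises from {below:.2f} to {above:.2f}")
```

The item with the larger `a` moves from rarely endorsed to usually endorsed over a narrow range of the trait, which is why such items separate children on either side of that range more sharply.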

Functional Constipation

Four items on the Retentive scale are distinguished by their steep item characteristic curves relative to the remaining items (Fig. 1a): having two or fewer bowel movements (BMs) per week (last 3 months), having pain with BMs (last 3 months), missing activities because of problems with BMs, and missing activities due to pain or discomfort. These four items were included in a stepwise logistic regression predicting functional constipation (see E-Table 7a). Although the associations of functional constipation with the Gassy and Motoric scales are modest, the highly discriminatory items from these scales were also included in the logistic regression. Even with very loose significance criteria (p < .10 for increased F), only two items made an independent contribution to predicting functional constipation: 2 or fewer BMs per week, and missing activities because of problems with BMs. When endorsement of either of these two items is classified as a positive screen, the result is high specificity (89.7%) and excellent PPV (65.2%) (Table 6). The sensitivity, however, is very poor (34.1%). Use of this screen would thus miss nearly two-thirds of subjects with functional constipation.

Table 6 Sensitivity, specificity, and positive predictive value (PPV) of screens for functional constipation, functional diarrhea, and GERD

In an effort to enhance sensitivity, the remaining items in the Retentive scale were subjected to an ROC analysis. This analysis excluded cases already screened positive in the previous step. The results (see E-Table 7b) indicated that the optimal cut-point for identifying additional cases of functional constipation is one or more items. This suggests that the optimal overall screen is endorsement of one or more of any of the six items in the Retentive scale. When this is used as the definition for a positive screen for functional constipation, the sensitivity rises to 75.6%, with a specificity of 61.0% and a PPV of 51.5% (Table 6).
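The cut-point search can be sketched as follows. The text does not state which criterion defined the "optimal" cut-point, so this sketch assumes Youden's J (sensitivity + specificity − 1), a common choice; the `best_cut_point` helper and the ten-child data set are invented for illustration and are not study data.

```python
def best_cut_point(scores, has_gid):
    """Return (cut, sensitivity, specificity) maximizing Youden's J.

    A child screens positive when the number of endorsed items is >= cut.
    Ties are resolved in favor of the lowest cut.
    """
    n_pos = sum(1 for y in has_gid if y)
    n_neg = len(has_gid) - n_pos
    best = None
    for cut in range(1, max(scores) + 1):
        tp = sum(1 for s, y in zip(scores, has_gid) if y and s >= cut)
        tn = sum(1 for s, y in zip(scores, has_gid) if not y and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    if best is None:
        raise ValueError("no item was endorsed, so no cut-point can be evaluated")
    return best[1:]


# Invented scale scores: five children without and five with the diagnosis.
toy_scores = [0, 0, 0, 1, 0, 1, 2, 1, 3, 0]
toy_has_gid = [False] * 5 + [True] * 5
print(best_cut_point(toy_scores, toy_has_gid))  # (1, 0.8, 0.8)
```

In this toy example, as in the analysis reported above, a cut-point of one or more endorsed items gives the best balance; raising the cut buys specificity only by giving up sensitivity.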

Functional Diarrhea

Functional diarrhea is associated only with the Expulsive scale. The three items in this scale distinguished by high discrimination values (missed activities due to vomiting, spit up 2 or more times in a day, and experienced retching), however, are not usually considered signs of functional diarrhea. These orally expulsive items, when co-occurring with the other items in the scale, suggest a transient infection. An ROC analysis was conducted using the remaining items from the Expulsive scale. These included nausea, need to rush to the bathroom for a BM, black/tarry BM, missed activities due to problems with BMs, BMs soft/mushy/watery, and a motor act (tilted head to the side and arched back). The results of this analysis (E-Table 8) reveal an optimal cut-point of one or more of these items for identifying cases of functional diarrhea. A screen based on this definition has a sensitivity of 83.3% (5 of 6 cases). The specificity is 51.2% and the PPV is 7.8% (Table 6).

GERD

A diagnosis of GERD is significantly associated with scores on the Motoric scale. It also has non-significant, weak associations with the Expulsive and Gassy scales. The items with comparatively high discrimination values in any one of these three scales were included in a stepwise logistic regression predicting the presence of a diagnosis of GERD (see E-Table 9a for the items included in the regression). After the most strongly associated item (choke, gag, cough or wet sounds during or after swallowing or with meals) entered the model, none of the other items had a significant independent relationship with GERD. Interestingly, this single item alone captured 40.5% of all cases of GERD (specificity 87.6%; PPV 57.7%) (Table 6). Nonetheless, we considered the rest of the Motoric, Gassy, and Expulsive scales as a means of improving sensitivity. As the Motoric scale is related to functional constipation as well as to GERD, we removed the items explicitly referencing bowel movements and included the remaining Motoric items (tilted head to side and arched back, pushed abdomen, refused foods eaten in the past, stopping all activities 2+ hours due to pain) in an ROC analysis (E-Table 9b), with those already screened positive using the choke/gag item excluded. The ROC indicates an optimal cut-point of two or more of the remaining Motoric scale items.

The remaining screen negatives were then subjected to further ROC analyses involving the Gassy and Expulsive scales. The former was uninformative: the area under the curve (AUC) was < 0.5. Since the Expulsive scale is related to functional diarrhea as well as to GERD, we conducted an ROC analysis on the oral expulsive items that remain after those in the screen for functional diarrhea were removed. Thus, the ROC analysis was conducted using nausea, spitting up two or more times a day, retching, and missing activities due to vomiting. The optimal cut-point is the endorsement of one or more of these items (E-Table 9b). When the criteria from the ROC analyses for both the motoric items (two or more positive) and the oral expulsive items (one or more positive) were included as a path to a positive screen, the sensitivity rises to 73.0% with a specificity of 64.0% and a PPV of 45.8% (see Table 6).
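The three paths to a positive GERD screen described above can be written as a short decision rule. This is a sketch under stated assumptions, not the published scoring key (see the Appendix for the instrument itself): the item keys are hypothetical shorthand for the item descriptions in the text, and responses are assumed to be stored as "yes", "no", or "unsure".

```python
from typing import Mapping, Sequence

# Hypothetical keys standing in for the items named in the text.
CHOKE_GAG = "choke_gag_cough_wet_sounds"
MOTORIC = [
    "tilted_head_arched_back",
    "pushed_abdomen",
    "refused_foods_eaten_before",
    "stopped_activities_2h_pain",
]
ORAL_EXPULSIVE = [
    "nausea",
    "spit_up_2_or_more_per_day",
    "retching",
    "missed_activities_vomiting",
]


def count_yes(responses: Mapping[str, str], items: Sequence[str]) -> int:
    """Number of items answered "yes"; "no", "unsure", and missing count as 0."""
    return sum(1 for item in items if responses.get(item) == "yes")


def gerd_screen_positive(responses: Mapping[str, str]) -> bool:
    """Positive if any one of the three paths described in the text is met."""
    if responses.get(CHOKE_GAG) == "yes":            # path 1: single key item
        return True
    if count_yes(responses, MOTORIC) >= 2:           # path 2: two or more motoric items
        return True
    return count_yes(responses, ORAL_EXPULSIVE) >= 1  # path 3: any oral expulsive item


print(gerd_screen_positive({"pushed_abdomen": "yes"}))  # False: one motoric item is not enough
print(gerd_screen_positive({"pushed_abdomen": "yes", "tilted_head_arched_back": "yes"}))  # True
```

Each added path can only turn a negative screen into a positive one, which is why sensitivity rises and specificity falls relative to the single choke/gag item.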

Any GID

If a positive screen for the presence of any of the three GIDs is defined as one or more positive screens for the individual GID, the result is a highly sensitive screen (83.6%) with a specificity of 43.4% and a PPV of 67.0% (see Table 6). Because a number of items are shared in common by the screens for the individual GIDs, only 17 of the original 26 items are required to screen for the presence of any GID. This instrument is presented here as the ATN-GI Signs and Symptoms Inventory-17 (ATN-GISSI-17) (see Appendix).

It should also be noted that, because of co-occurrence across the three conditions, the overall screen is actually more sensitive for individual conditions than are the component screens (functional constipation sensitivity = 84.4%; functional diarrhea sensitivity = 100%; and GERD sensitivity = 86.5%).
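The "any GID" rule and the three reported statistics can be illustrated with a short sketch. The helper names are hypothetical, and the 2×2 counts are an assumption: they were chosen only because they reproduce the reported percentages and are not taken from Table 6.

```python
def any_gid_positive(constipation: bool, diarrhea: bool, gerd: bool) -> bool:
    """A child screens positive overall if any individual screen is positive."""
    return constipation or diarrhea or gerd


def screen_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, specificity, and PPV from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that screen positive
        "specificity": tn / (tn + fp),  # share of non-cases that screen negative
        "ppv": tp / (tp + fp),          # share of positive screens that are true cases
    }


# Assumed counts, picked to match the reported percentages (not from Table 6).
metrics = screen_metrics(tp=61, fn=12, fp=30, tn=23)
print({name: round(100 * value, 1) for name, value in metrics.items()})
# {'sensitivity': 83.6, 'specificity': 43.4, 'ppv': 67.0}
```

Because the overall rule is a logical OR of the component screens, a child with GERD who is missed by the GERD screen can still be caught by the constipation or diarrhea screen, which is how the overall screen ends up more sensitive for each condition than its own component screen.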

Additional analyses were conducted to see if a yet more parsimonious screen could be used to identify the likely presence of any GID, so that the specific diagnosis could later be determined by a gastroenterologist. A stepwise regression of all highly discriminatory items assessed for the individual screens identified three items that were independently predictive of the presence of a GID: two or fewer BMs a week, spitting up more than twice per day, and missed activities due to excessive gas (see E-Table 10). A screen defined by endorsement of any one of these three items results in a sensitivity of 38.6%, a specificity of 86.5%, and a PPV of 79.4% (Table 6). While the high PPV means that any individual child referred for a GI consultation on the basis of these screening items has a high probability of having a GID, this screen would miss far too many GIDs to be useful. Thus, the screening algorithm based on the ATN-GISSI-17 is clearly recommended, assuming that it can be validated in other ASD samples.

Discussion

This paper aimed to develop a brief parent-report screen for identifying children with ASD likely to benefit from further GI evaluation. This 17-item screen (the AS-ATN GI Signs and Symptoms Inventory-17), derived from a longer questionnaire, targets three common and often painful GIDs: functional constipation, functional diarrhea, and GERD. The screens for the individual disorders are modestly sensitive and specific. The combination of the three screens, however, is quite sensitive as a screen for any of the three GIDs and is not overly burdensome in terms of false positives (Sens = 83.6%; Spec = 43.4%; PPV = 67.0%).

As expected, given the social communication and sensory processing impairments of children with ASD, rates of parental “unsure” responses were higher for the few items assessing symptoms (subjective experiences of GIDs) than for signs (observable manifestations of GIDs) and did not differ by age or spoken language level. Items involving motoric acts had the lowest rates of parental “unsure” responses and proved particularly useful for identifying cases of GERD. The item “choke, gag, cough, or sound wet during or after swallowing or with meals” identified 40.5% of cases with GERD with high specificity (89.7%), while other motor acts (“tilting his/her head to the side and arched back, push abdomen with his/her hands or your hand, push his/her abdomen against or lean forward against furniture”, and “refused foods that would eat in the past”) were helpful in increasing the sensitivity of the screen for GERD. The movements of “tilting head/arching back” are referred to by gastroenterologists as “Sandifer’s syndrome”, which is considered a clinical sign for GERD (Vandenplas, Rudolph et al. 2009). Hopefully, this study will widen awareness of these GERD-associated motoric acts among autism providers who might otherwise limit their differential and further evaluation to possible tics (Simonoff et al. 2008) or seizures (Bauman 2010; Jeste and Tuchman 2015; Hung 2016). The importance of GERD in children with ASD is underscored by the fact that, in this clinical sample, GERD was nearly as common (29.8%) as functional constipation (35.9%) and, of the two, had a higher rate (43.6% vs 28.2%) of being newly recognized as a result of the GI consultation.

This study differs from the only other study to compare parent report and clinician diagnoses of GIDs in children with ASD with respect to the parent report instrument (Gorrindo et al. 2012). That study examined how a general population measure of GI symptoms (the 71-item Questionnaire on Pediatric Gastrointestinal Symptoms (QPGS)-Rome III) (Whitehead et al. 2006; Lewis et al. 2016) aligned with gastroenterologists’ diagnoses of GIDs in a clinical sample of children with ASD (Gorrindo et al. 2012). Consistent with this study, they found that parent report identified the presence of any GID better than specific GIDs and also found a high prevalence of gastroenterologist-diagnosed GERD (20%). The AS-ATN-GISSI-17 has the advantage of being a shorter measure that is more suitable for screening and does not include many items about GI symptoms that are typically difficult to ascertain in ASD. The AS-ATN-GISSI-17 also contains items assessing motoric acts that are not included in the QPGS-Rome III.

The AS-ATN-GISSI-17 is now ready for a test of its validity in an independent clinical sample of children with ASD. If found to be valid, it could be used in research to narrow the pool of children with ASD in whom an actual diagnosis (by a gastroenterologist) needs to be made before inclusion in a study, reducing the time and cost of recruiting and characterizing samples of children with ASD defined by GID status. It could also be useful in clinical care. It is noteworthy that slightly over one-fifth of this clinical sample were found to have one or more GID(s) that had not been previously identified. This is particularly concerning because the stereotype that children with ASD have “a high threshold for pain” is not supported by evidence (Oberlander and Zeltzer 2014). Routine screening for common, often painful GIDs in children with ASD is feasible with this brief parent-report instrument. Moreover, while problematic behaviors (e.g., irritability, aggression, self-injury, sleep problems) may occur with a wide range of medical conditions, a screen such as this could allow autism providers to quickly and systematically consider GIDs as a possibility and refer appropriately.

Limitations of the Study

A number of limitations need to be noted. The first pertains to the limited number of cases. This is particularly true for functional diarrhea; the small number of cases may be due to the age of onset criterion (< 3 years), which is based on evidence that chronic diarrhea onset after age 3 is usually organic in origin (Guiraldes and Roessler 2013). The small number of cases, together with the overlap between functional diarrhea and GERD in the data, resulted in the identification of a factor (expulsivity) that included items representative of both functional diarrhea and GERD. For example, the description of “tilted his/her head and arched back”, usually indicative of Sandifer’s syndrome and therefore associated with GERD, was also associated with functional diarrhea. Despite this limitation, we chose to include diarrhea as one of our screened conditions because several studies have reported a high incidence of diarrhea in children with ASD (Kang et al. 2014; McElhanon et al. 2014; Alabaf et al. 2018; Holingue et al. 2018). Further, we wanted to emphasize the possibility that previous reports of high rates of parent-reported signs and symptoms suggestive of diarrhea (McElhanon et al. 2014) are in fact encopresis, that is, frequent, loose stools around a large, hard stool mass due to functional constipation, which often co-occurs with GERD (Baran et al. 2017). A GI consultation may be necessary to distinguish between encopresis and functional diarrhea (Colombo et al. 2015). Further research employing different and larger samples will determine whether this low prevalence for functional diarrhea is unique to our sample.

A possible second limitation arises from the nature of the item analysis. The IRT analysis identified items that might be especially effective in distinguishing the presence or absence of GI conditions, rather than treating all items the same as in an ROC analysis. A different sample might have produced a different set of items for the screen, though it is likely that the resulting sensitivity and specificity would be similar. It also should be acknowledged that there was occasional missing data on the parent questionnaire and that this may have resulted in negative screen results that would have been positive with complete data.

Finally, it should be noted that the analyses here were undertaken for the purpose of developing a screen. This screen will have to be tested on other ASD samples before full confidence can be placed in its validity. It should be noted, however, that the screen for “any GID” returned values for sensitivity, specificity, and PPV that were virtually identical for the two study sites. The two sites were quite different in terms of ethnic composition, providing reason to hope that the screen can be effectively applied across clinical care settings.

Adding new questions not included in the original ATN questionnaire might identify some of the 16.4% of children with a GID condition who were missed. Such increases in sensitivity, however, are likely to come at the expense of specificity and positive predictive value (PPV). Full sensitivity can be achieved by eschewing use of a screen and referring all children with ASD for a GI consultation. In order for a screen for GID to be worth the time and effort involved in its administration in a clinical or research setting, it must achieve a substantial reduction in false positives over the alternative of universal referral to a pediatric gastroenterologist. While it is true that a third of the children with positive screens in this sample did not ultimately have one of the three GIDs, this rate of over-referral seems an acceptable burden in light of the fact that it correctly identified over 80% of the sample who had at least one GID.
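The comparison with universal referral can be made concrete using the standard relationship between PPV, sensitivity, specificity, and prevalence. This is a rough sketch: the `ppv` helper is illustrative, prevalence is taken as the 76 of 131 children with at least one GID, and the screen's rounded sensitivity and specificity are plugged in, so the result only approximates the reported PPV.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


prevalence = 76 / 131  # children with at least one GID in this sample

screen_ppv = ppv(0.836, 0.434, prevalence)   # the 17-item screen
universal_ppv = ppv(1.0, 0.0, prevalence)    # refer everyone: PPV equals prevalence

print(f"screen: {screen_ppv:.3f}, universal referral: {universal_ppv:.3f}")
```

Under universal referral every referral of a child without a GID is a false positive, so the PPV falls to the sample prevalence; the screen's value lies in raising that figure while still identifying most true cases.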

It is important to note that since the completion of this study, new Rome criteria (Rome IV) have been developed that include the diagnoses of functional diarrhea and functional constipation. Although the Rome IV criteria would have been optimal for this study, questions asked of all study participants were based on the wording noted in the Rome III criteria. The differences between Rome III and Rome IV, however, are minor with regard to the diagnostic criteria for functional constipation and functional diarrhea (Simren et al. 2017). The major differences between Rome III and Rome IV include the addition of four new diagnoses that we did not screen for. These diagnoses, however, would have been highly unlikely to be diagnosed in our pediatric population (opioid-induced constipation, narcotic bowel syndrome/opioid-induced GI hyperalgesia and cannabinoid hyperemesis syndrome) or very challenging to diagnose given a requirement of the ability to verbalize and localize pain (reflux hypersensitivity) (Simren et al. 2017; Yamasaki and Fass 2017). The screen results would thus not likely differ (Simren et al. 2017).

Common and often painful childhood GIDs may go unrecognized in children with ASD due to their communication and sensory processing impairments. The brief parent-report screen developed here is now ready for a validation study in an independent clinical sample of children with ASD. Once validated, it is hoped that its use will improve clinical care and facilitate research.