Introduction

Assisted reproductive technology (ART) comprises the infertility treatment procedures in which a woman’s eggs are removed from her body and embryos are cultured in vitro [1]. Numerous studies over many years have shown that even singleton ART pregnancies result in an increase in adverse outcomes when compared with unassisted pregnancies in fertile women [2,3,4]. However, the question remains as to whether these adverse outcomes are a result of the ART procedures or underlying medical conditions associated with infertility [5]. To study this, a more appropriate comparison group, such as pregnancies conceived with other fertility treatments or those to women with infertility but no treatment, must be used. We have previously defined a comparison group that we called the “subfertile” group with which we compared outcomes following ART treatment [6]. We used the term subfertile for this group because it was a heterogeneous group comprising pregnancies conceived with non-ART treatments identified from fertility treatments indicated on birth certificates, prior infertility diagnosis recorded on the hospital discharge records, and/or a history of prior ART or other fertility treatment but without necessarily having a defined diagnosis of infertility. Using this comparison group, we have found that adverse outcomes with ART were more similar to adverse outcomes in the subfertile group than the fertile group [7,8,9]. Although the subfertile group has been a useful comparison group, it is limited by the fact that we have no evidence that all of the women included have a defined diagnosis of infertility and by the likelihood that some women with infertility were missed and instead included in the fertile group.

Most infertility diagnosis and treatment is performed in an outpatient setting. Our prior studies of ART-treated and subfertile women used a linked database compiled from birth certificates, fetal death records, and hospital discharge records in the Pregnancy to Early Life Longitudinal (PELL) data system. PELL does not include outpatient data. Ascertaining women with infertility would be optimized if we could use outpatient records to identify women with an infertility diagnosis. One way to identify outpatient information is by using medical insurance claims data [10]. In Massachusetts, the Center for Health Information and Analysis (CHIA) has been provided with broad authority to collect healthcare claims data and to develop the All Payers Claims Database (APCD) under Massachusetts healthcare reform law. APCD collects claims data from insured patients in Massachusetts which are used by researchers to analyze population-level healthcare utilization and to determine quality outcomes for costs and pricing. The information in this system contains International Classification of Diseases (ICD) codes designated during outpatient as well as inpatient encounters to provide a supporting diagnosis for the visit.

In this study, we used the APCD to identify deliveries to women with a provider-defined insurance claim for the diagnosis of infertility but no ART treatment for that delivery, understanding that there would be some overlap between this group and the previously identified subfertile group (as defined above). We compared this group, which we defined as “infertile,” to our previously defined heterogeneous subfertile group, to a fertile group, and to an ART-treated group. Our goal was to evaluate whether this APCD-defined infertile group was a more complete, representative, and accurate comparison group than the subfertile group for outcomes to ART-treated women and to determine whether outcomes in this group were substantially different from those previously reported in the subfertile group.

Materials and methods

Data sources

We used data from (1) the Society for Assisted Reproductive Technology Clinic Outcome Reporting System (SART CORS), (2) the Massachusetts-based Pregnancy to Early Life Longitudinal (PELL) data system, and (3) the Massachusetts All Payers Claims Database (APCD). The study had approval from the Massachusetts Department of Public Health (MDPH) and the Dartmouth-Hitchcock Health Institutional Review Board. A Memorandum of Understanding was executed among SART, MDPH, and the project principal investigators.

The SART CORS database contains ART data entered by the clinics and reported to the Centers for Disease Control and Prevention in compliance with the Fertility Clinic Success Rate and Certification Act of 1992 (Public Law 102–493). The database contains ART cycle-specific demographics, infertility diagnoses, ART treatment, pregnancy, and outcome data, and is maintained by Redshift Technologies, Inc., under contract to SART. Data are obtained from approximately 90% of ART clinics in the USA and all Massachusetts clinics are included in the database. SART CORS data are validated annually with random clinics having on-site visits for chart review based on an algorithm for clinic selection. During these visits, data reported by the clinic are compared with information recorded in patients’ charts. In 2017, most data fields selected for validation were found to have discrepancy rates of ≤ 5% [11].

The PELL data system links Massachusetts birth certificates and fetal death records to corresponding hospital utilization data for the delivery event for the mother and infant, and to non-delivery hospital utilization (hospital admissions, observational stays, and emergency room visits) for the mother and child over time. The data have been linked for 98% of births and fetal deaths for individual women and their children since 1998. PELL data are linked through randomly generated unique IDs for mothers and infants. MDPH and CHIA are the custodians of the PELL data which are housed at MDPH.

The APCD is a comprehensive claims database that houses insurance claims from public and private insurance payers providing insurance to Massachusetts residents and employees. The database includes claims for medical, pharmacy, dental, vision, behavioral health, and specialty services. We obtained claims data for all available company and employer-sponsored insurance claims linked to women who delivered between January 1, 2013, and December 31, 2017. We are unaware of any validation studies on APCD data.

Linkage of SART CORS to PELL

We developed the Massachusetts Outcome Study of Assisted Reproductive Technology (MOSART) database as previously defined [12] by linking the SART CORS and PELL data systems for all Massachusetts resident women delivering in Massachusetts hospitals for deliveries from 2004 to 2017. Linkage was performed using a deterministic five-phase linkage algorithm. Linkage was based on mother’s date of birth, her first name and last name, father/partner’s last name, baby’s date of birth, plurality, and infant sex. The linkage rate for 2004–2017 data was 91.5% overall and 94.9% for deliveries in which both mother’s zip code and clinic were located in Massachusetts.
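The five phases themselves are not enumerated here. As a rough illustration of how a multi-pass deterministic linkage generally works, the Python sketch below tries progressively looser exact-match keys built from the kinds of fields listed above and accepts only unambiguous matches. The pass definitions, field names, and example records are hypothetical; this is not the MOSART algorithm.

```python
from collections import defaultdict

# Hypothetical match keys, strictest first; NOT the actual five MOSART phases.
PASSES = [
    ("mother_dob", "mother_first", "mother_last", "baby_dob", "plurality"),
    ("mother_dob", "mother_last", "baby_dob", "plurality"),
    ("mother_dob", "mother_first", "partner_last", "baby_dob"),
]

def norm(value):
    """Normalize a field for exact comparison (case and surrounding spaces)."""
    return str(value).strip().lower()

def key_for(record, fields):
    """Build a match key, or None if any field is missing or blank."""
    if any(record.get(f) in (None, "") for f in fields):
        return None
    return tuple(norm(record[f]) for f in fields)

def link(art_records, birth_records, passes=PASSES):
    """Return {art_id: birth_id} using successive exact-match passes."""
    links = {}
    used_births = set()
    for fields in passes:
        # Index the still-unlinked birth records by this pass's key.
        index = defaultdict(list)
        for b in birth_records:
            if b["id"] in used_births:
                continue
            k = key_for(b, fields)
            if k is not None:
                index[k].append(b["id"])
        for a in art_records:
            if a["id"] in links:
                continue
            k = key_for(a, fields)
            candidates = index.get(k, []) if k is not None else []
            candidates = [c for c in candidates if c not in used_births]
            # Accept only unambiguous matches; the rest fall to later passes.
            if len(candidates) == 1:
                links[a["id"]] = candidates[0]
                used_births.add(candidates[0])
    return links

# Invented example records (not study data).
art = [
    {"id": "A1", "mother_dob": "1980-02-03", "mother_first": "Ana",
     "mother_last": "Silva", "partner_last": "Reis",
     "baby_dob": "2015-06-01", "plurality": 1},
    {"id": "A2", "mother_dob": "1979-11-20", "mother_first": "Beth",
     "mother_last": "Jones", "partner_last": "Lee",
     "baby_dob": "2016-01-15", "plurality": 2},
]
births = [
    {"id": "B1", "mother_dob": "1980-02-03", "mother_first": "ana ",
     "mother_last": "SILVA", "partner_last": "Reis",
     "baby_dob": "2015-06-01", "plurality": 1},
    {"id": "B2", "mother_dob": "1979-11-20", "mother_first": "Elizabeth",
     "mother_last": "Jones", "partner_last": "Lee",
     "baby_dob": "2016-01-15", "plurality": 2},
]
print(link(art, births))
```

In this toy example the first record links on the strictest key and the second, whose first name is recorded differently, links only on the looser second key. Ordering the passes from strict to loose is a common design choice because it lets the most reliable matches claim their records first.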

Linkage of MOSART to APCD

MOSART data from 2013 through 2017 were linked to the APCD under an MOU among CHIA, MDPH, and the project PIs. Information on PELL variables of the women and children from the MOSART database was submitted to CHIA for linkage using the member eligibility (ME) file. Variables included mother’s date of birth, last and first names, and zip codes for the women’s linkage, and infant’s date of birth, last and first names, sex, and zip codes for the child’s linkage. Upon obtaining the ME identifiers, CHIA matched and then extracted the APCD non-MassHealth medical claim (MC) records for the linked mothers and children and sent these data back to the PELL. Overall, 98.7% of the MOSART mothers and almost 100% of the MOSART children in 2013–2017 were linked to the APCD ME file of which 81% of the mothers and 54% of the children had at least one APCD entry. We did not have approval from MassHealth (the Massachusetts Medicaid provider) to obtain APCD insurance claims data, and thus although we linked all women in MOSART to APCD, the MassHealth claims were not included in these data. The 10.3% of companies that did not enter data into APCD could also not be included.

Patients

The study sample included all deliveries for MOSART-APCD-linked women with no MassHealth records for October 1, 2013–December 31, 2017. Deliveries included those from October 1, 2013, rather than January of 2013 to allow us to have 9 months of APCD data in which to find an infertility code for that delivery in our dataset if one existed.

Outcome measures

We obtained information on birthweight and gestational age from birth certificates. Clinically determined gestational age was modified, when needed, by reported dates of last menstrual period. Gestational ages outside of the range of 17–44 weeks were set as missing. Neonatal death was obtained from linked birth certificate and infant death data.

Fertility groups

Deliveries were classified as ART-treated if the delivery was linked to an ART cycle in the SART CORS database. The subfertile group was defined as previously described [6] as having one or more of the following: (1) a marked checkbox for infertility treatment on the birth or death certificate, (2) an ICD 9 or 10 code for infertility (ICD 9 codes 628 and V23.0; ICD 10 codes O09.00–O09.03 and N97.0–N97.9) during a prior hospitalization, (3) prior delivery with either a checkbox for infertility treatment or linkage to SART CORS. A delivery was defined as infertile if the woman who delivered had an APCD outpatient or inpatient claim prior to that delivery with a provider-confirmed diagnosis of infertility (ICD codes as above). Women were classified as fertile if they did not fall into any of the other categories.
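The group definitions above can be sketched as a small classifier. The Python below is illustrative only: the function and argument names are hypothetical, stripping the dot from codes before matching is an assumption, and treating ART as overriding the other labels is inferred from the definition of the infertile group (no ART treatment for that delivery) rather than stated as a rule. The subfertile and infertile labels are allowed to co-occur because the two definitions overlap.

```python
import re

# Codes named in the text: ICD 9 628.x and V23.0; ICD 10 O09.00-O09.03 and
# N97.0-N97.9. Matching on dot-free, upper-cased codes is an assumption.
INFERTILITY_CODE = re.compile(r"^(628\d*|V230|O090[0-3]|N97\d)$")

def is_infertility_code(code):
    """True if a diagnosis code falls in the infertility code ranges."""
    return bool(INFERTILITY_CODE.match(code.replace(".", "").strip().upper()))

def fertility_groups(linked_to_sart_cors, subfertile_indicator,
                     apcd_codes_before_delivery):
    """Return the set of group labels for one delivery.

    subfertile_indicator stands for any of the three subfertile criteria
    (certificate checkbox, prior hospital code, prior treated delivery).
    """
    if linked_to_sart_cors:
        return {"ART"}  # assumption: ART takes precedence over other labels
    groups = set()
    if subfertile_indicator:
        groups.add("subfertile")
    if any(is_infertility_code(c) for c in apcd_codes_before_delivery):
        groups.add("infertile")
    return groups or {"fertile"}

print(fertility_groups(False, True, ["628.9"]))
print(fertility_groups(False, False, ["O80"]))
```

Returning a set rather than a single label keeps deliveries that satisfy both the subfertile and infertile definitions visible to later analyses.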

Covariates

The following covariates were obtained from birth and death certificates: maternal and paternal age, race/ethnicity and education, maternal BMI, prior gravidity and parity, and infant sex. Information from birth certificates, death certificates, and hospital discharge records was used to define: chronic hypertension and diabetes, gestational diabetes, pregnancy hypertension/preeclampsia/eclampsia, pregnancy-associated bleeding, and placental problems (abruptio placentae, placenta previa, vasa previa, and placenta accreta); other delivery complications including cephalopelvic disproportion, breech/malpresentation, prolonged labor, dysfunctional labor, fever, fetal distress, cord prolapse, premature rupture of membranes, prolonged rupture of membranes, and cesarean hysterectomy; and method of delivery. APCD was used to define infertility diagnosis and treatment, and SART CORS diagnosis data were used for comparison with APCD in the ART group. We identified the following diagnoses related to infertility in the time period before the index delivery using ICD 9 and 10 codes: endometriosis, uterine factor, polycystic ovarian syndrome (PCOS), other ovulatory disorders, diminished ovarian reserve (DOR), inflammatory conditions of the peritoneum and reproductive tract, and unexplained infertility (Supplemental Table 1). These diagnoses were also determined for the ART group in SART CORS using the reason for ART (rfa) fields. Treatment codes (Supplemental Table 1) were identified in the timeframe between LMP or presumptive LMP and delivery.

Statistical analyses

Bivariate and multivariate generalized estimating equations (GEE) with Poisson distribution and exchangeable correlation structure were used to account for multiple deliveries by the same women and to estimate relative risk ratios (RRs) and 95% confidence intervals (CIs). Models were adjusted for mother’s age (< 30, 31–34, 35–37, 38–40, > 40), race/ethnicity (Hispanic, NHW, NHB, NHA, NH-others, unknown), education (HS or < HS, some college, college, post college, unknown), chronic diabetes (yes, no), chronic hypertension (yes, no), parity (1, ≥ 2), gestational diabetes (yes, no), pregnancy hypertension including preeclampsia/eclampsia (yes, no), placental problems (yes, no), plurality (singleton, multiple), and infant gender (male, female). Analyses were performed in SAS software 14.3 (SAS Institute, Cary, NC). In accordance with guidelines from MDPH, we suppressed any counts that were less than 11.
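For readers unfamiliar with the quantity being estimated, the sketch below computes a plain unadjusted risk ratio with a Wald 95% CI on the log scale. It is deliberately not the GEE model described above: it ignores covariates and the clustering of deliveries within women, and it is written in Python rather than SAS. The function name is hypothetical and the counts are made up for illustration; they are not study data.

```python
import math

def risk_ratio(events_exposed, n_exposed, events_ref, n_ref, z=1.96):
    """Unadjusted risk ratio with a Wald 95% CI computed on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_ref = events_ref / n_ref
    rr = risk_exposed / risk_ref
    # Standard error of log(RR) for two independent binomial samples.
    se = math.sqrt(1 / events_exposed - 1 / n_exposed
                   + 1 / events_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Made-up counts for illustration only; NOT taken from the study.
rr, lo, hi = risk_ratio(events_exposed=120, n_exposed=1000,
                        events_ref=800, n_ref=10000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A Poisson GEE with a log link estimates the same kind of ratio as the exponentiated group coefficient, while additionally adjusting for the listed covariates and correcting the standard errors for repeated deliveries by the same woman.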

Results

Our study sample included 91,851 deliveries to 78,508 women of which 70,726 were designated as fertile, 4,763 as subfertile, 11,970 as infertile, and 7,689 as ART-treated (Fig. 1). Only deliveries to women who did not have MassHealth at any time during the study period were included: this resulted in elimination of 60.0% of the deliveries (Fig. 1). More fertile (63.2%) than subfertile (39.2%), infertile (48.0%), or ART-treated (34.9%) deliveries were among those that were omitted due to those women having had any MassHealth during the study period.

Fig. 1 Study sample. Fertile deliveries are those not in any of the other groups; subfertile deliveries are those to a woman who had one or more of the following: a marked checkbox for infertility treatment on the birth or death certificate, an ICD 9 or 10 code for infertility during a prior hospitalization, a prior delivery with either a checkbox for infertility treatment or linkage to SART CORS; infertile deliveries are those to a woman with an APCD outpatient or inpatient claim prior to that delivery with a provider-confirmed diagnosis of infertility; ART deliveries are those linked to SART CORS

The infertile cohort, defined as it was with the inclusion of outpatient data, contained about 2.5 times as many deliveries as the previously defined subfertile group (11,970 vs. 4,763). Of those in the two groups, 2,406 (50.5%) in the subfertile group had a checkbox for fertility treatment marked on the birth certificate for the index delivery and 1,845 (15.4%) of the infertile group had this checkbox checked (Supplemental Table 2). Of the two cohorts, 1,466 deliveries were exclusively subfertile, 8,673 were exclusively infertile, and only 3,297 were identified as being in both groups. In addition, not all women identified in the subfertile group from the birth certificate (3,861 women) were identified through APCD (2,716, or 70.3%, of these women were identified in the infertile group). Of women who were ART-treated, 93.8% were identified as having an ICD 9 or 10 code for infertility in APCD prior to that delivery.
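The group counts reported in the Results are internally consistent once the deliveries in both the subfertile and infertile groups are counted only once. The short Python check below reproduces that arithmetic and the quoted percentages, using only numbers stated in the text; the variable names are ours.

```python
# Counts as reported in the Results.
fertile, subfertile, infertile, art = 70_726, 4_763, 11_970, 7_689
only_subfertile, only_infertile, both = 1_466, 8_673, 3_297
total_deliveries = 91_851

# Each group is its exclusive part plus the overlap.
assert only_subfertile + both == subfertile
assert only_infertile + both == infertile
# Deliveries in both groups are counted once in the overall total.
assert fertile + art + only_subfertile + only_infertile + both == total_deliveries

pct_checkbox_subfertile = round(100 * 2_406 / subfertile, 1)
pct_checkbox_infertile = round(100 * 1_845 / infertile, 1)
pct_found_in_apcd = round(100 * 2_716 / 3_861, 1)
size_ratio = round(infertile / subfertile, 2)

print(pct_checkbox_subfertile, pct_checkbox_infertile,
      pct_found_in_apcd, size_ratio)
```

The four group sizes sum to 95,148, which exceeds the 91,851 deliveries by exactly the 3,297 deliveries that appear in both the subfertile and infertile groups.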

Table 1 compares the demographic characteristics of the 4 fertility groups. ART-treated, infertile, and subfertile women were older, more often white non-Hispanic, more highly educated, and more likely to be insured by private insurance at the time of delivery, than fertile women. Their partners were more likely to be older, white non-Hispanic, and highly educated. Women in these groups were also more likely to suffer from chronic hypertension and diabetes. With regard to the subfertile and infertile groups, both were younger, and both received more post-secondary education than the ART-treated group. Both the subfertile and infertile groups contained slightly more women who were white non-Hispanic than did the ART group. Overall, the infertile and subfertile groups were similar to each other. The use of ICD 9 and 10 codes to identify the infertile group resulted in more deliveries in that group than previously identified in the subfertile group. We therefore compared the demographics of the fertile group that we would have used as the control group if only the previously defined subfertile group was identified, to the fertile group containing only those women not in the three other groups. Results, shown in Supplemental Table 3, demonstrate that our previous inclusion of these extra deliveries (now known to include some infertile women) in the fertile group made very little difference in the characteristics of that group.

Table 1 Demographic characteristics in the four fertility groups

We obtained information on infertility diagnoses from APCD by searching for any infertility code in the claims records in the time period prior to delivery. Table 2 shows the prevalence of these diagnoses in the various fertility groups. In the fertile group, only tubal disease (6.30%), PCOS (1.58%), and other ovulatory disorders (15.31%) were found at rates greater than 1% of the full sample. The percentage with no infertility diagnosis was approximately 73.4%. The proportion of women with all diagnoses was higher in the subfertile, infertile, and ART-treated groups than for the fertile group. Rates for the subfertile and infertile groups were similar to each other with some being slightly higher in the subfertile and some slightly higher in the infertile group, but rates for the ART-treated group were higher than either of the other groups for most diagnoses. Only the ovulatory diagnoses did not follow this pattern.

Table 2 Infertility diagnoses and treatment from APCD and SART CORS

We further compared the diagnoses found in APCD to those reported by ART clinics to SART CORS. Here the percentages differed markedly, with most diagnoses recorded at lower rates in SART CORS than in APCD. By contrast, DOR was found at a much higher rate in SART CORS (20.85%) than in claims reported to APCD (1.66%). We were unable to identify male factor in APCD because our analysis was done on the women’s records and that code would be found under the male partner’s records. Furthermore, 919 (12.74%) of the ART cohort with infertility in APCD had a diagnosis of Other in SART CORS (data not shown). Table 2 also presents information on claims data for fertility treatment codes. Among the ART deliveries, all of which involved ART treatment as defined by linkage to SART CORS, a treatment code could be found in APCD in only 62.27% of cases. Codes for treatment could be identified in only 8.82% and 7.03% of the subfertile and infertile groups, respectively.

Pregnancy and delivery characteristics for the four fertility groups are shown in Table 3. Women with ART-treated deliveries had consistently higher rates, and fertile deliveries had lower rates, of all adverse pregnancy and delivery complications including hypertension, diabetes, placental problems, dysfunctional labor, and post-delivery hysterectomy. As with other characteristics, the subfertile and infertile groups alternated as to which had the higher rate of a given obstetric condition, but both had lower rates than the ART-treated cohort and higher rates than the fertile cohort. The same pattern persisted for infant characteristics of low birthweight and prematurity (Table 4).

Table 3 Delivery characteristics in the four fertility groups
Table 4 Infant characteristics in the four fertility groups

Table 5 presents risk ratios for low birthweight and preterm delivery for infants in the four fertility groups. The ART-treated as well as the subfertile and infertile groups had higher rates of prematurity (range of aRR was 1.15–1.17) and low birthweight (range of aRR 1.10–1.21) than the fertile group. When compared with the infertile group, the ART-treated group had a higher rate of preterm delivery (aRR 1.10) while the subfertile group did not differ. The ART-treated group did not differ from the subfertile group with regard to these parameters.

Table 5 Relative risk ratios for comparisons of four fertility groups

Discussion

This study identified a new cohort of deliveries to infertile women through analysis of claims data from the APCD. We compared this group to our previously identified heterogeneous subfertile group and to ART-treated and fertile groups. We found the infertile group defined by APCD to be considerably larger than the subfertile group but to have very similar characteristics. Some deliveries identified through one method were not identified by the other and vice versa.

It is tempting to use medical claims data for research. These data are extensive and have the potential to be tapped for a variety of research questions. Nevertheless, as previously reported [6, 13,14,15,16,17], the results in this paper suggest that caution is required when using these data. Claims data are only as good as the information entered by providers and the completeness of the insurance claims data file. In the case of the Massachusetts APCD, there were data missing from insurance providers who did not participate in the claims data upload. In Massachusetts, this was made more complicated by a 2016 lawsuit (https://www.chiamass.gov/assets/docs/p/apcd/regulatory-questions-for-apcds-related-to-scotus.pdf) that resulted in 10.3% of insurance providers opting to forgo submitting data to the system from that point onwards (personal communication from CHIA). Although APCD uses a patient identifier to enable following each patient as she changes health insurance companies over time, patients could move in and out of companies that do not enter data into the system. Thus, while we were able to obtain information on prior infertility for many patients, our numbers may still not be complete given that a small percentage of women may have had insurance with companies that did not enter data. Furthermore, the fact that 60% of women with deliveries had MassHealth, a Medicaid-based insurance option, during some point in the study period, resulted in our having no information on this group and thus our decision to remove these women from our study sample. This group with some MassHealth accounted for more fertile than subfertile, infertile, or ART-treated women, likely because MassHealth does not cover infertility treatment and because of the demographics of women who seek fertility care, but this is still an omission.

Regardless of these omissions, our study demonstrates that the infertile and subfertile groups showed similar profiles for demographics and adverse pregnancy outcomes. Specifically, we observed that women in the infertile and subfertile groups had a similar or lower risk of adverse pregnancy outcomes than ART-treated women but a greater risk than the fertile group. This suggests that the infertile and subfertile groups may be comparable as comparison groups for studies of ART. As previously argued by us and others [5], this also suggests that underlying infertility is a factor contributing to the adverse outcomes seen following ART treatment.

The larger size of the infertile group suggests that many infertile deliveries were missed using the definitions by which we previously identified the subfertile group. Nevertheless, the similarities between the subfertile and infertile groups included demographic parameters, underlying health conditions, as well as pregnancy and delivery outcomes. In all cases, both the subfertile and the infertile groups had characteristics that were intermediate between those of the ART-treated and the fertile groups. We have acknowledged in prior publications that the subfertile comparison group likely did not contain all cases of infertility [7, 18, 19], which may lead to misclassification, and most likely previous results were attenuated due to this. We have also previously suggested that though the fertile group contained some deliveries to infertile women, those were likely subsumed within the much larger fertile cohort. The data presented in Supplemental Table 3 support these prior claims.

Only 3,297 deliveries overlapped and were contained in both the subfertile and infertile groups. The fact that these groups did not overlap more completely means that the subfertile group missed identifying a substantial proportion of deliveries for which a diagnosis of infertility had indeed been made. However, it also could mean that the subfertile group, being defined as it was in large part through the checkbox for fertility treatment on the birth certificate, included some individuals who had fertility treatment for reasons other than infertility such as being a single woman, a same-sex couple, or a couple undergoing fertility treatment for genetic conditions. This distinction cannot be determined from the birth certificates. The extent to which this is the case cannot be fully appreciated at this time; however, the fact that just under 13% of the ART cohort with infertility in APCD had an infertility diagnosis of “Other” in SART CORS suggests that we should not expect an infertility diagnosis in all these women. The designation of infertile also missed some deliveries found through the parameters of the subfertility definition. One reason for this is timeframe. Our APCD data only went back to January of 2013. By contrast, MOSART data used to identify subfertility included hospitalizations extending back to 1998. Cases of infertility identified in the subfertile group through hospital discharges before 2013 could thus have been missed when only APCD data were used. Furthermore, the APCD is based on insurance claims and is not a clinical database. The data in this system are dependent on how claims are processed and whether or not the woman had insurance provided by a participating medical insurance carrier. Although these data are valuable, they are not infallible or comprehensive, and as with other vital records and other non-medical databases, they are subject to limitations.

We have not previously been able to identify specific infertility-related diagnoses among the subfertile group and this is something that could be an advantage of using APCD. Analysis of infertility diagnosis and treatment from APCD yielded information on the reasons for infertility in all groups. Again, it was clear that the subfertile and infertile groups were similar and that the fertile group was less likely to have any infertility-related diagnosis. The ART-treated group had more women with various diagnoses than either the subfertile or infertile groups, with a few notable exceptions for ovulation disorders, potentially indicating that treatment was possible for these conditions without ART. When comparing to prior studies, the percentage of women identified as having endometriosis (< 4% in all but the ART group) was lower than prior 6–15% overall population estimates [20, 21], and the percentage of those with PCOS (< 5% of the population overall) was also lower than the expected 6–10% in the general population [22]. Our estimates from APCD thus appear to have missed a proportion of the diagnosed cases. This will require further review and validation. The differences between the infertility diagnosis information obtained from APCD and that obtained from SART CORS are also of concern. In particular, the much higher rate of DOR identified in SART CORS suggests that even when that diagnosis is given for an ART cycle, claims data do not necessarily include it. One reason for the differences may be that many clinics enter only the primary diagnosis into SART CORS while review of APCD would gather all the diagnoses from each clinic visit over time. Other reasons for this are at present unclear.

We also evaluated treatment codes in APCD to see the number of women on whom we could identify that fertility treatment was used for the index delivery. We found that these treatment codes, when found at all, were often identified outside of the timeframe that we anticipated. We had anticipated that most codes would be found within the timeframe in the month around oocyte retrieval or conception date. Instead, many were found closer to delivery date. Reasons for this are unknown. The fact that fewer women in the subfertile and infertile groups had claims for treatment than had an indication of fertility treatment on the birth certificate also suggests that either only a small proportion of them had intrauterine insemination or that many were treated with fertility drugs which were not part of our claims search. Although we have information on fertility medications in our APCD data, the claims information for this could not be temporally connected with a particular delivery since medication can be dispensed outside of a defined treatment window. This is another factor which makes insurance claims data particularly risky for use in studies of medical history and outcomes.

Risks of preterm delivery and low birthweight were elevated in our ART-treated, subfertile, and infertile groups when compared with the fertile group. This is consistent with prior results [3, 7]. The ART-treated group also had a higher adjusted rate of prematurity when compared with the infertile group but not the subfertile group (Table 5) perhaps reflecting the larger size and greater diversity of the infertile group compared with the subfertile. Importantly, the subfertile and infertile groups did not differ from each other in either prematurity or birthweight. The APCD-defined infertile group, therefore, was more similar to the subfertile group than to either the ART-treated or the fertile groups.

Our study has strengths and limitations. Among the strengths are the robust sample size in all groups and the availability of birth certificate and hospital data on underlying health and pregnancy complication information. In addition, we have the advantage of being able to have used two definitions of compromised fertility to approach the same questions on outcome. Limitations include reservations about the accuracy of APCD data and the fact that women with any MassHealth data had to be excluded. We also uncovered that the timing of claims in APCD in relationship to actual treatment dates was inconsistent with expectation and that this may be a limitation to future use of APCD claims data for research. A final limitation was that we used data from one state with a particularly comprehensive insurance claims file and thus our methodology and findings may not be generalizable or useful to investigators using databases from other states.

In summary, we defined a newly constituted infertile group based on information on infertility diagnosis in the APCD. This group was considerably larger than our previously defined subfertile group, which was based on birth certificate data on fertility treatment and several other indicators of infertility. The advantage of this new infertile group is that all members of this group had a defined diagnosis of infertility. The disadvantage is that its use relies on claims data, which must be considered to have a number of limitations as to accuracy and completeness. Both comparison groups (subfertile and infertile) had similar characteristics, which fell part way between those of the fertile and the ART-treated groups, and both yielded similar, though not precisely the same, data on infant outcomes. In conclusion, we propose that each of these comparison groups, either separately or in combination, might have specific advantages for different study questions and that, although both are better comparators than the fertile group, neither is considerably better than the other.