Key Points

There are currently limited guidelines and recommendations on how data should be collected to assess the safety of medicines in pregnancy.

In this article, we provide an expert consensus opinion on the core data elements that should be collected when performing studies of medication safety in pregnancy.

The aim of these recommendations is to standardise data collection processes and improve the speed at which high-quality evidence-based statements can be provided about the safety of medication use in pregnancy.

1 Introduction

Medication is commonly used during pregnancy [1], with some studies indicating that more than 80% of women use at least one prescribed medication during pregnancy [2, 3]. When including over-the-counter medicines, nearly all pregnant women use at least one medicine [4, 5]. Due to a combination of ethical restrictions and practical complexities, the risks and benefits of medication use in pregnancy are usually established through observational/non-interventional studies in the post-marketing setting. Unfortunately, there is currently no standardised or systematic approach to post-marketing pregnancy pharmacovigilance (PregPV) data collection. As such, evidence on the safety of a particular medication is typically slow to accumulate [6]. Pregnant women and their healthcare providers are therefore often faced with insufficient [7, 8] or conflicting [9] information when making clinical decisions about medication use.

In 2019, the Innovative Medicines Initiative (IMI) launched the ConcePTION project in acknowledgement of the societal obligation to rapidly reduce the uncertainties around the safety of medication use in pregnancy and breastfeeding [10]. The project consortium exists as a public–private partnership of research academics, healthcare professionals and pharmaceutical industry specialists with wide-ranging expertise in fields relating to pharmacovigilance/drug safety, epidemiology, medical statistics, risk–benefit communication, clinical teratology, reproductive toxicology, and childhood health and development. The aim of the project is to establish a trusted research ecosystem that can efficiently and systematically generate and disseminate reliable evidence-based information about the risks and benefits of medication use during pregnancy and breastfeeding.

Primary source data collection methods, where information about medication exposure and pregnancy outcome is collected directly from pregnant women and/or their healthcare providers, have been utilised for decades to provide data for observational PregPV research. Whilst numerous longstanding national and international primary source PregPV datasets exist, these worldwide data are geographically fragmented and exist in various non-standardised formats, in some cases, sub-optimally designed for this bespoke purpose. This results in variable data quality and impedes the ability to rapidly combine raw data and/or to assimilate the evidence generated from different studies to decrease the time taken to provide reliable conclusions about the safety of medicinal products. The ConcePTION project aims to develop a series of operational recommendations for primary source PregPV datasets to optimise and standardise data collection techniques, analysis, and reporting.

This article outlines the generation of a reference framework of core data elements (CDEs) recommended for primary source data collection in PregPV. These recommendations aim to standardise data collection and data generation. This standardisation will increase data harmonisation and evidence synthesis capabilities, thereby improving the ability to provide high-quality evidence-based statements about the risks and benefits of medication use in pregnancy in a shorter timeframe.

2 Methods

2.1 Expertise of the CDE Recommendations Development Group

The CDE recommendations expert working group (authors MS, AM, YG, JLR, RLB, MBD, and LMY) was jointly led by ConcePTION partners from industry pharmacovigilance departments (Novo Nordisk and Novartis), a national clinical teratology counselling service in the UK (UK Teratology Information Service), a clinical-academic neurodevelopmental PregPV research group (Manchester University), and clinical genetics units (Northern Genetics Service, Newcastle Hospitals NHS Foundation Trust, and University of KwaZulu-Natal). Each of the leading members was highly experienced in running PregPV monitoring systems in industry, academic, or clinical settings, reporting findings from these studies in the scientific literature or in regulatory interactions, and critically appraising the scientific integrity of such studies [11–32].

2.2 Structure of the CDE Recommendations

The CDE recommendations have been arranged into a series of tables consisting of related data elements. These include tables describing CDEs that are required for administrative/database functions, and those describing maternal demographic and obstetric, foetal outcome, and infant/childhood outcome domains, for both the short- (up to infant age of 1 year) and longer-term (for ongoing assessment of child health and neurodevelopment beyond 1 year).

Each of the data elements is described with a clinical definition and a recommended data format. Where relevant, elements that could be coded using recognised medical coding systems such as the Medical Dictionary for Regulatory Activities (MedDRA) or the International Classification of Diseases (ICD, version 9 or 10) have been highlighted. Individual data elements are also categorised as either essential or highly recommended for the generation of PregPV data relating to short-term outcomes (such as pregnancy, foetal, neonatal, and infant outcomes up to 1 year of age) or longer-term outcomes (childhood health and neurodevelopment). Data elements identified as essential are those without which a meaningful analysis of the data cannot be conducted. Data elements not identified as essential remain highly recommended, with their inclusion allowing for a more optimal analysis of PregPV data, and ultimately improving the quality of PregPV research outputs. It is noted that different research objectives may alter which CDEs are, and are not, considered essential. An expected source, relating to whether the data would be collected directly from the reporter or derived from other data elements, or potentially both, is also provided for each of the elements, as is the expected purpose of the element in PregPV studies.
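As an illustration of how the attributes described above might be held in a machine-readable data dictionary, the following is a minimal Python sketch. The class, field, and function names, and the two example entries (including their essential flags), are hypothetical and are not taken from the CDE tables; only the set of attributes (definition, format, coding system, essential status, source, purpose) follows the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class Source(Enum):
    """Expected source of a data element (labels are illustrative)."""
    REPORTED = "collected directly from the reporter"
    DERIVED = "derived from other data elements"
    BOTH = "reported and/or derived"


@dataclass(frozen=True)
class CoreDataElement:
    name: str
    definition: str
    data_format: str
    source: Source
    purpose: str
    essential_short_term: bool = False   # outcomes up to 1 year of age
    essential_long_term: bool = False    # child health and neurodevelopment
    coding_system: Optional[str] = None  # e.g. "MedDRA" or "ICD-10"


def essential_names(elements: Iterable[CoreDataElement],
                    long_term: bool = False) -> List[str]:
    """Names of the elements marked essential for the chosen outcome horizon."""
    return [
        e.name
        for e in elements
        if (e.essential_long_term if long_term else e.essential_short_term)
    ]


# Hypothetical entries; wording and flags are NOT copied from the CDE tables.
EXAMPLE_ELEMENTS = [
    CoreDataElement(
        name="expected_date_of_delivery",
        definition="Estimated date on which the pregnancy reaches term",
        data_format="YYYY-MM-DD",
        source=Source.BOTH,
        purpose="Anchor for gestational timing",
        essential_short_term=True,
        essential_long_term=True,
    ),
    CoreDataElement(
        name="maternal_condition",
        definition="Medical condition present before pregnancy",
        data_format="Coded term plus free text",
        source=Source.REPORTED,
        purpose="Co-variable risk factor",
        coding_system="MedDRA",
    ),
]
```

Keeping the essential status per outcome horizon as separate flags reflects the note that different research objectives may change which elements are considered essential.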

2.3 Development of CDE Recommendations

The CDE recommendations were developed in a five-step process:

(1) Scoping review The primary step in the creation of the CDE recommendations involved a critical review and appraisal of existing PregPV guidance documentation and established PregPV data collection systems. High-level guidelines for the collection of PregPV data provided by regulatory authorities including the US Food and Drug Administration (FDA) [33] and the European Medicines Agency (EMA) [34] were located and considered. Organisations with a leading role in the collection, interpretation, and reporting of PregPV data were identified through expert working group discussion (all authors). These included international networks of teratology information services (European Network of Teratology Information Services [ENTIS]) [35] and specialists (Organization of Teratology Information Specialists [OTIS]) [36], an international disease-specific pregnancy registry (International Registry of Antiepileptic Drugs and Pregnancy [EURAP]) [37], national post-marketing surveillance systems (Vaccines and Medications in Pregnancy Surveillance System [VAMPSS]) [38], an international enhanced PregPV monitoring system (PRegnancy outcomes Intensive Monitoring [PRIM]) [39], a single medicinal product pregnancy registry (Gilenya®/fingolimod) operated by a pharmaceutical company (Novartis) [40], and direct-to-patient online data collection systems (Best Use of Medicines in Pregnancy [BUMPs] [41], the Dutch Pregnancy Drug Register [42], and IMI PROTECT [43]). Details of the guidelines, data dictionaries, and/or data collection materials were either collected from publicly available sources or requested from each of these organisations to provide an overview of the most commonly collected data elements in PregPV research.
For the longer-term childhood health and neurodevelopment outcomes, discussions were held with researchers operating the VAMPSS in the US (as this already collects longer-term outcomes [38]), and a Delphi method expert consensus project was completed with experts in childhood neurodevelopment to identify CDEs specific to these outcomes [44]. It was not considered practical to include a list of all possible childhood health and neurodevelopmental outcomes/conditions within the framework. As such, only the most common conditions, based on the clinical experience of the expert working group, were included.

(2) Identification of the CDEs Using the information collected in the scoping review, and with expert review of key statements and statistics needed to assess the safety of medication use in pregnancy, CDEs were identified and tabulated into grouped categories. Several CDE items not identified in the scoping review but considered valuable for the purposes of PregPV by the expert working group were added to the recommendations. Key statements and statistics that PregPV studies should aim to provide include descriptions of the prevalence of (1) pregnancy outcomes, including terminations, miscarriages, intrauterine foetal deaths/stillbirths, and live births; (2) foetal outcomes, including malformation and growth; (3) neonatal/infant outcomes, such as neonatal complications, infant health, and survival; and (4) longer-term childhood outcomes, such as health and neurodevelopment.

(3) Definition of the CDEs Standard clinical or medical definitions were used preferentially. Some CDE items either lacked a standardised definition or had conflicting definitions. Where several clinical or medical definitions were available, consensus over the most appropriate definition was achieved through discussion, both internally in the CDE recommendations working group and externally with the wider expert partners in the ConcePTION project. Preference was given to simple and practical definitions that would not impede the collection or storage of raw data.

(4) Data format suggestions Utilising the experience of the working group, general recommendations were developed for the format of each CDE item to ensure all relevant information is collected for each item. Preference was given to metric measurements.

(5) Essential status of the CDEs All data elements included in the framework are highly recommended, and every effort should be made to collect each data element when performing PregPV research. Some data elements are considered essential in that (1) follow-up of exposed pregnancies would not be possible without collection of the data; (2) appropriate statistics on outcomes would not be possible to calculate without collection of the data; or (3) the data element is very strongly correlated with adverse outcome risk and therefore should be considered in the data analysis phase. Consensus over whether CDE items were essential for PregPV purposes was achieved through extensive discussion. Where a consensus could not be achieved, data elements were marked as very important to collect.

2.4 Evaluation of CDE Recommendations

The CDE recommendations were internally evaluated in two phases. The primary phase involved structured group discussions and the CDE expert working group circulating the preliminary recommendations to the full working group (all authors) for critical comments and suggestions for improvement. This process was repeated until consensus was achieved. The secondary phase involved attaining consensus with other working groups of the wider ConcePTION project, including academic and industry partners with expertise in utilising population-based administrative datasets for PregPV and pregnancy pharmacoepidemiological research. Consensus was achieved through workshops and group discussions. The expertise of those involved in the second evaluation was wide ranging, including clinical and academic experts from fields including obstetrics, human genetics, neurology, paediatrics, psychology, pharmacoepidemiology, pharmacovigilance, and drug safety (see acknowledgements). A supplementary review of the finalised CDE recommendations was undertaken to ensure alignment with STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) reporting checklists.

2.5 Inclusive Language Statement

As it is important to be inclusive yet unambiguous and concise when providing recommendations for maternity, perinatal, and postnatal health research [45], we use widely accepted terms throughout the CDE documentation and this article to refer to those who are pregnant and give birth.

3 Results

The CDE recommendations are provided in 14 tables of related data elements. The first peer-reviewed version of these tables is provided in the supplementary appendix (see the electronic supplementary material). Live versions of the recommendations are also published in full on the ENTIS website (http://www.entis-org.eu/cde). These recommendations will be updated when required to ensure the data elements and their definitions remain aligned with current PregPV research practices and needs. Below we provide a brief overview for each of the CDE tables and highlight important content/considerations.

3.1 Database Administrative Details

Online Table 1 describes data elements that would enable administrative tasks and/or database functionality when using primary source data collection methods in PregPV research.

PregPV research often focusses on prospectively reported pregnancies to limit the impact of inclusion or sampling biases. The prospective and retrospective definitions in Online Table 1 resulted in considerable discussion and deliberation because there are multiple, sometimes contradictory, definitions of a prospective pregnancy report in both the literature and guidance documents from regulatory authorities. Published definitions can also be ambiguous and are potentially open to differing interpretations. For example, the FDA defines a prospective pregnancy report as being described ‘prior to any information about the pregnancy outcome being available’. This may be interpreted to include all pregnancies reported prior to the end of pregnancy if ‘outcome’ refers to birth, miscarriage, or termination, but is ambiguous in the context of congenital anomalies as a defined outcome. Through discussion within the CDE working group, the FDA definition was interpreted as any pregnancy reported prior to prenatal screening having been undertaken (to detect congenital anomaly). However, the EMA definition describes a prospective report as being described ‘prior to the knowledge of the pregnancy outcome or prior to the detection of a congenital anomaly at prenatal examination’. This was interpreted slightly differently as any pregnancy reported prior to any prenatal detection of congenital anomaly (and thus could include pregnancies where prenatal screening had been undertaken, but where no abnormalities were detected). The CDE recommendations are also applicable to retrospectively collected data.

Given these differing interpretations, the definition of prospective status provided in the CDE recommendations is the most basic: pregnancies reported to the data collection system whilst the patient is still pregnant. This definition was considered preferable as it was thought to be the least restrictive upon study sample sizes. Furthermore, this definition can be applied to studies investigating postnatal childhood health and neurodevelopmental outcomes, recognising that the outcomes remain unknown for the entire pregnancy. However, the recommendations also highlight that alternative definitions for a prospective pregnancy report exist, and that these may be used where preferred or required, and may vary by the outcomes being studied. These definitions allow for the exclusion of pregnancies that have been recruited either (1) after any prenatal screening has been performed [33] or (2) after the prenatal detection of congenital anomalies or any other adverse pregnancy/foetal outcome [34]. The CDE recommendations therefore also describe additional data elements that are of value to collect, in order to apply other commonly proposed definitions of a prospective pregnancy case.
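The three definitions discussed above could be applied to the same set of collected dates as in the following minimal Python sketch. The record fields, the definition labels, and the choice to treat a report made on the same day as screening or detection as not prospective are assumptions made for illustration; they are not specified in the CDE recommendations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PregnancyReport:
    report_date: date
    end_of_pregnancy_date: Optional[date] = None    # None: pregnancy ongoing
    prenatal_screening_date: Optional[date] = None  # first prenatal screening
    adverse_finding_date: Optional[date] = None     # first prenatal detection of
                                                    # an anomaly/adverse outcome


def _reported_before(report_date: date, event_date: Optional[date]) -> bool:
    # Assumption: a report made on the day of the event is NOT counted as prior.
    return event_date is None or report_date < event_date


def is_prospective(report: PregnancyReport, definition: str = "basic") -> bool:
    """Apply one of three definitions of a prospective pregnancy report.

    "basic"         - reported whilst the patient is still pregnant
    "pre_screening" - additionally, before any prenatal screening [33]
    "pre_detection" - additionally, before prenatal detection of an anomaly
                      or other adverse pregnancy/foetal outcome [34]
    """
    if not _reported_before(report.report_date, report.end_of_pregnancy_date):
        return False
    if definition == "basic":
        return True
    if definition == "pre_screening":
        return _reported_before(report.report_date, report.prenatal_screening_date)
    if definition == "pre_detection":
        return _reported_before(report.report_date, report.adverse_finding_date)
    raise ValueError(f"unknown definition: {definition!r}")


# A pregnancy reported after a normal screening scan but before delivery.
example = PregnancyReport(
    report_date=date(2023, 3, 1),
    end_of_pregnancy_date=date(2023, 8, 1),
    prenatal_screening_date=date(2023, 2, 1),
)
```

In this example the report is prospective under the basic and "pre_detection" definitions but not under "pre_screening", which is why collecting the screening and detection dates allows the alternative definitions to be derived at the analysis stage.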

3.2 Maternal/Paternal Details

Online Table 2 provides recommendations for important co-variable risk factors relating to the pregnant woman and the father of the conceptus. These include sociodemographic factors such as the pregnant woman’s age, income, educational attainments, ethnicity, body mass index (BMI), and details about smoking, alcohol, and recreational drug use, among others, all of which were judged to potentially impact on pregnancy/foetal/neonatal outcomes. Whilst many of these co-variables apply to a range of such outcomes, certain co-variables may pertain only to longer-term child health or neurodevelopmental outcomes. Many data elements included in Online Table 2 are considered important co-variable risk factors, and collection of these data is highly recommended. For some of the data elements, such as maternal ethnicity, there is a limited understanding of the influence these variables have on pregnancy outcomes. Inclusion of these variables in the CDE recommendations may result in more research to understand these interactions. Additionally, some of the data elements, such as maternal IQ generated from a clinically validated tool, may not be readily available. Where feasible, research tools and infrastructure may need to be developed to allow for the collection of information that is considered of high importance for the specific outcome under investigation. This may be particularly relevant to child health and neurodevelopmental outcomes.

3.3 Pregnancy Details

The CDE items listed in Online Table 3 are valuable for data collection administrative duties, such as identifying when pregnancy outcome follow-up attempts should be performed, for deriving alternative prospective/retrospective definitions, and for deriving certain pregnancy outcomes. Specifically, the table contains the expected date of delivery (EDD), which is a vital data item for PregPV purposes. This data item provides an anchoring point for several essential CDE items, including the trimester of pregnancy at set time points and exposure time periods (when reported by date only), and is informative for pregnancy outcomes such as gestational age at delivery (prematurity) and expected birth weight (small or large for gestational age). In the CDE recommendations, the pregnancy monitoring period is deemed to start from the date of last menstrual period (LMP) where this is available. In most pregnancies, this is approximately 2 weeks prior to the approximate date of conception. For PregPV studies utilising primary source data collection, the LMP is a useful date of reference for data collection purposes. It also allows estimation of gestational age at set time points early in gestation, before a more accurate dating of the pregnancy has been established through ultrasound dating scans. Alternative methods to calculate gestation are described for instances where the LMP is not available or is unreliable (for example, due to an irregular menstrual cycle).
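The anchoring role of the LMP and EDD can be illustrated with a short Python sketch that derives gestational age at any dated event. It assumes the conventional relationship that the EDD falls 280 days (40 weeks) after the LMP; this constant, the function names, and the rule of preferring a reported LMP over a back-calculated one are assumptions for illustration and are not taken from Online Table 3.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

TERM_DAYS = 280  # assumption: EDD = LMP + 280 days (40 weeks)


def dating_anchor(lmp: Optional[date] = None, edd: Optional[date] = None) -> date:
    """Return day 0 of the pregnancy monitoring period.

    Uses the reported LMP where available; otherwise back-calculates an
    LMP-equivalent date from the EDD.
    """
    if lmp is not None:
        return lmp
    if edd is not None:
        return edd - timedelta(days=TERM_DAYS)
    raise ValueError("either the LMP or the EDD is required")


def gestational_days(event: date, lmp: Optional[date] = None,
                     edd: Optional[date] = None) -> int:
    """Days post-LMP at the event; negative values fall before the LMP."""
    return (event - dating_anchor(lmp, edd)).days


def as_weeks_days(days: int) -> Tuple[int, int]:
    """Express a non-negative gestational age as (completed weeks, days)."""
    if days < 0:
        raise ValueError("event precedes the start of the monitoring period")
    return divmod(days, 7)


# An exposure dated 12 March 2023 in a pregnancy with an LMP of 1 January 2023.
ga = gestational_days(date(2023, 3, 12), lmp=date(2023, 1, 1))
print(ga, as_weeks_days(ga))  # 70 (10, 0)
```

Storing the signed number of days post-LMP, rather than only a trimester label, keeps exposures reported by date usable under any later choice of trimester or peri-LMP definition.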

3.4 Maternal Medical History Details

Online Table 4 contains a single CDE item that provides recommendations about collecting details of maternal medical conditions that existed prior to pregnancy. Of note, the details collected here can reflect the indication(s) for medication use (details of which are collected in Online Table 6), both for any medication/medicinal product under investigation and for any concomitant medications. In many cases, it would be valuable to collect information about the severity of the condition(s), particularly whether these conditions are active at the time of conception. Recommendations regarding essential variables for specific maternal diseases were beyond the scope of the ConcePTION project. Investigators are therefore strongly encouraged to liaise with maternal disease specialists to develop disease-specific variables that may correlate with adverse pregnancy outcome or impaired child health and/or neurodevelopment. Maternal illness may be an important co-variable risk factor for the analysis (at either the individual case level or in statistical covariate analysis) of certain adverse pregnancy/foetal/neonatal/infant outcomes. Details regarding maternal mental health conditions are considered particularly valuable for studies investigating longer-term child health and neurodevelopmental outcomes.

3.5 Family Medical History and Obstetric History Details

The items described in Online Table 5 relate to relevant family medical history, including those of genetic conditions, congenital anomalies, neurodevelopmental impairments, and obstetric history, including the number of prior pregnancies and their outcomes. Such information may be important as co-variable risk factor(s) for the analysis (at either the individual case level or in statistical covariate analysis) of certain adverse pregnancy/foetal/neonatal/infant outcomes.

3.6 Pregnancy Medication Exposure Details

Collection of high-quality data around the timing, dose, route of administration, and indication for medication use in pregnancy is crucial when performing PregPV research. Each of the CDE items listed in Online Table 6 should be repeated for each instance of medication use during pregnancy. An instance of medication use is a time period when a medication was used at regular intervals. Should an individual medication be stopped and then restarted later in pregnancy, it is recommended that this is logged as a separate instance of use.

Several of the CDE items in this table required significant discussion and debate, particularly for the definitions provided for the timing of peri-LMP exposures and the trimesters of pregnancy. The first trimester has various definitions in the literature, both in terms of when the period begins and when it ends. Some PregPV studies state that the first trimester period starts at the date of LMP, whereas others define the first trimester as beginning at conception (approximately 2 weeks post-LMP). This minor discrepancy can impact on exposure classification and hence adverse outcome risk estimates.

Similarly, there is no standard definition for the period before the pregnancy began during which effects of exposure to medications may persist into pregnancy (here referred to as the peri-LMP period). Researchers may need to adapt the definition of the peri-LMP period (see Online Table 6) at the analysis stage depending on the medication exposure under study, taking consideration of the pharmacokinetics and pharmacodynamics of a particular medication. For example, medications with long half-lives may be discontinued prior to conception, but clinically relevant concentrations of the medication may remain in the systemic circulation into the sensitive period of organogenesis. Alternatively, medications that are used to treat acute conditions may only be used for short time periods and be discontinued between the LMP date and the date of conception. Provided these medications have a short half-life, pre-conception/peri-LMP exposures probably have limited relevance with regard to malformation risks in the developing foetus.
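The effect of the first-trimester start point on exposure classification can be shown with a small Python sketch. Each instance of medication use is stored as a start and stop day relative to the LMP, and exposure is assessed against a window chosen at the analysis stage. The end of the first trimester used here (day 90 post-LMP) is a placeholder and is not the value given in Online Table 6; the class and function names are likewise hypothetical. Only the 14-day offset reflects the "approximately 2 weeks post-LMP" stated in the text.

```python
from dataclasses import dataclass
from typing import Tuple

CONCEPTION_OFFSET_DAYS = 14   # approximately 2 weeks post-LMP
FIRST_TRIMESTER_END_DAY = 90  # placeholder value, NOT taken from the CDE tables


@dataclass(frozen=True)
class MedicationInstance:
    """One period of regular use; a stop and restart gives two instances."""
    medication: str
    start_day: int  # days post-LMP; negative values are before the LMP
    stop_day: int   # last day of use, inclusive

    def __post_init__(self) -> None:
        if self.stop_day < self.start_day:
            raise ValueError("stop_day must not precede start_day")


def first_trimester_window(starts_at: str = "lmp") -> Tuple[int, int]:
    """Inclusive (start, end) of the first trimester in days post-LMP."""
    if starts_at == "lmp":
        return (0, FIRST_TRIMESTER_END_DAY)
    if starts_at == "conception":
        return (CONCEPTION_OFFSET_DAYS, FIRST_TRIMESTER_END_DAY)
    raise ValueError(f"unknown start point: {starts_at!r}")


def exposed_in_window(instance: MedicationInstance,
                      window: Tuple[int, int]) -> bool:
    """True if the instance of use overlaps the window on at least one day."""
    start, end = window
    return instance.start_day <= end and instance.stop_day >= start


# A short course taken on days 3 to 10 post-LMP, i.e. before conception.
short_course = MedicationInstance("medication_x", start_day=3, stop_day=10)
print(exposed_in_window(short_course, first_trimester_window("lmp")))         # True
print(exposed_in_window(short_course, first_trimester_window("conception")))  # False
```

The same overlap test can be reused for a peri-LMP window whose length is set per medication, for example a longer window for a medication with a long half-life.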

3.7 Maternal Illnesses and Obstetric Complications Details

Online Table 7 describes items that relate to medical conditions arising during pregnancy and are important co-variable risk factors for assessing aetiological relationships between gestational medication use and adverse pregnancy/foetal/infant outcomes. Some of these items may themselves also be important outcome variables for certain medications, for example, the occurrence of gestational diabetes in a study of glucose-lowering medications. The aim for these items is to collect information on any new maternal illnesses that develop during the pregnancy, including conditions related to pregnancy such as gestational diabetes, pre-eclampsia, or nausea and vomiting in pregnancy/hyperemesis gravidarum and non-pregnancy specific conditions. As shown by the recommended data format, it is strongly encouraged that data are collected on the gestational age when conditions arise. Collection of this detail will allow researchers to analyse the impact of maternal complications/illnesses at aetiologically relevant time points in the pregnancy.

3.8 Pregnancy Outcome Details

Collection of high-quality data about pregnancy outcome is vital. Online Table 8 describes the recommended data elements relating to pregnancy outcomes. The data items spontaneous abortion (SA) (miscarriage) and stillbirth (intrauterine foetal death) required extensive debate. Definitions vary in the literature for the gestational age when a foetal demise is termed a miscarriage or an intrauterine death. This likely stems from international variation in definitions about when a foetus reaches a stage (based on gestational age alone or in combination with estimated foetal weight) where preterm birth could result in survival. Definitions in the literature range from 20 to 28 weeks gestational age (post-LMP) as the upper limit of when a foetal demise would be considered an SA/miscarriage [46,47,48]. Some definitions require the foetus to have reached a weight of at least 500 g before the demise is defined as an intrauterine foetal death [49]. After extensive discussion within the CDE recommendations working group, and externally with expert members of the ConcePTION project, the definition for SA/miscarriage was determined to be a foetal demise before the start of the 22nd week of pregnancy (≤ 153 days post LMP), with any foetal demise after this point being defined as a stillbirth (intrauterine foetal death). As stated previously, collection of gestational age at foetal demise is strongly encouraged as it allows outcome definition standardisation within the dataset, as well as the flexibility to undertake sensitivity analyses using different definitions if required. Data from datasets that have used varying definitions for SA/miscarriage and stillbirth could also be standardised for combination in common data models.
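Because gestational age at foetal demise is collected, the SA/miscarriage versus stillbirth distinction can be derived rather than stored, as in the following minimal Python sketch. The default threshold of 153 days post-LMP is the one stated above; the function names and the alternative threshold used in the example (139 days, chosen here to represent a limit at 20 weeks) are assumptions for illustration.

```python
from typing import Dict, Iterable

SA_UPPER_LIMIT_DAYS = 153  # CDE recommendation: SA/miscarriage if <= 153 days post-LMP

SA = "SA/miscarriage"
STILLBIRTH = "stillbirth"


def classify_foetal_demise(ga_days: int,
                           sa_upper_limit_days: int = SA_UPPER_LIMIT_DAYS) -> str:
    """Classify a foetal demise from gestational age in days post-LMP."""
    if ga_days < 0:
        raise ValueError("gestational age cannot be negative")
    return SA if ga_days <= sa_upper_limit_days else STILLBIRTH


def count_outcomes(ga_days_at_demise: Iterable[int],
                   sa_upper_limit_days: int = SA_UPPER_LIMIT_DAYS) -> Dict[str, int]:
    """Tally outcomes under a chosen threshold, e.g. for a sensitivity analysis."""
    counts = {SA: 0, STILLBIRTH: 0}
    for ga_days in ga_days_at_demise:
        counts[classify_foetal_demise(ga_days, sa_upper_limit_days)] += 1
    return counts


# Hypothetical gestational ages at demise, in days post-LMP.
demises = [70, 150, 153, 154, 200]
print(count_outcomes(demises))                           # 3 SA/miscarriage, 2 stillbirth
print(count_outcomes(demises, sa_upper_limit_days=139))  # 1 SA/miscarriage, 4 stillbirth
```

Applying the threshold at analysis time is also what would allow datasets that used different cut-offs to be re-classified consistently before combination.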

3.9 Delivery Details

Online Table 9 describes the recommendations for information regarding delivery details. These data elements may provide supplemental data of relevance for the analysis of certain adverse pregnancy outcome events, such as perinatal death or neonatal complications. It is recommended that the CDE items listed in this table are collected for each delivery event in the reported pregnancy. On rare occasions in multiple-foetus pregnancies, there may be more than one delivery event at different times and delivery methods may vary for each foetus/infant.

3.10 Live/Stillborn Birth Outcome Details

The CDE items listed in Online Table 10 should be collected for each live or stillborn infant.

3.11 Live Born Neonatal/Infant Outcome Details

The CDE items listed in Online Table 11 should be collected for each live born infant. The data item ‘Product/disease-specific outcomes’ included in this table is deliberately non-restrictive to allow researchers to customise their data collection efforts to gather information on relevant postnatal outcomes, defined by the medication and/or the primary outcome under investigation. For example, researchers investigating immunosuppressive medication use in pregnancy may wish to expand on the CDE recommendations to collect information about postnatal immune function, such as the number of infections experienced over a set period of time. Details relating to neonatal/infant death are also included in this table.

3.12 Malformations Details

The CDE items listed in Online Table 12 should be collected for each infant or foetus from the reported pregnancies. These details may not be available for foetuses that have been terminated due to non-medical reasons and are unlikely to be available for pregnancies that resulted in an SA/miscarriage. Where foetuses/infants are identified as having congenital anomalies, the CDE recommendations advise that full details of each anomaly are collected. The CDE recommendations currently endorse the use of EUROCAT guidelines to categorise each anomaly present in the foetus/infant. The CDE also makes provision for the recording of the classification of the case (as opposed to classification of individual anomalies). This section should only be completed following expert committee review, preferably whilst blinded to exposure status. Currently proposed categories include genetic/cytogenetic anomalies, major malformation, minor malformation, or malformations not otherwise stated (NOS). More details about classification are provided in the table. It is recommended that each congenital malformation event, and the resulting classification of the infant or foetus, should be judged by experienced adjudicators (qualified paediatrician, clinical geneticist, teratologist, paediatric neurologist, nephrologist, toxicologist, or clinical pharmacologist).

3.13 Longer-Term Child Health and Neurodevelopmental Outcome Details

Many PregPV systems are not structured to investigate longer-term child health or neurodevelopmental outcomes. Recently, experts within the field have recommended the inclusion of child neurodevelopment outcomes more centrally, to reduce the latency between a medication’s approval and the development of conclusive evidence regarding neurodevelopmental risk [22].

Child health and neurodevelopmental outcomes are numerous, can vary over time, and can be investigated to different levels of sensitivity. Longitudinal follow-up is recommended due to the prolonged period of development. It is recognised that a single study or initiative may not be able to investigate all outcomes to a high level, but it is recommended that as comprehensive a range as possible be included in all initiatives.

The data items included in Online Table 13 relate to childhood health outcomes, whilst Online Table 14 describes the data items recommended for assessing neurodevelopmental outcomes. The recommended child health questions have been devised from a list of common child health difficulties [50] and are considered sufficient to provide a high-level overview of childhood health. Child health is very complex, and numerous conditions may present. For the purposes of the CDE recommendations, it was not considered practical to include all of these possible conditions as essential elements. If researchers are aware of any specific conditions related to either an exposure or outcome of interest, these should also be included specifically. There may also be benefit in including detail on condition severity.

Neurodevelopmental outcomes represent the development and functioning of the central nervous system and are a set of independent, but interlinked, skills that evolve over time. Given the variety of PregPV data collection initiatives that may wish to adopt the CDE recommendations, only high-level neurodevelopmental domains have been included in Online Table 14. Measurement of these functions will vary by study design, investigator expertise, and the age of the children at assessment. Expert consultation should be undertaken to design the appropriate investigations.

It is recognised that not all longer-term child outcomes will be covered by the data items in these tables; however, the listed elements can be complemented by more detailed and in-depth studies investigating specific developmental attainments and health endpoints/outcomes.

4 Discussion

This article presents the IMI ConcePTION CDE recommendations for primary source PregPV data collection. The list comprises 98 data elements, including 73 identified as essential for the analysis of either immediate or long-term pregnancy outcomes following in utero medication exposure. The CDE framework described here complements that developed within ConcePTION for PregPV studies using electronic healthcare record or administrative datasets [51].

Accurately establishing and communicating the risks and benefits of perinatal medication use is essential for supporting optimal prescribing practices, and for ensuring that maternal medical conditions are adequately treated during pregnancy with the least risk for the foetus and mother [52, 53]. Given the ethical, scientific, legal, and regulatory complexities surrounding medication use in pregnancy, which have historically resulted in the exclusion of pregnant women from randomised controlled clinical trials, very limited data are generally available prior to marketing authorisation [54]. Such data must be collected opportunistically using observational non-interventional studies in the post-marketing phase. However, several factors limit the speed at which these data accumulate, including medicinal product labelling information, which usually recommends against use in pregnant women during the early years of market authorisation, and inconsistencies in how PregPV data are collected and collated [6]. Additionally, when data are provided from observational/non-interventional studies, the lack of standardisation, both in terms of data collection and analytical outputs, produces highly heterogeneous datasets that can complicate the comparison of safety data for different medicinal products. Improving the speed at which safety data become available to healthcare professionals and pregnant women alike, and the ability to compare safety results for different medicinal products, may be achieved through developing approaches to standardise global PregPV data collection and analysis methods.

It is hypothesised that an optimised and standardised approach to data collection could be an important component in decreasing heterogeneity of outcome reporting, thereby improving confidence in the synthesis of results from different studies in systematic literature reviews and meta-analyses. Furthermore, a standardised approach will facilitate the combination of high-quality and detailed data from multiple sources using a common data model approach [55, 56], which in turn would facilitate faster generation of evidence. However, application of the CDE standards will not remove potential data-collection biases inherent to observational study designs, and as such, the associated methodological limitations will need to be continually considered during data synthesis and interpretation stages. Medicines regulatory authorities may also benefit from this standardised reference framework of CDEs, which they can promote to researchers and the pharmaceutical industry when undertaking primary source PregPV studies. Finally, the CDE recommendations may be useful to apply to spontaneous reporting data collection frameworks, which currently lack dedicated fields for pregnancy cases. This could be achieved through updates to the ICH E2B(R3) data standards [57], with these CDE recommendations providing a valuable basis for the development of pregnancy-optimised clinical trial databases. This would align with the current work plan and recognition of the importance of including pregnant women in clinical trials [22, 53, 58].

Efforts to provide core data recommendations for general PregPV have previously been undertaken [33, 34, 59]. A core dataset for prospective pregnancy registries specifically investigating inflammatory rheumatic diseases was published in 2020 [60]. However, these recommendations only included high-level suggestions, lacking definitions, data structure recommendations, and explanations of their purpose. A systematic review published in 2017 did not identify any core dataset recommendations relating to PregPV [61]. We believe that the CDE recommendations presented here are the first multi-expert, rigorous, consensus-based set of recommendations for general PregPV to provide detailed definitions and examples of data structure, source, and purpose for each of the data elements. The primary aim of these recommendations was to provide PregPV researchers with a reference framework to aid the development of new datasets using primary source data collection methodologies. There is no expectation that researchers with existing PregPV primary source datasets will amend or redevelop their datasets; however, investigation of alignment with the recommended definitions and/or the potential to transform data for synthesis of results is encouraged.

The development of these recommendations involved a robust scoping review of published guidelines and data dictionaries of established PregPV data collection systems. Experts in PregPV undertook extensive discussions and consideration of the validity and necessity of each data element identified. In comparison with previously published guidance for PregPV [33, 34], the main advantage of these CDE recommendations is the provision of detailed definitions, and descriptions of data structures, source, and purpose for each of the data elements. This information provides researchers with a sufficient level of detail to develop new PregPV data collection systems, or may aid the standardisation and harmonisation of existing datasets. Although the list of CDEs included in the tables is large (94 elements), the recommendations provide the framework for a robust and complete collection of the data required for an optimal PregPV assessment/investigation. These recommendations also recognise that pregnancy exposures require longer-term follow-up of live born offspring to assess childhood health and neurodevelopmental outcomes. Data elements are described that are valuable for immediate pregnancy, neonatal, and infant outcomes and also for longer-term childhood/adolescent health and developmental outcomes. It is accepted that the collection of longer-term outcomes may be limited in some settings, but efforts must be made to extend investigations routinely to include these important outcomes. A key benefit of primary source data collection is the potential to capture a rich dataset containing details that cannot be routinely captured using secondary source methodologies (such as population-based healthcare registries). The collection of such complete datasets can thereby allow for the more accurate assessment of medication safety in pregnancy, through the consideration of important co-variable risk factors.

To aid researchers in identifying which CDEs are essential for basic evaluation of medication use in pregnancy, each element is accompanied by a description of how essential it is. It is important to note that missing data for essential items may not mean that the case cannot be used in data analysis; however, careful assessment of the impact of missing data for those elements marked essential is advised.

These recommendations have several important limitations. Firstly, the definitions provided, data formats recommended, and the assessments as to whether elements should be considered essential are reflective of consensus agreements and compromise. Some of these compromises reflect what experts in the working group considered to be achievable with data collection, rather than what would be optimal. Secondly, it is well established that the severity or activity of maternal medical conditions during or prior to pregnancy should be considered during data analysis. However, due to the multiplicity of possible medical conditions and measurements of disease activity/severity, it was not considered feasible to provide recommendations around the collection of these data. Researchers investigating specific disease areas must adapt the recommendations to collect this detail in collaboration with disease specialists. Finally, local adaptations to data formats may be required, and additional data derivation steps may be needed to ensure these data can be standardised/harmonised for potential combination with data collected in other locations.

Further evaluation of the IMI ConcePTION CDE recommendations is currently underway. In this evaluation, multiple PregPV data providers from academia and industry participating in the ConcePTION project are assessing whether a subset of their existing prospective datasets match the definitions provided in the CDE recommendations. Future refinement and additional developments of the CDE recommendations will be considered, dependent on the results of the evaluation. It is currently planned that members of the ENTIS will maintain the recommendations and be responsible for keeping the recommendations aligned with current PregPV processes and guidance. It is also planned that the CDE recommendations will be submitted to the EMA for consideration for endorsement through their Innovation Task Force and Qualification Advice procedures. Additionally, technical proposals for the structure of a dataset utilising the CDEs detailed in these recommendations, which could be used as the basis for a multi-site common data model, have not yet been developed. Such development, which could utilise existing data transfer systems such as ICH E2B(R3) [62] optimised for the inclusion of PregPV CDE items, should be considered as a further step to aid data harmonisation and speed up global data collection efforts.

5 Conclusion

With this set of CDE recommendations, we aim to standardise global PregPV data collection processes to improve both the quality of evidence-based information about the risks and benefits of medication use in pregnancy and the speed at which it can be provided. Future wide-scale promotion of the recommendations will be integral to the success of this approach. The collection of user feedback will provide opportunities to improve the CDE framework, allowing the recommendations to evolve and meet the needs of PregPV researchers, and ultimately patients/healthcare professionals.