Key Points

There is a lack of validated statistical methods that could help identify subgroups, defined by characteristics such as age, sex and underlying conditions, that might be at increased risk of adverse drug reactions.

We tested one of the few available first-pass screening subgroup methods using a large, diverse dataset of spontaneous adverse event reports (US FDA Adverse Event Reporting System [FAERS]).

Our study showed apparent low concordance between disproportionality scores calculated by subgroup analysis using FAERS and a reference set (European Medicines Agency Pharmacovigilance Risk Assessment Committee discussions of subgroup risk).

Age and sex were better captured within FAERS and showed better concordance than the other covariates tested. Covariates such as pregnancy and underlying condition might benefit from enrichment with additional data sources, such as electronic healthcare records data.

1 Introduction

The response to a drug or vaccine includes both therapeutic effects and potential adverse drug reactions (ADRs); the magnitude of such effects can be highly heterogeneous across patient subgroups [1]. If responses are significantly associated with known subgroup characteristics, such as age, sex or underlying condition, prescribers can use this information to identify individuals who are more likely to experience ADRs, and thus optimise the benefit–risk ratio for a given patient.

Associations between patient characteristics and ADRs are well understood in some cases and can be used to inform clinical decisions. For instance, the drug rasburicase is used to prevent and treat tumour lysis syndrome, which is an oncological emergency in patients with certain solid tumours or haematological malignancies [2]. However, rasburicase is contraindicated in patients with glucose-6-phosphate dehydrogenase (G6PD) deficiency because of an increased risk of haemolysis (rupturing of red blood cells) [2]. Thus, it is recommended that clinicians screen patients at high risk for G6PD deficiency, such as those of African or Mediterranean ancestry [3]. ADRs can also be associated with the sex of the patient. For example, young adult male patients are at increased risk of myocarditis associated with coronavirus disease 2019 (COVID-19) vaccination [4,5,6]. Conversely, female sex has been identified as a risk factor for drug-induced QT prolongation and Torsades de pointes [7,8,9], as well as congenital long-QT syndrome [10]. Additionally, lower doses of the hypnotic agent zolpidem are recommended for women, who eliminate zolpidem more slowly and are more prone to impairment of daytime activities than men [11].

Early in the drug development process, preclinical pharmacodynamic and pharmacokinetic data are used to model the risk of ADRs in human subjects [12,13,14]. The focus shifts to empirical evidence once a drug or vaccine enters human trials [15, 16]. Sometimes, rare but serious ADRs are recognised only after a drug is approved and marketed, at which point appropriate steps are taken to minimise the risk [17].

Spontaneous reporting systems for adverse events (AEs) are the mainstay of postmarketing safety surveillance [18, 19]. Because of the large volume of spontaneous reports, pharmaceutical companies and regulators use quantitative signal detection methodologies, mostly based on disproportionality analysis, to identify potential ADRs, which subsequently undergo a focused clinical review [18, 20,21,22]. Quantitative signal detection is usually broadly applied to AE reporting datasets and often adjusts for potential confounders based on stratification. However, this one-size-fits-all approach for stratification does not account for all the confounding in spontaneous AE reports and can therefore be misleading. It has been demonstrated that subgroup analyses can perform better than methods adjusted by stratification [21, 23,24,25] and potentially address modifying effects that might underlie the AE data. Nevertheless, there is currently a lack of systematic subgroup analyses for first-pass screening and, when employed, subgroup analyses are often limited to specific demographic characteristics [26,27,28]. Quantitative screening for a broad range of covariates in this context has recently been proposed [29]. This approach could be burdened by limitations related to the specificity of spontaneous reporting, such as lack of certain data needed to characterise subgroups, non-random reporting of specific elements on the spontaneous report, and a low number of AE reports for recently launched products or products with narrow indications or low exposure. However, subgroup analysis could enable safety reviewers to efficiently screen large amounts of data to identify subgroups that may be at greater risk.

In this study, we aimed to examine the extent to which subgroup analysis can serve as a first-pass quantitative signal detection method in screening spontaneous AE reports. We also aimed to examine the potential limitations of spontaneous AE data sources that might influence the ability to identify subgroup statistical signals, such as missing data elements required for subgroup differentiation, and subsequently to point to ways to improve high-risk subgroup identification. To this end, we compiled a reference set of AEs, which we defined as any AE discussed within the context of differentiated subgroup risk in European Medicines Agency (EMA) Pharmacovigilance Risk Assessment Committee (PRAC) meeting minutes from 2015 to 2019. We then applied a recently published quantitative approach for subgroup analysis [29] across a large and diverse dataset of AE reports and examined whether we could detect the reference set of a priori identified AEs.

2 Methods

2.1 US FDA Adverse Event Reporting System (FAERS) Dataset

Cumulative FAERS data from 2004 through the second quarter of 2021 were used for the analyses. All records in which the product was reported as suspect or interacting (but not concomitant) were included. The analyses were performed on all events at the Medical Dictionary for Regulatory Activities (MedDRA®; version 24.0) Preferred Term (PT) level and on all mapped products at their active moiety level. The data were standardised and deduplicated. Active moieties were derived, in alignment with the FDA’s definition [30], by Commonwealth Informatics in the same manner as in the commercially available signal management platform Commonwealth Vigilance Workbench.

Specifically, drug names were cleaned by processing the source values through successive mappings: uppercasing; removing excess whitespace, quotes, parentheses, trailing periods and commas, outer square brackets, braces, etc.; removing certain literals and variants thereof (e.g. ‘tablet’, ‘caplet’, ‘capsule’, ‘unknown’, ‘formulation’, ‘generic’, ‘nos’, etc.); removing units such as ‘mg’ and ‘milligrams’; and changing backslashes to forward slashes. The adjusted verbatim drug names were then mapped to product active ingredients according to known verbatim-to-active-ingredient mappings. Any remaining unmapped verbatim drug names were assigned the literal ‘UNMAPPED’ and excluded from the analysis.
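The cleaning steps above can be illustrated with a minimal Python sketch. It is not the Commonwealth Informatics implementation: the literal and unit lists cover only the examples quoted in the text, and `KNOWN_MAPPINGS`, `clean_drug_name` and `map_to_ingredient` are hypothetical names introduced for illustration.

```python
import re

# Illustrative lists only: the text gives examples followed by "etc.",
# so the full sets of literals and units are not known.
LITERALS = r"\b(TABLETS?|CAPLETS?|CAPSULES?|UNKNOWN|FORMULATIONS?|GENERIC|NOS)\b"
UNITS = r"\b(MG|MILLIGRAMS?)\b"

# Hypothetical verbatim-to-active-ingredient lookup (assumption).
KNOWN_MAPPINGS = {"ASPIRIN": "ACETYLSALICYLIC ACID"}


def clean_drug_name(verbatim: str) -> str:
    """Apply the successive cleaning steps to one verbatim drug name."""
    name = verbatim.upper()
    name = name.replace("\\", "/")              # backslashes to forward slashes
    name = re.sub(r"[\"'()\[\]{}]", " ", name)  # quotes, parentheses, brackets, braces
    name = re.sub(LITERALS, " ", name)          # dosage-form and filler literals
    name = re.sub(UNITS, " ", name)             # unit tokens
    name = re.sub(r"\s+", " ", name).strip()    # excess whitespace
    return name.rstrip(" .,")                   # trailing periods and commas


def map_to_ingredient(verbatim: str) -> str:
    """Return the mapped active ingredient, or the literal 'UNMAPPED'."""
    return KNOWN_MAPPINGS.get(clean_drug_name(verbatim), "UNMAPPED")
```

In this sketch, names that do not resolve through the lookup are returned as 'UNMAPPED' so that they can be filtered out before analysis, mirroring the exclusion described above.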

Duplicate detection was performed after all other data transformation and standardisation was complete. A large number of candidate duplicate pairs were initially generated based on a set of simple heuristic rules. These candidate pairs were then scored by implementing a quantitative method based on the hit-miss algorithm previously described [31]. Briefly, the method generates a score correlated to the statistical likelihood that two different reports represent two versions of the same underlying case. Pairs with a score above a selected threshold are considered true duplicates. Finally, the individually identified duplicate pairs are ‘coalesced’ into duplicate groups (consisting of two or more case reports) to address multiple duplicates for a given case report.
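The final coalescing step can be sketched as follows. The sketch assumes that candidate pairs have already been scored (the hit-miss scoring itself [31] is not reproduced here) and uses a union-find structure, which is one possible way to merge pairs into groups and is not necessarily the approach used in this study. The function name, the report identifiers and the threshold value are hypothetical.

```python
def coalesce_duplicates(scored_pairs, threshold):
    """Merge duplicate pairs scoring above `threshold` into duplicate groups.

    scored_pairs: iterable of (report_a, report_b, score) tuples.
    Returns a sorted list of sorted report-id lists, one per duplicate group.
    """
    parent = {}

    def find(x):
        # Locate the group representative, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score > threshold:          # pair is treated as a true duplicate
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for report in parent:
        groups.setdefault(find(report), set()).add(report)
    return sorted(sorted(group) for group in groups.values())
```

With hypothetical pairs (A, B), (B, C) and (F, G) above the threshold, reports A, B and C end up in a single group even though A and C were never directly paired, which is the situation the coalescing step is meant to address.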

2.2 Reference Set

2.2.1 Initial Reference Set

The PRAC meeting minutes from 2015 to 2019 were downloaded from the EMA website to extract the reference set of positive controls for this study. PRAC meetings aim to evaluate data from all sources, including spontaneously reported suspected ADRs and results from interventional and observational studies that offer important data for signal detection. The PRAC discusses the prioritisation of emerging safety signals and issues recommendations required for their management, such as further investigation or drug labelling changes [32]. The minutes were reviewed independently by two healthcare professionals to identify any discussion of an AE associated with the use of a drug and the potential for a differentiated risk in particular subgroups. Neither the context of the discussion (e.g. signal detection, signal validation, signal assessment or hypothesis testing) nor the trigger (e.g. case reports, clinical trials or epidemiological studies) was considered for the purpose of identifying these subgroup examples.

Only subgroups corresponding to those defined by Sandberg et al. [29] (Table 1) were considered for inclusion in the subgroup analyses. Subgroups mentioning products in development or vaccines were excluded from the reference set as they are rarely listed in FAERS. The included subgroups will be referred to as PRAC subgroup examples.

Table 1 Covariates and corresponding subgroups described by Sandberg et al., and subsets included in our analysis

2.2.2 Mapping of Drugs, Events and Subgroups

The drugs and events discussed in the PRAC subgroup examples did not fully correspond to the drug and medical ontologies used to code the FAERS data. For instance, a group of drugs rather than a specific drug may have been discussed or generic medical nomenclature rather than specific MedDRA® terms may have been used to describe events in PRAC examples. Additionally, the subgroups discussed by PRAC may not be readily identifiable in the AE reporting data set (e.g. pregnancy).

Where needed, events described in PRAC subgroup examples were independently mapped to MedDRA® PTs available in FAERS by two drug safety experts. The two mappings were then jointly reviewed by the experts and consensus was reached.

Where needed, drugs described in PRAC subgroup examples were independently mapped to the active moieties available in FAERS by two drug safety experts. As for events, the two mappings were then jointly reviewed by the experts and consensus was reached.

PRAC subgroups were mapped to subgroups defined by Sandberg et al. [29]. Because the raw narratives were not available in the FAERS data used for this study, identifying cases for the pregnancy subgroup was challenging. Therefore, a slightly adapted algorithm was needed. Upper-cased MedDRA® PTs that included any of the substrings ‘PREGN’, ‘GESTAT’, ‘GRAVID’, ‘MATERN’ or ‘LABOUR’, excluding terms such as ‘Pregnancy test negative’, ‘Pregnancy test false positive’ and ‘Pregnancy test urine negative’, were used to identify potential pregnancy cases. Additionally, the reported case must have concerned a woman between the age of 15 and 44 years. However, if the MedDRA® PT fell under the MedDRA® High Level Term ‘Unintended pregnancies’, these were excluded. This algorithm was used only to identify cases for the pregnancy subgroup and not to identify pregnancy-related AEs.
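A minimal Python sketch of this case-identification rule is given below. Several details are assumptions made for illustration: the list of excluded PTs contains only the three terms quoted above (the text says "such as"), the High Level Term exclusion is applied to the individual PT rather than to the whole case, sex is encoded as "F", and all function names are hypothetical.

```python
SUBSTRINGS = ("PREGN", "GESTAT", "GRAVID", "MATERN", "LABOUR")

# Only the exclusions quoted in the text; the full list is not given.
EXCLUDED_PTS = {
    "PREGNANCY TEST NEGATIVE",
    "PREGNANCY TEST FALSE POSITIVE",
    "PREGNANCY TEST URINE NEGATIVE",
}
EXCLUDED_HLT = "UNINTENDED PREGNANCIES"


def is_pregnancy_pt(pt: str, hlt: str) -> bool:
    """True if an upper-cased PT contains a pregnancy substring and is not excluded."""
    pt_upper = pt.upper()
    if pt_upper in EXCLUDED_PTS or hlt.upper() == EXCLUDED_HLT:
        return False
    return any(sub in pt_upper for sub in SUBSTRINGS)


def is_pregnancy_case(sex, age_years, events) -> bool:
    """events: iterable of (PT, HLT) pairs reported on the case.

    Assumption: sex is coded "F" for female; cases with missing age fail the check.
    """
    if sex != "F" or age_years is None or not 15 <= age_years <= 44:
        return False
    return any(is_pregnancy_pt(pt, hlt) for pt, hlt in events)
```

Because the rule relies on coded terms only, a case whose pregnancy is mentioned solely in the narrative would not be flagged by a check of this kind.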

2.3 Subgroup Analysis

Subgroup disproportionality scores were computed on the overall FAERS data using the method described by Sandberg et al. [29]. This method is based on the Information Component, which is the binary logarithm of a shrunk observed-to-expected ratio, a disproportionality measure comparing the observed (O) number of reports for a given drug–event combination (DEC) with an expected (E) number of reports estimated from the overall database, and is fully described elsewhere [33]. Briefly, subgroup disproportionality scores were obtained by restricting the O/E ratio computation to the subgroups of interest. No combinations of the subgroup covariates were considered and no further adjustment was performed within the subgroups. Bayesian credibility intervals were computed and the lower limit of 95% credibility intervals was used to set the threshold for signal detection. For subgroup analyses, broader credibility intervals were used [29] compared with the intervals reported by Norén et al. [33] to control for the rate of spurious associations due to multiple comparisons [34]. The requirements reported by Sandberg et al. were used to identify subgroup signals, followed by sensitivity analyses using algorithm adaptations (Table 2). The analyses were run in Azure Databricks using PySpark.
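As a rough illustration of the point estimate, the sketch below computes an Information Component from report counts. It assumes the commonly cited form with a shrinkage of 0.5 added to both O and E, and an expected count derived from the drug and event marginals; readers should consult [33] for the authoritative definition. The credibility intervals and the additional requirements in Table 2 are not reproduced, and the function name and counts are hypothetical.

```python
from math import log2


def information_component(observed, n_drug, n_event, n_total):
    """Point estimate of the Information Component for one drug-event combination.

    observed: reports mentioning both the drug and the event
    n_drug:   reports mentioning the drug
    n_event:  reports mentioning the event
    n_total:  all reports in the comparison set
    Assumes E = n_drug * n_event / n_total and a shrinkage of 0.5 on O and E.
    """
    expected = n_drug * n_event / n_total
    return log2((observed + 0.5) / (expected + 0.5))


# Hypothetical counts for the entire database.
ic_overall = information_component(20, 1_000, 2_000, 1_000_000)

# Subgroup score: the same computation with every count restricted
# to reports belonging to the subgroup (hypothetical counts).
ic_subgroup = information_component(15, 200, 500, 100_000)
```

The subgroup score uses the same formula as the overall score; only the set of reports over which the four counts are taken changes.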

Table 2 Requirements to identify disproportionately reported subgroup DECs

2.4 Assessment of Concordance

Concordance was determined at two different levels: at the subgroup example level, requiring just one of the PRAC subgroup DECs to be detected in FAERS to consider the subgroup example detected; and at the subgroup DEC level, assessing for each PRAC subgroup DEC whether it was detected in FAERS or not. Because the PRAC examples included combinations of covariates (i.e. age and underlying condition, sex and underlying condition, sex and age) but no combinations were considered for the subgroup analysis in FAERS, these examples were considered independently for each covariate. For instance, an example representing age and underlying condition was tested once for age and once for underlying condition.
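The two levels of concordance can be sketched in Python as follows. The data layout (a mapping from each PRAC subgroup example to its set of subgroup DECs, and a set of detected subgroup signals) and all names are assumptions made for illustration, not the study's actual data model.

```python
def concordance(reference, detected):
    """Return (example-level, DEC-level) proportions of reference items detected.

    reference: dict mapping example id -> set of (drug, event, subgroup) tuples
    detected:  set of (drug, event, subgroup) tuples flagged as subgroup signals
    """
    # Example level: one detected DEC is enough to count the example as detected.
    examples_detected = sum(1 for decs in reference.values() if decs & detected)

    # DEC level: every reference DEC is assessed individually.
    all_decs = set().union(*reference.values())
    decs_detected = len(all_decs & detected)

    return examples_detected / len(reference), decs_detected / len(all_decs)


# Hypothetical reference set and signals.
reference = {
    "example_1": {("drug_a", "event_x", "female"), ("drug_a", "event_y", "female")},
    "example_2": {("drug_b", "event_z", "65-74 years")},
}
detected = {("drug_a", "event_y", "female"), ("drug_c", "event_w", "male")}
example_level, dec_level = concordance(reference, detected)
```

An example mapped to many DECs can therefore count as detected at the example level while contributing mostly undetected DECs at the DEC level.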

3 Results

3.1 Reference Set

Review of the PRAC meeting minutes from 2015 to 2019 allowed retrieval of 52 subgroup examples (Fig. 1). Four PRAC subgroup examples that mentioned drugs pertaining to classes or events with different aetiology or biological mechanisms were split further, leading to the addition of seven PRAC subgroup examples and bringing the total to 59 examples.

Fig. 1 PRAC subgroup examples (attrition diagram). FAERS Food and Drug Administration Adverse Event Reporting System, PRAC Pharmacovigilance Risk Assessment Committee

One PRAC subgroup example was excluded because it described an AE following vaccination. Twenty-four PRAC subgroup examples were excluded from the analysis, 8 because of duplicates and 16 because the subgroups mentioned in the minutes were not considered by Sandberg et al. [29], i.e. AE in offspring from exposure during pregnancy or AE resulting from concomitant use of another substance (Fig. 1). All drugs discussed in the included PRAC subgroup examples are approved for similar indications in the EU and US. After the mapping to FAERS active moieties and MedDRA® PTs, seven PRAC subgroup examples were excluded because the corresponding subgroup DECs were not reported in the FAERS database. Six of the seven were reported as DECs but not for the subgroup described in the PRAC minutes, and one was not reported as a DEC but was reported independently as a drug and an event.

Eventually, 27 PRAC subgroup examples were included in the analyses (Table 3). The drugs, medical concepts and subgroups described in these 27 examples were mapped to FAERS active moieties, MedDRA® PTs and subgroups as described in Sect. 2.2.2, resulting in 1719 subgroup DECs included in the analysis (Online Resource Table 1).

Table 3 Covariate subgroups identified in PRAC meeting minutes

The 27 PRAC subgroup examples were dominated by age and underlying condition, with underlying condition less dominant when considering the number of subgroup DECs (Table 3). These included subgroup examples that were triggered from case reports, clinical trials and epidemiological studies (Fig. 2). Age was mentioned in 22 PRAC subgroup examples (1028 subgroup DECs), with only 11 exclusively about age (Table 4). Ten PRAC subgroup examples referred to underlying conditions (385 subgroup DECs), but only one exclusively mentioned underlying condition. Of the six PRAC examples mentioning sex as a subgroup (305 subgroup DECs), only two exclusively mentioned sex. Pregnancy was described in only one PRAC example (and only one subgroup DEC) that focused on exposure and harm to the mother. None of the PRAC subgroup examples specifically described countries or regions or referred to body mass index (BMI). Therefore, no country, region or BMI subgroupings were included in the present analysis. However, these covariates were, to some extent, available in FAERS. Country was reported in 98% of reports, and regions could be derived from countries. BMI, although not readily available in FAERS, could be approximated by the weight that was reported in 21% of reports. Height was not available in the version of FAERS used for this analysis but would be available in other spontaneous report systems.

Fig. 2 Distribution of PRAC subgroup examples per trigger category

Table 4 Individual subgroup combinations identified in PRAC meeting minutes

3.2 Availability of Subgroups in FAERS

Age was reported in 58% of FAERS cases, with adult categories being predominant (Online Resource Table 2). There were more than 5 × 10⁶ subgroup DECs in FAERS for age, for approximately 7.5 × 10⁶ AE reports with age known. Sex was provided in 89% of FAERS reports (Online Resource Table 2), with women representing 61% of reports where sex was known. Sex corresponded to more than 4 × 10⁶ subgroup DECs in FAERS, with a total of approximately 11.6 × 10⁶ reports where sex was known. The underlying condition was approximated by the indications of all drugs reported in the cases; at least one drug indication was reported in 88% of FAERS cases, of which 77% reported only one drug indication (Online Resource Table 2). There were more than 19 × 10⁶ drug indication subgroup DECs in FAERS, for a total of 11.5 × 10⁶ reports with at least one drug indication. Finally, pregnancy cases represented 0.5% of FAERS reports (Online Resource Table 2) and translated to 129,826 subgroup DECs in FAERS.

3.3 Concordance Between FAERS Subgroup Signals and Pharmacovigilance Risk Assessment Committee (PRAC) Subgroup Examples

3.3.1 Subgroup Example Level

Overall, 2 of the total 27 PRAC subgroup examples (7%) were detected in FAERS when applying the Sandberg subgroup methodology (Table 3). Looking at each covariate, 1 of the 22 (5%) and 1 of the 6 (17%) PRAC subgroup examples for age and sex, respectively, were detected. For underlying condition and pregnancy, none of the 10 and none of the 1 PRAC subgroup examples, respectively, were detected. When relaxing the requirement for subgroup signals to not be disproportionately reported in the entire database (i.e. removal of the requirement that IC025 for the entire database ≤ 0), 14 of the 27 (52%) PRAC subgroup examples were detected, with 10 of 22 (45%), 5 of 6 (83%), 1 of 10 (10%) and 1 of 1 (100%) examples detected for age, sex, underlying condition and pregnancy, respectively (Table 3). This suggests that when a subgroup example from the PRAC discussions was a subgroup statistical signal, it could have initially been identified by an overall statistical signal, at the DEC level, and then the stratum-specific effect identified as a second step. The detected examples included subgroups triggered from case reports, clinical trials and epidemiological studies (Fig. 2). None of the other adjustments to the requirements reported by Sandberg et al. [29] for sensitivity analyses had a significant impact on detection (Table 3).

3.3.2 Subgroup Drug–Event Combination (DEC) Level

When moving from the subgroup example level to the subgroup DEC level, only 2 of the 1719 PRAC subgroup DECs (0.1%) were detected by applying the Sandberg subgroup methodology [29] to FAERS data (Table 3). One of 1028 (0.1%) and 1 of 305 (0.3%) subgroup DECs for age and sex, respectively, were detected. None of the 385 and none of the 1 PRAC subgroup DECs were detected for underlying condition and pregnancy, respectively. Increased concordance was achieved when a subgroup signal was considered detected regardless of whether it was disproportionately reported in the entire database (i.e. removal of the requirement that IC025 for the entire database ≤ 0). In this case, 170 of the 1719 PRAC subgroup DECs (10%) were detected, but at the cost of generating more subgroup signals (increased from 193,656 to 1,312,922). For patient age, sex, underlying condition and pregnancy status, 69 of 1028 (7%), 75 of 305 (25%), 25 of 385 (6%) and 1 of 1 (100%) subgroup DECs were detected, respectively. The poorer performance at this level compared with the subgroup example level is explained by the low specificity of the PRAC subgroup discussions, which led to dilution of the effect across the many MedDRA® PTs and drug active moieties and, subsequently, to decreased detection power.

3.4 Post Hoc Sensitivity Analysis

3.4.1 Restriction of PRAC Subgroup Examples

Some of the PRAC subgroup examples could not be rigorously analysed in the same way as the subgroup statistical alerts for which the Sandberg methodology was proposed. Hypothesising that excluding such subgroups might improve performance, we performed a post hoc sensitivity analysis in which the following PRAC subgroup examples were excluded.

  • Any PRAC example where only one subgroup had the ability to be exposed to the drug or to experience the event (e.g. risk of developing ovarian macrocysts in women exposed to mitotane). Despite subgroup analyses being more appropriate in these situations, it cannot be argued that there is a differentiated risk in the subgroups of the same covariate.

  • Any PRAC example where the subgroups were conditional, i.e. they involved two different covariates (Table 4; e.g. adults [age] with pulmonary hypertension associated with idiopathic interstitial pneumonia [underlying condition] experiencing an increased risk of mortality when exposed to riociguat). Given that combinations of covariates were not considered in the study by Sandberg et al. [29], these were assessed independently for each covariate, thereby applying an approach using ‘OR’ instead of ‘AND’; however, this led to considerable deviation from the subgroup signals initially discussed in the PRAC meeting minutes.

3.4.2 Results at Subgroup Example Level

Applying these additional exclusion criteria reduced our sample of PRAC subgroup examples from 27 to 11. Of those 11, one (9%) was detected using the Sandberg methodology. When considered regardless of whether they were disproportionately reported in the entire database, two examples (18%) were detected as subgroup signals. These represented 2 of 10 PRAC examples (20%) for age and none for the one underlying condition PRAC example. The main reasons that nine subgroup examples were not detected by the latter approach were that the number of observed cases for the subgroup DECs was small, with broad credibility intervals (7 of 9); that the subgroup DECs were not reported more than expected (1 of 9); and that the O/E ratios for the subgroups were not substantially different from the O/E ratios for the remainder of the database (i.e. ICΔ ≤ 0; 1 of 9).

3.4.3 Results at Subgroup DEC Level

The remaining 11 PRAC subgroup examples represented only 70 subgroup DECs. One of 70 (1%) PRAC subgroup DECs was detected in FAERS using the Sandberg method [29]. When examined regardless of whether they were disproportionately reported in the entire database, four subgroup DECs were detected, all of which related to age (4 of 69 [6%]).

4 Discussion

Subgroup analyses can be of vital importance in postmarketing safety surveillance to identify subgroups at higher risk of developing specific ADRs. Currently, both a widely accepted gold standard to assess quantitative signal detection methods [35] and systematic assessment of the extent to which quantitative data mining on spontaneous reports correlates with subgroup safety risk differences are lacking. In this study, we applied a recently published method [29] that describes first-pass screening subgroup analysis for a variety of risk factors to a large AE dataset. To test this methodology, FAERS data were selected because they include more than 13 × 10⁶ reports, are in the public domain, are widely used for method testing and contain a diverse set of medications, albeit not vaccines. In the absence of any gold-standard reference set for the subgroup analyses, the PRAC subgroup examples were selected as a reference set. They were chosen because they are externally recognised, are in the public domain and are not reliant on spontaneous reporting. They constitute a valuable independent reference set of safety concerns that warrant discussion by a regulatory body, regardless of future labelling status. To our knowledge, this is the first study to evaluate the Sandberg subgroup method [29] and report on its ability to detect subgroups of potential increased risk across a large, diverse dataset. Our analysis demonstrated that the subgroup methodology detected PRAC subgroup examples in FAERS with a low sensitivity (7% at subgroup example level and 0.1% at subgroup DEC level).

Removing the requirement of the Sandberg methodology for signals to not be disproportionately reported overall not only improved the sensitivity (from 7 to 52% at the subgroup example level and from 0.1 to 10% at the subgroup DEC level) but also generated more subgroup signals from FAERS data. It resulted in improved sensitivity for age and sex (detection of 45% and 83% at the subgroup example level and 7% and 25% at the subgroup DEC level, respectively). However, it should be noted that those signals would have been identified as DECs by routine disproportionality analysis and subsequently used by safety reviewers to identify subgroups disproportionately reported and potentially responsible for the overall disproportionality. Eighty-one percent of the DECs detected by this adapted subgroup methodology would have been detected by routine overall disproportionality analysis. Conversely, 57% of DECs detected by routine disproportionality analysis would also be detected by the adapted subgroup methodology. In a post hoc analysis, we also assessed the sensitivity after excluding PRAC examples with combinations of covariates or where only one subgroup had the ability to be exposed to the drug or to experience the event. The examples retained were mainly those for age, and the sensitivity at the subgroup example level was reduced to 20% for age and 18% overall. After reviewing the outputs of the post hoc analysis, the low sensitivity observed was mostly attributed to the small sample size of observed cases in FAERS and the resulting broad credibility intervals.

Candore et al. [20] assessed several overall disproportionality methods using various spontaneous reporting systems and showed a sensitivity ranging from 19 to 46% and a positive predictive value from 10 to 21%. In this study, the sensitivity ranged from 0.1 to 52%; at the upper end of this range, sensitivities were therefore similar to those of overall disproportionality analyses. Positive predictive value is likely very low but could not be calculated because our reference set did not include an exhaustive list of positive controls. It should be noted that the reference set of positive controls used by Candore et al. [20] and the one used in this study are very different.

The decision of how to group or split covariates into subgroups may affect the analysis. For example, age subgroups defined by Sandberg et al. [29] did not always match with the age subgroups mentioned in PRAC examples, potentially diluting the disproportionality. In addition, not combining covariates, when combinations were present in 59% of PRAC examples, ignores the fact that the modifying effect of one covariate may differ by subgroups of the other covariate. A scan test (or more advanced machine learning techniques) could be used to handle these limitations by assessing all meaningful combinations while controlling for multiple testing and not having to define the subgroups a priori.

There are several limitations to our study that should be considered when interpreting concordance. First, the PRAC minutes may not use product active moieties and event MedDRA® PTs to represent drug exposure and adverse events. Translating these to standardised dictionaries with a different granularity, such as MedDRA®, may introduce variability in the results. After the mapping, the majority of PRAC subgroup examples concerned a range of active moieties, subgroups and MedDRA® PTs. These multiple entries for drug, event and subgroup may reduce the detection power by resulting in more subgroup DECs with fewer data, or by diluting the effect through mixing subgroups with high exposure effect with subgroups with low exposure effect. This might explain the observed discrepancy between sensitivity at the subgroup example level versus the subgroup DEC level. Furthermore, the PRAC subgroup examples included in this study represent a sample of only 5 years; the reference set used is therefore not comprehensive and specificity could not be assessed. Moreover, any mention of subgroups that might be at greater risk of developing a particular AE after exposure to a given drug was included in our reference set regardless of whether it was validated. Sensitivity to validated subgroup signals may differ from sensitivity to the reference set we used in our experiment. To some extent, the overall low concordance observed might also be explained by the fact that subgroups discussed by PRAC are based on various data and methods, whereas the method used in this analysis is purely quantitative and does not account for qualitative aspects that are not readily available in structured databases. Additional work would be needed to understand whether traditional methods as used by PRAC could be complemented by quantitative subgroup disproportionality analyses. Additionally, the number of our PRAC examples was small and weighted towards specific subgroups tested (age and underlying condition). The use of FAERS data, which mainly cover the US population (whereas PRAC is a European committee that likely uses European data), may also have impacted concordance. However, none of the products from the PRAC subgroup examples are exclusively marketed in the European Union, and although healthcare provision and usage might differ, this is unlikely to result in highly different subgroup categorisation in the two geographical regions.

Another limitation resides in the use of spontaneous data. Non-random reporting patterns at the case level, and also at the case attribute level, impact what data are listed or missing on a case report and the way they are recorded. This non-random recording of data in spontaneous AE reports may make it particularly challenging to conduct quantitative analysis of spontaneous data. Subgroups such as underlying condition and pregnancy are captured sporadically and unsystematically in spontaneous AE data, therefore imposing limitations on subgroup analyses. In our study, when the data were considered regardless of whether they were disproportionately reported in the entire database, the sensitivity for underlying condition was low (10% at subgroup example level and 6% at the subgroup DEC level). The sensitivity for pregnancy was 100% but accounted for only one subgroup example/DEC, for an event that only pregnant women can experience (gestational diabetes). Consequently, this example was excluded from the post hoc analysis. Although we attempted to minimise missing data (e.g. by using indications of drugs to determine underlying condition), the alternative information required was also frequently missing or could introduce bias into the analysis (e.g. due to certain indications of concomitant drugs being more frequently reported than others). In addition, identification of the subgroup of pregnant cases relied on an algorithm based on structured fields and coded events because we could not access free-text fields in FAERS. This limited our capacity to identify pregnant cases and resulted in a low number of such cases. Sandberg et al. [29] did not consider concomitant medication or exposure in pregnancy with the risk in offspring; we therefore excluded 4 and 12 such examples, respectively. Nevertheless, these data might also pertain to this category of covariates, i.e. sporadically and unsystematically reported, and are therefore difficult to assess in spontaneous data because of reporting biases and missing mother–child linkage. Electronic health care records data in these situations could provide additional insights. A logical next step could be to assess whether performance is improved for these covariates when enriching spontaneous data with these relevant observational data [36].

In light of these limitations, we recommend not considering subgroups that meet any of the following criteria for subgroup analyses in spontaneous reports data:

  • Timebound subgroups (e.g. definitive overlap of exposures for a specific period of time, such as with drug–drug interactions, exposure to a concomitant drug within 60 days of occurrence of an event following another drug exposure, exposure to a drug for at least 1 year). The dates and times are not reliable and are often missing in spontaneous reports data, rendering temporal relationships between subgroup elements difficult to establish.

  • Conditional subgroups (e.g. patients with a history of a particular event or patients who take a particular concomitant drug). The rationale for exclusion is that reporting of medical or medication history and concomitant drugs in spontaneous reports data is very sporadic and heavily biased.

  • Combination subgroups (e.g. a female under the age of 20 years). Although certain characteristics (e.g. age and sex) are more commonly reported in spontaneous reports data, combining them would amplify the variability of the results because of the sporadic and non-random reporting of these data elements.

  • Subgroups normally missing from spontaneous reports data (e.g. genetic risk factors). The rationale for exclusion is that if the data element is not populated in the database, it cannot be used to allocate cases to subgroups.

  • Subgroups requiring linkage to other records (e.g. in utero exposure and fetal adverse events). There are very few reports of linkage of mother–child records with robust data in spontaneous safety databases.

Alternative approaches have been proposed in the literature. Giangreco and Tatonetti proposed a subgroup method within the paediatric population [37] using a generalised additive model (GAM) approach, which is more complex than simple proportional reporting ratios. Nonetheless, their results do not convincingly suggest that the GAM approach performed significantly better than proportional reporting ratios. In another study, Chandak and Tatonetti created matched cohorts for sex, which could be used to identify differential effects in sex subgroups [38]. They generated propensity scores (PSs) for women, then used them to create PS-matched cohorts of men and women, and subsequently evaluated all drug AEs in both cohorts. While an independent PS model could be created for sex regardless of the drug/AE investigated, it may be driven by factors that are good predictors of sex but have no effect on the risk of the ADR or on the probability of being exposed to the drug, which would prevent adequate adjustment for confounding factors.
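For readers less familiar with the simple proportional reporting ratio (PRR) used as the comparator above, the following minimal sketch shows how a PRR might be computed separately within each stratum of a covariate such as sex. It is a generic illustration of the textbook PRR definition, not the Sandberg method or the approach of references [37] or [38]. The `Counts` class, the `prr` function and all report counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Counts:
    """2x2 report counts for one drug-event pair within one stratum."""
    drug_event: int    # reports with the drug and the event
    drug_other: int    # reports with the drug and any other event
    other_event: int   # reports with other drugs and the event
    other_other: int   # reports with other drugs and any other event


def prr(c: Counts) -> Optional[float]:
    """Proportional reporting ratio; None when the ratio is undefined."""
    drug_total = c.drug_event + c.drug_other
    other_total = c.other_event + c.other_other
    if drug_total == 0 or other_total == 0 or c.other_event == 0:
        return None
    return (c.drug_event / drug_total) / (c.other_event / other_total)


# Hypothetical counts, for illustration only (not FAERS data).
strata = {
    "female": Counts(30, 970, 200, 19800),
    "male": Counts(10, 990, 200, 19800),
}
prr_by_stratum = {name: prr(c) for name, c in strata.items()}
```

In this invented example the event is reported proportionally more often with the drug among female cases than among male cases. A real analysis would also need a minimum case count and a confidence interval or similar threshold before flagging a subgroup, and would remain subject to the reporting biases discussed above.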

5 Conclusions

Overall, we noted apparent low concordance between the Sandberg method applied in a large ADR database and a reference set of PRAC meeting subgroup examples, especially when used as first-pass screening. The performance was improved for variables that are better captured in spontaneous report data, namely age and sex, but covariates such as underlying condition and pregnancy likely require enrichment with alternative data sources. While we have offered some suggestions for future approaches to improve subgroup analyses, further research is needed to assess the optimal combination of data sources, individual characteristics, reference set and statistical methods and thresholds needed to screen subgroups that might be at high risk of ADRs. Ultimately, the nature of spontaneous reports and the application of quantitative approaches, rather than the specific use of subgroup analyses, seemed to limit the ability to identify issues discussed in a regulatory context. Thus, progress to an increasingly personalised view of predictive safety will require a multimodal data approach.