Introduction

Anabolic-androgenic steroids (AAS) are a family of hormones comprising the androgen hormone testosterone as well as its synthetic derivatives (Kanayama and Pope 2018). Use of AAS was historically associated with weightlifters and later with professional bodybuilders and elite athletes in various sports. Since the 1980s, use of AAS has gradually spread to recreational athletes as well as the general population (Pope and Kanayama 2012). Use of AAS normally comprises long-term administration of supraphysiological doses, often 10–100 times the natural production or therapeutic doses of androgens (Kanayama et al. 2013). A meta-analysis on the global prevalence of AAS use indicated that 3.3% of the world’s population has used AAS at least once, with use being more frequent among males (6.4%) than among females (1.6%) (Sagoe et al. 2014b; Sagoe and Pallesen 2018).

Despite benefits such as increased muscle growth, improved body image, and enhanced sports performance (Evans 2004; Sagoe et al. 2014a; Smit et al. 2020a), human case studies, surveys, and experimental studies suggest that AAS induce a plethora of adverse physical and psychological side effects. Cardiovascular disorders, particularly cardiomyopathy, are major physical side effects of AAS use (Baggish et al. 2017). Other somatic side effects of AAS include hypertension, sleep abnormalities, immunological dysregulation, decreased libido in males, and hirsutism and clitoromegaly in females (Bensoussan and Anderson 2019; Ganesan et al. 2020). Notable psychological side effects comprise manic and depressive symptoms as well as psychotic symptoms (Brower 2009; Kanayama et al. 2020). Human case studies, surveys, and experimental studies further suggest that AAS induce symptoms such as irritability and unprovoked aggression, sometimes referred to as “roid rage” or “steroid rage” (Nelson 1989; Pope and Katz 1987; Taylor 1987; Tragger 1988). Experimental animal studies consistently show that injections of AAS increase aggression (Clark and Henderson 2003; Lumia et al. 1994). In humans, cross-sectional (Ganson and Cadet 2019; Pereira et al. 2019), case-control (Klötz et al. 2007; Lundholm et al. 2010; Thiblin et al. 2015), and longitudinal (Beaver et al. 2008) research indicates a positive relationship between AAS use and aggression. However, results from human placebo-controlled randomized studies show an inconsistent association between AAS administration and aggression, comprising negative (Björkqvist et al. 1994), positive (Panagiotidis et al. 2017; Wagels et al. 2018), and non-significant findings (Tricker et al. 1996).

Most previous reviews on this topic are narrative only (Haug et al. 2004; Huo et al. 2016; Johnson et al. 2013). Additionally, a recent review (Geniole et al. 2020) on this topic omits some studies (Anderson et al. 1992; Björkqvist et al. 1994; Su et al. 1993; Tricker et al. 1996). Hence, a comprehensive systematic review quantifying findings on the topic is overdue, in line with the merit of meta-analyses in science and evidence-based medicine (Murad et al. 2016). Against this backdrop, we conducted a systematic review and meta-analysis of randomized controlled trials (RCTs) examining the effect of AAS administration on self-reported as well as observer-reported aggression in healthy males.

Methods

Literature search strategy

Systematic literature searches were conducted in MEDLINE, PsycInfo, ISI Web of Science, ProQuest, Google Scholar, and Cochrane Library. There was no time constraint for the search. Keywords for AAS were combined with keywords for aggression. An overview of the keywords and search strategy can be found in Appendix A in the Supplementary information. The latest systematic literature search was conducted on 31 December 2019, followed by additional ad hoc searches to ensure comprehensiveness. The search and selection process is presented in Fig. 1.

Fig. 1 PRISMA-style flow diagram of the study selection process

Inclusion criteria and data extraction

Studies were included if they were (1) RCTs (2) investigating the effects of AAS administration on aggression in healthy persons, (3) based on valid aggression measures, and (4) published in English. The first author (RC) independently conducted the search and selection of articles based on the aforementioned criteria. Using a standardized data extraction form, the first and last authors (RC and DS) independently extracted the following data from the identified studies: study authors, country, design (e.g., double-blind), sample type (e.g., healthy males), sample size, age (range, M ± SD), study groups (e.g., placebo group), AAS type, AAS dose, AAS administration mode (e.g., injection), study duration, assessment type (e.g., self-report), aggression measure, results, and risk of bias (see Table 1). Furthermore, the testosterone levels both at baseline and post-administration for each study are shown in Table 2. The two authors resolved discrepant extractions through discussion, with the involvement of the second author (SP) when necessary. We also contacted corresponding authors or, when unavailable, coauthors via email for missing information.

Table 1 Characteristics of randomized controlled trials on the effects of AAS administration on aggression in healthy persons
Table 2 Mean baseline and post-administration levels of placebo and testosterone for each study (nmol/L)

Statistical analysis

We first investigated the overall effect of AAS administration on self-reported aggression using a random-effects model. AAS users typically administer supraphysiologic doses of AAS for 4 to 28 weeks (Kanayama et al. 2013; Copeland et al. 2000). We therefore subsequently pooled studies in which higher doses (over 500 mg) of AAS were administered to examine the effect of high-dose AAS administration on self-reported aggression (O’Connor et al. 2004; Pope et al. 2000; Su et al. 1993; Tricker et al. 1996; Yates et al. 1999). Furthermore, we pooled studies in which AAS were administered over longer periods (i.e., 3 days to 14 weeks: Anderson et al. 1992; Cueva et al. 2017; O’Connor et al. 2002; O’Connor et al. 2004; Pope et al. 2000; Su et al. 1993; Yates et al. 1999) as well as studies investigating acute AAS effects (Carré et al. 2017; Dreher et al. 2016; Panagiotidis et al. 2017; Tricker et al. 1996). Due to the low number of studies administering higher doses (k = 5) or investigating acute AAS effects (k = 4), a fixed-effect model was used for these analyses (Borenstein 2009). Moreover, we conducted a meta-regression analysis to elucidate a potential dose-response association, regressing self-reported aggression effect sizes on AAS dose (mg). Finally, we investigated the overall effect of AAS administration on observer-reported aggression using a fixed-effect model due to the low number of studies (k = 3: O’Connor et al. 2004; Tricker et al. 1996; Yates et al. 1999).
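
As a minimal illustration of the two pooling models named above, the following Python sketch computes an inverse-variance fixed-effect estimate and a random-effects estimate. It assumes the DerSimonian-Laird method-of-moments estimator for the between-study variance; the estimator used by the analysis software is not stated in this section, and the function name `pool_effects` and all input values are hypothetical.

```python
import math


def pool_effects(effects, variances, model="random"):
    """Inverse-variance pooling of per-study effect sizes (e.g., Hedges' g)."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    tau2 = 0.0
    if model == "random" and k > 1:
        # ASSUMPTION: DerSimonian-Laird estimate of the between-study variance
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)
        w = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return {"g": pooled, "se": se, "ci": ci, "q": q, "tau2": tau2}


# Hypothetical effect sizes and sampling variances, for illustration only
example_g = [0.0, 1.0]
example_v = [0.01, 0.01]
fixed_result = pool_effects(example_g, example_v, model="fixed")
random_result = pool_effects(example_g, example_v, model="random")
```

With these two deliberately discrepant hypothetical studies, both models return the same point estimate, but the random-effects standard error is much larger because the between-study variance is added to each study's sampling variance. When studies are homogeneous (Q at or below its degrees of freedom), the two models coincide.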

Some studies used multiple aggression measures and reported multiple aggression scores (O’Connor et al. 2002, 2004; Panagiotidis et al. 2017; Pope et al. 2000; Su et al. 1993). In these cases, we set the correlation between aggression measures to 0.60 (Diamond and Magaletta 2006; O’Connor et al. 2001) to provide the best estimates of between-study variance and corresponding confidence intervals (Gleser and Olkin 2009; Marín-Martínez and Sánchez-Meca 1999). For crossover studies (O’Connor et al. 2004; Pope et al. 2000; Su et al. 1993; Yates et al. 1999), we used an average correlation of 0.50 between aggression measures over time to provide optimal effect size estimates (Krahé and Möller 2010). Effects were estimated as Hedges’ g, where 0.20, 0.50, and 0.80 are considered small, moderate, and large effect sizes, respectively (Hedges and Olkin 2014). For studies including a passive control group (e.g., no intervention), a placebo group, and a treatment group (Björkqvist et al. 1994), data from the placebo and treatment groups were used to estimate meaningful relative-effect estimates (Karlsson and Bergmark 2015; Magill and Longabaugh 2013). Effect sizes were calculated from post-intervention means and standard deviations of aggression scores. When means and standard deviations were not reported or unavailable in the original paper (Björkqvist et al. 1994), authors were approached by email and asked to provide the statistical information (i.e., F and p values) necessary to calculate effect sizes. For the assessment of heterogeneity, we used the Q-statistic and the I² index. The latter indicates the proportion of the observed variance that reflects real differences in effect size. It is expressed as a percentage (0–100), with 0% indicating no heterogeneity, 25% low heterogeneity, 50% moderate heterogeneity, and 75% high heterogeneity (Higgins et al. 2003).

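
The effect size and heterogeneity index described above can be sketched as follows. This is a minimal example for two independent groups using the standard small-sample correction for Hedges' g; it does not handle the crossover designs or multiple outcomes, for which the correlations of 0.50 and 0.60 were imputed. The function names and the input means and standard deviations are hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Hedges' g and its variance from post-intervention means and SDs."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled        # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)         # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    return j * d, j ** 2 * var_d


def i_squared(q, k):
    """I² (%) from Cochran's Q and the number of studies k."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical treatment vs. placebo aggression scores, for illustration only
g_example, var_example = hedges_g(12.0, 4.0, 20, 10.0, 4.0, 20)
```

For the hypothetical inputs, Cohen's d is 0.50 and the correction shrinks it slightly to g of about 0.49. Applied to the values reported in the Results for the long-term analysis (Q = 6.335, k = 7), `i_squared` returns about 5.29, in line with the reported I² of 5.286; any Q at or below its degrees of freedom yields 0.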
Additionally, we used Duval and Tweedie’s (2000) trim and fill method and Orwin’s (1983) fail-safe N to assess publication bias. The trim and fill method (Duval and Tweedie 2000) screens for missing studies and adjusts the effect size by trimming the asymmetric studies and filling a funnel plot symmetrically. Orwin’s (1983) fail-safe N quantifies the number of studies required to bring the observed effect size down to a chosen “trivial” estimate (Hedges and Olkin 2014). In the current meta-analysis, we set the “trivial” estimate to g = 0.05.
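
Orwin's fail-safe N has a simple closed form, N = k(g_obs − g_trivial)/(g_trivial − g_missing), which the sketch below implements. It assumes a positive observed effect and that the missing studies average an effect of zero unless stated otherwise; the function name `orwin_failsafe_n` is hypothetical, and rounding up to a whole study is a choice made here.

```python
import math


def orwin_failsafe_n(k, g_observed, g_trivial, g_missing=0.0):
    """Number of unpublished studies averaging g_missing that would be needed
    to pull the mean effect of k studies down to g_trivial."""
    if g_trivial <= g_missing:
        raise ValueError("g_trivial must exceed g_missing")
    n = k * (g_observed - g_trivial) / (g_trivial - g_missing)
    # Rounded up to a whole study; never negative
    return max(0, math.ceil(n))


# Values taken from the Results: k = 11 studies, g = 0.171, trivial g = 0.05
n_needed = orwin_failsafe_n(k=11, g_observed=0.171, g_trivial=0.05)
```

With the values reported in the Results for the analysis excluding the outlier (k = 11, g = 0.171) and the trivial criterion of 0.05, the formula gives 26.62, which rounds up to the 27 studies reported.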

The quality of each included study was assessed using the Cochrane risk of bias tool (Higgins et al. 2003). The protocol for the meta-analysis was pre-registered in PROSPERO (CRD 42019117834). The literature search, coding of variables, and reporting were conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) procedure (Moher et al. 2009). The meta-analysis and the meta-regression were performed using Comprehensive Meta-Analysis software, version 3.3.070 (Borenstein et al. 2014).

Results

Literature screening and selection

From an initial pool of 30,407 hits, 18,986 records remained after removal of duplicates (k = 3772) and gray literature (k = 7649) during initial identification and screening. Of this pool, 18,748 were removed after eligibility screening by title and abstract, leaving 238 records for further evaluation. After screening the 238 full-text records, 12 studies were finally included. Figure 1 presents the literature search and selection process.

Description of included studies

Of the twelve included studies, publication year ranged from 1992 (Anderson et al. 1992) to 2017 (Carré et al. 2017; Cueva et al. 2017; Panagiotidis et al. 2017). Four of the studies were conducted in the USA (Pope et al. 2000; Su et al. 1993; Tricker et al. 1996; Yates et al. 1999), four in the UK (Anderson et al. 1992; Cueva et al. 2017; O’Connor et al. 2002, 2004), and one each in Germany (Panagiotidis et al. 2017), Finland (Björkqvist et al. 1994), Ireland (Dreher et al. 2016), and Canada (Carré et al. 2017). We received clarification and data from some authors (Björkqvist et al. 1994; Carré et al. 2017; Cueva et al. 2017; Dreher et al. 2016; O’Connor et al. 2004). (See Table 1.)

All the included studies were placebo-controlled randomized trials. One of the included studies was single-blinded (Anderson et al. 1992) and 11 were double-blinded. Additionally, six studies were crossover studies (Anderson et al. 1992; Cueva et al. 2017; O’Connor et al. 2004; Pope et al. 2000; Su et al. 1993; Yates et al. 1999) whereas six were based on a between-subject design (Björkqvist et al. 1994; Carré et al. 2017; Dreher et al. 2016; O’Connor et al. 2002; Panagiotidis et al. 2017; Tricker et al. 1996). The studies included a total of 562 healthy male (females: n = 0) participants. Participants’ ages ranged from 18 (Su et al. 1993) to 49 (Carré et al. 2017) with a grand mean of 25.83 (SD = 3.80).

Testosterone enanthate was administered in four studies (Anderson et al. 1992; Dreher et al. 2016; O’Connor et al. 2002; Tricker et al. 1996) and two studies administered testosterone cypionate (Pope et al. 2000; Yates et al. 1999). In addition, two studies administered testosterone undecanoate (Björkqvist et al. 1994; O’Connor et al. 2004), and three studies administered testosterone gel (Carré et al. 2017; Cueva et al. 2017; Panagiotidis et al. 2017) whereas one study administered methyltestosterone (Su et al. 1993). AAS doses ranged from a one-time application of 50 mg of testosterone gel (Panagiotidis et al. 2017) to a one-time injection of 1000 mg of testosterone undecanoate (O’Connor et al. 2004), and a cumulative injection of 7000 mg of testosterone cypionate over a 14-week period (Yates et al. 1999). When various doses of AAS were used in one study, we used results from the highest dose for calculating the effect size.

Aggression was assessed by self-reports (Anderson et al. 1992; Björkqvist et al. 1994; Carré et al. 2017; Cueva et al. 2017; Dreher et al. 2016; O’Connor et al. 2002, 2004; Panagiotidis et al. 2017; Pope et al. 2000; Su et al. 1993; Tricker et al. 1996; Yates et al. 1999), observer-reports (O’Connor et al. 2004; Tricker et al. 1996; Su et al. 1993; Yates et al. 1999), and behavioral aggression measures (Carré et al. 2017; Pope et al. 2000). The Buss-Perry Aggression Questionnaire (Buss and Perry 1992) was used in three studies (O’Connor et al. 2002, 2004; Pope et al. 2000), and three studies (O’Connor et al. 2002, 2004; Yates et al. 1999) used the Buss-Durkee Hostility Inventory (Buss and Durkee 1957). Two studies (Carré et al. 2017; Pope et al. 2000) used the Point Subtraction Aggression Paradigm (Cherek et al. 1996), and three studies (Dreher et al. 2016; O’Connor et al. 2002, 2004) used the Profile of Mood States (McNair et al. 1992), two of which (O’Connor et al. 2002, 2004) additionally used the Aggression Provocation Questionnaire (O’Connor et al. 2001).

Additionally, the Self-Estimated Mood Checklist (Lindman 1985) was used in one study (Björkqvist et al. 1994), and one study (Panagiotidis et al. 2017) used the Technical Provocation Paradigm (Panagiotidis et al. 2017) and emotional self-ratings (Schneider et al. 1994). Moreover, two studies (Cueva et al. 2017; Su et al. 1993) used visual analogue scales (Cline et al. 1992; Norris 1971), one study (Tricker et al. 1996) used the Multi-Dimensional Anger Inventory (Siegel 1986), and one study (Anderson et al. 1992) used daily ratings of irritability, readiness to fight, and being easily angered. Ten studies (Anderson et al. 1992; Carré et al. 2017; Cueva et al. 2017; Dreher et al. 2016; O’Connor et al. 2002, 2004; Panagiotidis et al. 2017; Pope et al. 2000; Tricker et al. 1996; Yates et al. 1999) reported no significant effect of AAS administration on aggression. In addition, one study (Su et al. 1993) found a positive effect of AAS administration on aggression (p < .05), whereas one study (Björkqvist et al. 1994) reported a negative effect of AAS administration on aggression (p < .01).

Risk of bias

The two authors disagreed once on the random sequence generation dimension for all the included studies yielding a Cohen’s kappa of .58 (Cohen 1988). All studies were evaluated as having a high selection bias as there was no description of the randomization method or concealed allocation process. In addition, all studies were evaluated as having high risks of performance and detection bias as the effectiveness of blinding was not tested. Moreover, all studies had a low risk of attrition bias as there was sufficient reporting and handling of attrition and exclusion. Furthermore, except for one study that did not present means and standard deviations or inferential indices (Björkqvist et al. 1994), we evaluated all studies as having low reporting bias. Figure 2 depicts the risk of bias of the included studies.

Fig. 2 Estimated risk of bias of the included studies

Effect of AAS administration on self-reported aggression

Of the twelve included studies, one study (Björkqvist et al. 1994) did not overlap with the 95% CI of the overall pooled effect size. Exclusion of this outlier resulted in a significant mean effect size under a random-effects model of g = 0.171 (95% CI: 0.029–0.312, k = 11, p = .018), and there was no significant heterogeneity between the included studies (I² = 0.000, Q = 8.891, p = .542). The effect sizes and associated 95% confidence intervals are presented in Fig. 3.

Fig. 3 The effect (random-effects model) of AAS administration on self-reported aggression

The overall random-effects of AAS administration on self-reported aggression, including the outlier (Björkqvist et al. 1994), was not significant (g = 0.081, 95% CI: −0.111–0.273, p = .408). (See Supplementary Figure 1.) When adjusting for publication bias using Duval and Tweedie’s trim and fill method, the overall result (k = 12) turned out non-significant (g = 0.170, 95% CI: 0.029–0.312, p = .890). (See Supplementary Figure 2.) Results from Orwin’s fail-safe N analysis indicated that 27 studies with an effect size of zero would be needed to bring Hedges’ g below 0.05.

Effect of long-term AAS administration on self-reported aggression

Under a random-effects model, the effect of administering AAS over longer periods (3 days to 14 weeks) on self-reported aggression was g = 0.100 (95% CI: −0.079–0.278, p = .273). There was no significant heterogeneity across studies in terms of effect sizes (I² = 5.286, Q = 6.335, p = .387). (See Fig. 4.)

Fig. 4 The effect (random-effects model) of administering AAS over longer periods on self-reported aggression

Effect of acute AAS administration on self-reported aggression

Under a fixed-effect model, the effect of acute administration of AAS on self-reported aggression was g = 0.291 (95% CI: 0.058–0.524, p = .014; Q = 0.867, p = .833). (See Fig. 5.)

Fig. 5 The effect (fixed-effect model) of acute AAS administration on self-reported aggression

Effect of AAS dose on self-reported aggression

AAS dose (mg) was not associated with self-reported aggression in a random-effects meta-regression model (B = 0.000, SE = 0.000, 95% CI: −0.000–0.000, p = .096).
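
A coefficient that rounds to 0.000 is expected when the moderator is expressed per milligram and doses span roughly 50 to 7000 mg. The sketch below shows an inverse-variance weighted regression of effect size on dose with a single moderator. It is a simplified illustration, not the model fitted by the analysis software: the between-study variance `tau2` is supplied rather than estimated, and the function name and all data values are hypothetical.

```python
import math


def meta_regression_slope(doses, effects, variances, tau2=0.0):
    """Weighted least-squares slope of effect size on dose, with a z-test."""
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, doses)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, doses))
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, doses, effects))
    slope = sxy / sxx
    se = math.sqrt(1.0 / sxx)
    p = math.erfc(abs(slope / se) / math.sqrt(2.0))  # two-sided normal p value
    return slope, se, p


# Hypothetical doses (mg), effect sizes, and variances, for illustration only
doses_mg = [50, 200, 500, 1000, 7000]
effects = [0.30, 0.10, 0.20, 0.15, 0.25]
variances = [0.05] * 5

slope_mg, se_mg, p_mg = meta_regression_slope(doses_mg, effects, variances)
# The same model with dose expressed in grams instead of milligrams
slope_g, se_g, p_g = meta_regression_slope(
    [d / 1000 for d in doses_mg], effects, variances)
```

In this hypothetical example the per-milligram slope and its standard error both round to 0.000 at three decimals, while expressing dose in grams multiplies both by 1000 and leaves the p value unchanged. Reporting the coefficient per 100 mg or per gram would therefore make a dose-response estimate easier to read.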

Effect of high-dose AAS administration on self-reported aggression

The mean effect of higher doses (over 500 mg) of AAS on self-reported aggression under a fixed-effect model was non-significant (g = 0.191; 95% CI: −0.007–0.388, p = .059, Q = 1.399, p = .844). (See Fig. 6.)
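
The reported interval and p value can be cross-checked against each other, since a 95% CI that includes zero implies p > .05 under a normal approximation. The sketch below recovers the standard error from the interval width and derives the two-sided p value; the function name `p_from_ci` is hypothetical, and small discrepancies from the reported values reflect rounding of the published figures.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Standard error, z, and two-sided p implied by a symmetric 95% CI."""
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# High-dose analysis as reported: g = 0.191, 95% CI −0.007 to 0.388
se_high, z_high, p_high = p_from_ci(0.191, -0.007, 0.388)
# Main analysis excluding the outlier: g = 0.171, 95% CI 0.029 to 0.312
se_main, z_main, p_main = p_from_ci(0.171, 0.029, 0.312)
```

For the high-dose analysis this gives a standard error of about 0.10 and a p value of about .058, consistent with the reported p = .059 and with an interval that just includes zero. The same check applied to the main analysis gives approximately .018.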

Fig. 6 The effect (fixed-effect model) of administering higher (over 500 mg) doses of AAS on self-reported aggression

Effect of AAS administration on observer-reported aggression

Under a fixed-effect model, the overall effect of AAS administration on aggression based on observer ratings was g = 0.157 (95% CI: −0.267–0.581, p = .469; Q = 0.249, p = .883). The effect sizes and associated 95% confidence intervals for each study are presented in Fig. 7.

Fig. 7 The effect (fixed-effect model) of AAS administration on observer-reported aggression

Discussion

The present systematic review and meta-analysis of eleven studies (Anderson et al. 1992; Carré et al. 2017; Cueva et al. 2017; Dreher et al. 2016; O’Connor et al. 2002, 2004; Panagiotidis et al. 2017; Pope et al. 2000; Su et al. 1993; Tricker et al. 1996; Yates et al. 1999), after excluding an outlier (Björkqvist et al. 1994), indicates that AAS administration is associated with an increase in self-reported aggression, albeit small, among healthy males in RCTs. This finding is consistent with the results of a recent meta-analysis (Geniole et al. 2020) indicating that testosterone administration has a small and positive correlation with aggression in males. Relatedly, our finding that acute AAS administration has a positive effect on self-reported aggression is consistent with evidence that acute increases in testosterone have a positive correlation with aggression (Geniole et al. 2020).

The present study is the first comprehensive systematic review and meta-analytic investigation of the effect of AAS administration on aggression in healthy males in RCTs. However, our results should be interpreted with caution. Firstly, a meta-regression examining dosage as a moderator of the identified effect of AAS on self-reported aggression was not significant. Similarly, we detected neither an effect of AAS administration on observer-reported aggression nor effects of long-term (3 days to 14 weeks) or high-dose AAS administration on self-reported aggression. Also, as noted previously, only healthy males were examined in the included RCTs, and the durations and doses used in the twelve RCTs deviate from the prolonged use of high-dose cycles, consisting of the ingestion of supraphysiologic doses of different types of AAS per week over several months (Kanayama et al. 2013), often reported by users in ecologically valid settings. In one study, the reported AAS dose ranged from 125 to 7000 mg per week (mean = 1278 mg) over an average of 9.1 years (Bjørnebekk et al. 2017). In another recent study, it was shown that an AAS cycle usually comprises the ingestion of five different AAS with an average dose of 901 mg per week for a typical duration of 13 weeks (Smit et al. 2020b). In the present meta-analysis, the highest doses administered were a one-time injection of 1000 mg of testosterone undecanoate (O’Connor et al. 2004) and a cumulative dose of 7000 mg of testosterone cypionate injected over a 14-week period (Yates et al. 1999). Thus, AAS doses and durations of administration in the RCTs included in our meta-analysis are far lower than the actual doses reported by AAS users (Bjørnebekk et al. 2017; Kanayama et al. 2013).

Similarly, apart from methyltestosterone, which was administered in one study (Su et al. 1993), AAS that are anecdotally associated with increased aggression in humans, such as fluoxymesterone, oxymetholone, and trenbolone (Barker 1987; Llewellyn 2011), were not administered in the RCTs included in the present review. Moreover, testosterone undecanoate, administered in two studies (Björkqvist et al. 1994; O’Connor et al. 2004), is a depot preparation with a very gradual decay and long half-life, leading to relatively stable testosterone levels over a prolonged period of time (Hirschhäuser et al. 1975). Hence, discrepancies in AAS doses, type, duration of use, and half-life between the AAS in the RCTs and naturalistic contexts should be noted when interpreting our findings.

In addition, evidence from cross-sectional studies indicates that polypharmacy and stacking (Sagoe et al. 2015; Salinas et al. 2019) may account for increased aggression among AAS users (Lundholm et al. 2015). The absence of polypharmacy in the RCTs included in our meta-analysis may also explain the discrepancy between findings from RCTs and those reported in more ecologically valid contexts. Other potential confounding factors include small sample sizes and lack of a priori power analyses, diversity in aggression measures, risk of bias (selection, performance, and detection biases), diversity in route of AAS administration (injection, transdermal), diversity in time gap between AAS administration, incomplete data reporting, and sampling of only males in included RCTs.

Moreover, the inclusion of only healthy volunteers in the RCTs may have precluded vulnerable subjects from participating, which may have led to an underestimation of the effects of AAS administration on aggression. Sampling matters, as evidence indicates that testosterone increases aggression in men with certain personality profiles, especially among those with fewer cytosine-adenine-guanine repeats in exon 1 of the androgen receptor gene (Geniole et al. 2019). The importance of sampling is further evidenced in that, apart from bodybuilders and competitive athletes, a large portion of non-experimental research linking AAS use with aggression has been conducted among subgroups associated with aggression such as drug users, offenders, and prisoners (Lundholm et al. 2010; Pope et al. 1996), as well as policemen, doormen, and nightclub bouncers (Hoberman 2017; Midgley et al. 2001). Future researchers considering the aforementioned factors may conduct more ecologically valid RCTs (e.g., by using dosages and durations of use similar to those of real AAS users) to better elucidate the effect of AAS administration on aggression in humans. Furthermore, more studies should explore factors (e.g., type of AAS, duration of use, premorbid functioning, and genetics) that might moderate the effects of AAS on aggression.

Conclusions

The present systematic review and meta-analysis provides evidence for a small increase in self-reported aggression in healthy males following AAS administration in RCTs. Moreover, when restricting the analysis to the effects of acute AAS administration on self-reported aggression, we found a significant effect. We also identified important limitations of the RCTs, including non-ecological doses, lack of personality and polypharmacy controls, small sample sizes, risk of bias, short study duration, and the inclusion of only healthy males. While future RCTs addressing the above factors may contribute further to the understanding of the effect of AAS administration on aggression in humans, the present study provides a foundation for addressing this important public health issue. As the appreciation of the heterogeneity of AAS use matures, there is a need to identify the role that AAS play in aggression and violence and what may be attributed to the set and setting of their use.