Introduction

Determining the factors that influence the female response to exertional heat stress is not new (Nunneley 1978; Stephenson and Kolka 1993), although different research approaches have been employed. One approach compares differences in the group mean with that of an intervention or other matched group when all characteristics apart from the one under investigation are standardized (Gagnon and Kenny 2012; Charkoudian and Stachenfeld 2014). Another approach considers the relative contribution of independent variables in explaining a dependent variable from individual responses of a (usually larger) heterogeneous sample, seen as a better representation of the population distribution (Foster et al. 2020). Concerning the latter, previous studies (Havenith et al. 1998; Notley et al. 2019) with the largest number of recreationally active women (n = 36 and 43, respectively) have sought to determine thermoregulatory responses to low–moderate fixed-intensity cycle ergometry for 30- to 60-min bouts measured in a range of ambient conditions (from temperate to warm–humid and hot–dry). Both studies used regression analysis to determine which morphological (body mass, surface area and % fat, etc.), physiological (metabolic rate or heat production, whole-body or local sweat rates, etc.), functional (aerobic fitness and power) and environmental (ambient temperature and absolute humidity) factors explained the variance in the women’s core temperature (Tcore) response. Results indicated that the strength of the relationships and variance explained (10–59%) was dependent on the heat load, i.e., combined exercise intensity and ambient thermal profile of the trials (Havenith et al. 1998; Notley et al. 2019).
While these important results are valid for occupational and leisure-time physical activity completed at a low–moderate intensity (or metabolic rates), they are unlikely to be representative of or applicable to aerobically trained women undertaking such activities at higher intensities for a number of reasons.

Firstly, metabolic heat production in trained women at these higher intensities is likely double the values previously examined in the literature, i.e., metabolic rates of 148–389 vs. 464–716 W·m−2 (Lei et al. 2019; Notley et al. 2019), while trained women have a greater capacity to deal with a heat load on account of their enhanced heat loss effectors (Kuwahara et al. 2005). Next, these previous studies have not reported or accounted for differences in thermoregulation secondary to fluctuations in the primary ovarian steroids (E2 and P4), whereby generally speaking E2 promotes heat dissipation and lowers Tcore, while P4 has the opposite effect (Charkoudian and Stachenfeld 2014). This is important to consider, as the hormonal influence may differ from that in less trained counterparts (Kuwahara et al. 2005) and has been shown to contribute to the variance in Tcore at rest (Lei et al. 2017). Finally, a fixed-intensity protocol denies participants the use of behavioral thermoregulation (Schlader et al. 2011a), thereby ignoring the fundamental premise that heat loss needs only to equal heat production (Nielsen 1938), and is considered less ecologically valid than self-pacing for most leisure-time and occupational physical activity, with few exceptions (e.g., forced marching).

The purpose of the current paper was to determine the relative contribution of E2 and P4, alongside other morphological, physiological, functional and environmental factors, in explaining the individual variation among trained women in the core temperature response (peak Tcore, [Tpeak]) and work output (mean power output) at very high metabolic rates. To achieve this, we retrospectively analyzed results from 36 trained women completing a self-paced 30-min work trial that has been shown to be unaffected by ovulatory status, ambient environment and pre-load/warm-up duration (Zheng et al. 2021b). Participants were distinguished by intra-participant (i.e., early follicular and mid-luteal phases) or inter-participant (i.e., ovulatory vs. anovulatory vs. oral contraceptive pill [OCP] user) differences in their endogenous E2 and P4 concentrations. We hypothesized that in addition to previously identified factors such as body mass, aerobic fitness and metabolic heat production (Havenith et al. 1998; Notley et al. 2019), the ovarian hormones would contribute significantly toward the variance explained in Tcore during exercise.

Methods

This paper combines data from three separate experiments (Lei et al. 2017, 2019; Zheng et al. 2021a), which included n = 28 ovulatory and OCP-user female cyclists/triathletes, and adds new data from the n = 8 participants who did not complete all trials or were excluded from the final analyses of those studies because they were deemed anovulatory (Lei et al. 2017; Zheng et al. 2021a). Interested readers are directed to these studies for further methodological details and results.

Ethical approval

All original studies (Lei et al. 2017, 2019; Zheng et al. 2021a) had received approval by the Massey University Human Ethics Committee (Southern A) and were performed in accordance with the latest revision of the Declaration of Helsinki, except for registration in a database. Informed, written consent was obtained from all participants prior to their participation.

Participants

Thirty-six aerobically trained women participated, yielding 115 separate trials (n = 23 completed 4 trials, n = 10 completed 2 trials, n = 3 completed 1 trial, see Fig. 1). Their physical characteristics are displayed in Table 1. Inclusion criteria were that participants were healthy non-smokers not taking any regular medication (apart from those using the OCP), cycling regularly (≥ 3 days per week) with a maximal aerobic capacity (VO2max) ≥ 40 ml·kg−1·min−1. Exclusion criteria included any cardiovascular, metabolic, neurological and respiratory diseases. All eumenorrheic women self-reported a regular menstrual cycle 21–35 days in length (≥ 3 months) with no use of hormonal contraception (≥ 6 months). All OCP women were taking a monophasic combination OCP (≥ 1 year) with experimental visits completed during the 3 weeks of active pill use (see Lei et al. 2019 for further details).

Fig. 1

Diagram of experimental overview. Ovulatory (OVU), anovulatory (ANO) and oral contraceptive pill (OCP) users performed trials in their (quasi-) early follicular (EF) and/or mid-luteal (ML) phases in warm–dry (DRY) and/or warm–humid (HUM) environmental heat. n = 23 completed four trials and n = 10 completed two trials, whereas n = 3 completed only one trial due to scheduling difficulties and dropout

Table 1 Participant characteristics for ovulatory (OVU), anovulatory (ANO) and oral contraceptive pill (OCP) groups

Ovulatory status and ambient conditions

Eumenorrheic women were tested on days 3–6 (EF) and 18–21 (ML) following the start of menses, while OCP women were tested on days 3–6 and 18–21 following the start of active OCP use. Our rationale for comparing EF and ML was based on maximizing the differences in E2 and P4 occurring naturally, permitting comparison with/expansion beyond previous results, and that ovulatory women are in EF and ML for ~ 50% of their reproductive lives. Although this approach represents the phases of lowest hormone exposure and peak P4, it does not include for comparison the late-follicular/pre-ovulatory phase. Although the late-follicular/pre-ovulatory phase captures when E2 peaks, the duration of < 72 h makes it difficult to perform repeated tests (such as this study) and comprises a much smaller proportion of the reproductive life for these women. Testing for eumenorrheic women was scheduled using the three-step method (Allen et al. 2016) whereby self-reported menses onset and urinary luteinizing hormone testing (EasyCheck® Ovulation Test, Phoenix Medcare Ltd, Auckland, New Zealand) prospectively identified EF and ML, while measurement of serum 17β-estradiol (E2) and P4 retrospectively confirmed ML. A P4 level of > 5 ng·ml−1 is good evidence that ovulation has occurred (Leiva et al. 2015; Schaumberg et al. 2017; Scheid and De Souza 2010). Therefore, participants were deemed as ovulatory (OVU, > 5 ng·ml−1) or anovulatory (ANO, < 5 ng·ml−1) as detection of a urinary luteinizing hormone surge (alone) cannot confirm luteal phase sufficiency (Scheid and De Souza 2010). Ambient conditions were distinguished by vapor pressure, such that the following characterized each environment: DRY (2.2 ± 0.2 kPa, 34.1 ± 0.2 °C, 41.4 ± 3.4% RH, wet-bulb globe temperature: 27.0 ± 0.5 °C) and HUM (3.4 ± 0.1 kPa, 30.2 ± 1.2 °C, 79.8 ± 3.7% RH, wet-bulb globe temperature: 28.2 ± 0.8 °C).
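As a rough cross-check of the values above, the sketch below recomputes ambient vapor pressure from the reported air temperature and relative humidity, and applies the P4 threshold used to classify ovulatory status. The Magnus approximation for saturation vapor pressure and both function names are assumptions introduced here for illustration; the source does not state how vapor pressure was derived, nor how a P4 value of exactly 5 ng·ml−1 would be classified.

```python
import math


def vapor_pressure_kpa(temp_c: float, rh_percent: float) -> float:
    """Ambient vapor pressure (kPa) from air temperature and relative humidity.

    Saturation vapor pressure uses the Magnus approximation over water
    (an assumption; the paper does not name its formula).
    """
    saturation_kpa = 0.61094 * math.exp(17.625 * temp_c / (temp_c + 243.04))
    return saturation_kpa * rh_percent / 100.0


def classify_ovulatory_status(mid_luteal_p4_ng_ml: float) -> str:
    """Apply the > 5 ng/ml mid-luteal P4 criterion described in the text.

    A value of exactly 5 is not covered by the source; it is labeled ANO here
    purely as an illustrative choice.
    """
    return "OVU" if mid_luteal_p4_ng_ml > 5.0 else "ANO"


dry_kpa = vapor_pressure_kpa(34.1, 41.4)  # mean DRY conditions from the text
hum_kpa = vapor_pressure_kpa(30.2, 79.8)  # mean HUM conditions from the text
print(f"DRY: {dry_kpa:.1f} kPa, HUM: {hum_kpa:.1f} kPa")
```

Under these assumptions the recomputed values agree with the reported 2.2 and 3.4 kPa to one decimal place.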

Experimental overview

All data were collected outside of the Southern Hemisphere summer (March–November) where the average daily temperature did not exceed 22 °C, nor had participants spent any time in a warmer climate for at least 1 month prior to the study. All participants attended the laboratory on the following occasions: (1) preliminary submaximal and maximal aerobic capacity test, (2) experimental familiarization and (3) experimental trials. For an overview of the experimental design see Fig. 1. The experimental trials consisted of the following factors: (quasi-) menstrual phase (early follicular [EF, 56 trials] and mid-luteal [ML, 59 trials]) and ambient profile (warm–humid [HUM, 69 trials] and warm–dry [DRY, 46 trials]). The order of the trials was randomized and counterbalanced, except that the order of the ambient profile was kept consistent across (quasi-) phases within participants. Experimental trials were conducted at the same time of the morning (± 1 h) and following > 24 h of dietary and exercise control. Each trial consisted of either 12 or 20 min of fixed-intensity pre-load that was kept consistent within participants, immediately followed by a 30-min self-paced work trial where only percentage of time elapsed (every 20% or 6 min) was provided to the participant. All exercise was performed on an electronically braked cycle ergometer (Lode Excalibur, Groningen, The Netherlands), with handlebars, seat height and pedals set according to individual preference. The typical timeline for a participant comprised preliminary testing and familiarization separated by 3–7 days during the (quasi-) follicular phase, with half of the participants starting their experimental trials in the following (quasi-) luteal phase (i.e., 14 days later) and the other half in the following (quasi-) follicular phase (i.e., 28 days later), with within-phase experimental trials differing by ambient profile separated by 3 days.

Preliminary testing and familiarization

All preliminary testing was conducted in the (quasi-) EF phase of each participant’s menstrual cycle to minimize the potential effects of menstrual/OCP cycle on their physiological and performance responses during the tests (Sims and Heather 2018). Following anthropometric measurements (height, weight, body composition), a 24-min steady-state submaximal cycle ergometer test was conducted in a temperate laboratory environment (18–22 °C) with a fan-generated airflow of 19 km·h−1 facing participants. The submaximal cycle test consisted of four consecutive 6-min stages with power outputs of 100 W, 125 W, 150 W and 175 W at comfortable, but constant cadence. O2 consumption was measured during the last 2 min of each stage. Following 10-min rest from the submaximal test, a VO2max cycle ergometry test was performed. The initial workload began at 100 W and increased by 25 W every minute, until volitional exhaustion. The exercise intensity during the self-paced exercise was based on 75% of an individual’s VO2max, which was derived from the linear relationship between the power output and the O2 consumption during both the steady-state submaximal exercise test and maximal aerobic capacity test. Following at least 24 h rest from the preliminary session, a familiarization trial was conducted to ensure all participants were familiar with the testing procedures and to minimize the learning effect during trials. This trial was replicated entirely during the experimental trials outlined below.
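The derivation of a workload from the power output–O2 consumption relationship can be sketched as a simple linear fit. The stage powers below match the submaximal protocol, but the O2 consumption values, the VO2max value and the function name are hypothetical and chosen only for illustration; fitting the submaximal stages alone is also a simplifying assumption, since the text states that data from both tests were used.

```python
import numpy as np

# Submaximal stage powers from the protocol (W) with HYPOTHETICAL O2 uptake (L/min).
stage_power_w = np.array([100.0, 125.0, 150.0, 175.0])
stage_vo2_l_min = np.array([1.60, 1.90, 2.20, 2.50])
vo2max_l_min = 3.10  # hypothetical value from the maximal test

# Least-squares line: VO2 = slope * power + intercept
slope, intercept = np.polyfit(stage_power_w, stage_vo2_l_min, 1)


def power_at_fraction_of_vo2max(fraction: float) -> float:
    """Power output (W) predicted to elicit the given fraction of VO2max."""
    target_vo2 = fraction * vo2max_l_min
    return (target_vo2 - intercept) / slope


target_power_w = power_at_fraction_of_vo2max(0.75)
print(f"Power at 75% VO2max: {target_power_w:.0f} W")
```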

Dietary and exercise control

Diet and physical activity during the 48 h prior to the first experimental trial were recorded and participants were instructed to repeat these for the following experimental trials. On the day of and the day prior to any experimental trial, participants abstained from alcohol and exercise and kept to their habitual caffeine use (as abstinence would confound results through withdrawal effects). This dietary and exercise control minimized variation in pre-trial metabolic state. Fluid intake was encouraged to ensure a euhydrated state.

Experimental procedure

These trials were conducted in the same environmental chamber with a fan-generated airflow of 19 km·h−1. Upon their arrival at the laboratory, participants voided, producing a urine sample to confirm a urine specific gravity < 1.020 to ensure adequate hydration (Sawka et al. 2007). Following this, nude body weight was recorded and participants self-inserted a rectal thermistor 12 cm beyond their anal sphincter. A blood sample was obtained from an antecubital vein after participants had rested seated for 15 min. Participants entered the environmental chamber wearing only cycling shorts and top, shoes and socks. Participants rested seated on the ergometer for 20 min during which they were instrumented, and baseline measurements were recorded. They then completed either (i) 6 min of cycling at each of 125 and 150 W (62 ± 9 and 73 ± 10% VO2max, respectively, 92 trials) or (ii) 10 min of cycling at each of 100 and 125 W (56 ± 8 and 68 ± 10% VO2max, respectively, 23 trials); notably, where participants completed multiple trials, the warm-up duration was kept constant. Physiological measurements taken during the final 2 min of each intensity included expired gas and rating of perceived exertion (RPE), while rectal temperature (Trec) was measured continuously. Immediately on completion of the second fixed-intensity bout, the ergometer was set to linear mode based on the formula of Jeukendrup et al. (1996), and participants were instructed to perform as much work as possible over 30 min. During this 30-min self-paced period, work completed (kJ) and RPE were recorded every 6 min, while Trec was measured continuously and tap water at 20 °C was provided to drink ad libitum throughout to minimize dehydration. Total work completed (kJ) was used as the criterion measure for performance, although this was expressed as mean power output for the trial to allow wider application. After the completion of the 30-min self-paced exercise, the participant towel dried and nude body weight was recorded.
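Expressing total work as mean power output is a unit conversion over the fixed 30-min trial duration. The short sketch below illustrates it; the function name and the example work total are hypothetical.

```python
def mean_power_output_w(total_work_kj: float, duration_min: float = 30.0) -> float:
    """Mean power (W) = total work (J) divided by elapsed time (s)."""
    return total_work_kj * 1000.0 / (duration_min * 60.0)


# Hypothetical trial: 324 kJ completed during the 30-min self-paced period.
example_power_w = mean_power_output_w(324.0)
print(f"Mean power output: {example_power_w:.0f} W")
```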

Measurements

Results reported in the current study were those for which a maximal number of measures were recorded for the n = 36. For interested readers, other physiological (i.e., thermoregulatory, cardiovascular, inflammatory) and reliability measurements were performed during these trials and can be found in our separate studies (Lei et al. 2017, 2019; Zheng et al. 2021a, 2021b).

Anthropometric

Participant height and weight were measured using a stadiometer (Seca, Germany; accurate to 0.1 cm) and scale (Jadever, Taiwan; accurate to 0.01 kg), from which surface area (AD) was estimated (Du Bois and Du Bois 1916). Body composition was measured using multi-frequency bioelectrical impedance analysis (InBody 230, Korea) using a standard procedure (Kyle et al. 2004).
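For readers wishing to reproduce the surface area estimate, the Du Bois and Du Bois (1916) equation can be written as below. The function name and the example participant are hypothetical; the mass-to-surface-area ratio is included because it is one of the composite predictors considered later.

```python
def du_bois_surface_area_m2(mass_kg: float, height_cm: float) -> float:
    """Body surface area (m^2): AD = 0.007184 * mass^0.425 * height^0.725."""
    return 0.007184 * mass_kg ** 0.425 * height_cm ** 0.725


# Hypothetical participant: 60 kg, 165 cm.
area_m2 = du_bois_surface_area_m2(60.0, 165.0)
mass_to_area = 60.0 / area_m2  # composite predictor, kg per m^2
print(f"AD = {area_m2:.2f} m^2, mass:AD = {mass_to_area:.1f} kg/m^2")
```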

Respiratory

Expired respiratory gases were collected from a mixing chamber and analyzed for O2 consumption using an online, breath-by-breath system (VacuMed Vista, Turbofit, Ventura, CA, USA) using a 30-s average. This system was calibrated before each trial using a zero and β-standard gas concentrations, and volume (VacuMed 3L Calibration Syringe).

Body temperature and sweat loss

Tcore was indexed from Trec measured with a rectal thermistor (Covidien Mon-a-Therm, USA; accurate to 0.1 °C) and recorded continuously using TracerDAQ software (Measurement Computing Corporation, Norton, MA, USA). Whole-body sweat rate (WBSR) was estimated from nude body mass loss, corrected for fluid consumed and time.
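A minimal sketch of the sweat rate estimate follows, assuming that 1 kg of mass change corresponds to 1 L of sweat and that urine, respiratory and metabolic mass losses are ignored (the source does not mention correcting for them). The function name, the example values and the exercise duration used are hypothetical.

```python
def whole_body_sweat_rate_l_h(
    pre_mass_kg: float,
    post_mass_kg: float,
    fluid_intake_l: float,
    duration_min: float,
) -> float:
    """Whole-body sweat rate (L/h) from nude mass change plus fluid consumed."""
    sweat_loss_l = (pre_mass_kg - post_mass_kg) + fluid_intake_l  # 1 kg ~ 1 L
    return sweat_loss_l / (duration_min / 60.0)


# Hypothetical trial: 62.0 -> 61.4 kg, 0.5 L water drunk, 50 min of exercise.
example_wbsr = whole_body_sweat_rate_l_h(62.0, 61.4, 0.5, 50.0)
print(f"WBSR = {example_wbsr:.2f} L/h")
```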

Hormones

Venous blood was collected by venipuncture into a vacutainer (Becton–Dickinson, Oxford, UK) containing clot activator. Once clotted (> 30 min), the whole blood was centrifuged at 4 °C and 805g for 15 min, and aliquots of serum were transferred into Eppendorf tubes (Genuine Axygen Quality, USA) and stored at − 80 °C until further analysis. Serum samples were analyzed using enzyme-linked immunoassays for E2 (Demeditec Diagnostics, Kiel, Germany) and P4 (IBL International, Hamburg, Germany) with a sensitivity of 6.2 pg·ml−1 and 0.045 ng·ml−1, respectively, and an intra-assay variation of < 6 and < 7%, respectively.

Perceived exertion

RPE was measured using the 15-grade scale, from 6 to 20 (Borg 1970).

Data and statistical analyses

The dependent variables were mean power output and Tpeak. The independent variables included: age, mass, AD, mass:AD, % body fat, aerobic fitness, peak aerobic power, training history, E2, P4, P4:E2, Tcore at baseline (Tbase), Tcore at start of work trial (T0), WBSR, vapor pressure and power output.

All statistical analyses were performed with SPSS software for Windows (IBM SPSS Statistics 25, NY, USA). Descriptive values were obtained and reported as means and standard deviation (± SD). Data were checked for normality by calculating skewness and kurtosis, whereby values within ± 2 were deemed to be acceptable (Weir and Vincent 2021). Participant characteristics were analyzed using one-way ANOVA and Student’s t test. Correlation coefficients were calculated to reveal the direction and strength of any potential relationships between variables; Pearson’s correlation coefficient and Spearman's rho were determined for data that did or did not (E2, P4, P4:E2) follow a normal distribution, respectively. Finally, in line with and to allow comparison to previous research (Havenith et al. 1998; Notley et al. 2019), stepwise linear regression was used to explain the variance of the dependent variables. A total of 104 (Tpeak) and 103 (power output) cases were included for the regression (not 115, due to missing E2, P4 and sweat rate data), where data that did not follow a normal distribution (E2, P4, P4:E2) were log-transformed before entering. Independent variables were only included in the final models if their tolerance value was > 0.5 to avoid unacceptable collinearity between predictors. Data were screened for influential cases using Cook’s distances, leverage values and standardized residuals. Test assumptions for normality, linearity and homoscedasticity were determined by scatter and residual plots. Since some participants completed repeated trials, residuals from each final regression model were tested for serial correlation using the Durbin–Watson test, whereby a value between 1.5 and 2.5 was deemed acceptable (Durbin and Watson 1950). Statistical significance was set at p ≤ 0.05.
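Two of the diagnostics described above, predictor tolerance and the Durbin–Watson statistic, can be illustrated with a short NumPy sketch. This is not the SPSS procedure used in the study and it does not reproduce the stepwise selection; the function names and the simulated predictors are assumptions for illustration only.

```python
import numpy as np


def tolerance(predictors: np.ndarray) -> np.ndarray:
    """Tolerance (1 - R^2) of each predictor regressed on all the others."""
    predictors = np.asarray(predictors, dtype=float)
    n_cases, n_predictors = predictors.shape
    values = []
    for j in range(n_predictors):
        target = predictors[:, j]
        others = np.delete(predictors, j, axis=1)
        design = np.column_stack([np.ones(n_cases), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        total_ss = np.sum((target - target.mean()) ** 2)
        values.append(np.sum(residuals ** 2) / total_ss)  # equals 1 - R^2
    return np.array(values)


def durbin_watson(residuals) -> float:
    """Durbin-Watson statistic; values near 2 suggest no serial correlation."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Simulated (not study) data: three independent predictors, 104 cases.
rng = np.random.default_rng(0)
independent = rng.normal(size=(104, 3))
# Add a fourth predictor that is nearly a copy of the first one.
near_copy = independent[:, 0] + 0.1 * rng.normal(size=104)
collinear = np.column_stack([independent, near_copy])

print(tolerance(independent))  # all well above the 0.5 cut-off
print(tolerance(collinear))    # first and last fall far below 0.5
```

In this sketch a predictor with tolerance at or below 0.5 would be excluded, mirroring the criterion stated in the text, and a Durbin–Watson value between 1.5 and 2.5 would be treated as acceptable.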

Results

As can be seen from Table 2, a wide range of intra- and inter-participant endogenous concentrations in E2 and P4 was evident. By contrast, other dependent and independent variables displayed far less variability between participants, (quasi-) menstrual phases and ambient environments (Table 3).

Table 2 Participant hormone concentrations
Table 3 Descriptive statistics for dependent and independent variables

Tpeak

Correlation coefficients between the independent variables and Tpeak measured during the 30-min work trial can be seen in Fig. 2 (left panel). Factors entered into the regression analysis to explain the variance in Tpeak were AD:mass, log(E2), T0 and power output. AD:mass was entered because it is a function of both individual factors and provided the strongest correlation with Tpeak, while T0 (but not Tbase) was entered to reduce collinearity and because it provided a far stronger correlation with Tpeak. The resulting model can be seen in Table 4, with no evidence of serial correlation in the model (Durbin–Watson = 2.15), and very high tolerance values indicating acceptable collinearity and model stability. The only variable excluded from the model was AD:mass (β = 0.08, p = 0.26). Overall, the model was able to account for 60% of the variance in Tpeak, with T0 the largest contributing variable (Fig. 2, right panel). It is noteworthy that the resulting model remained unchanged even when the omitted variables (AD, mass and Tbase) were included a posteriori, supporting the decision process.

Fig. 2

a Bivariate associations between independent variables and peak Tcore (Tpeak) on all common data points. *p < 0.05. b The percentage of explained and unexplained (residual) variance (\(\overline{R}^{2}\)) for explaining Tpeak

Table 4 Multiple regression models for explaining the core temperature response (Tpeak) and performance (mean power output)

Power output

Correlation coefficients between the independent variables and mean power output achieved during the 30-min work trial can be seen in Fig. 4 (left panel). Factors entered into the regression analysis to explain the variance in power output were AD, VO2max, peak aerobic power (PPO), training history, WBSR and RPE. The resulting model can be seen in Table 4, with no evidence of serial correlation in the model (Durbin–Watson = 1.86), and very high tolerance values indicating acceptable collinearity and model stability. Variables excluded from the model were AD (β = − 0.03, p = 0.72), VO2max (β = 0.16, p = 0.11), training history (β = 0.09, p = 0.22), and WBSR (β = 0.10, p = 0.24). Overall, the model was able to account for 44% of the variance in power output, with peak aerobic power the largest contributing variable (Fig. 4, right panel).

Discussion

The current study fills an important gap in the literature describing women’s vulnerability to exertional heat stress. Namely, it is the first study to determine the relative contribution of independent variables (individual factors) in explaining the core temperature response to exertional heat stress in women at very high metabolic rates, while accounting for the inter- and intra-participant variation in ovarian hormone concentrations (cf. Havenith et al. 1998; Notley et al. 2019). In partial support of our hypothesis, we observed that E2 contributes a small amount toward the core temperature response (Tpeak), whereas starting core temperature and power output (≈ metabolic heat production) explained the greatest variance.

In the current study, E2 was positively associated with Tpeak, although it was only able to explain ≤ 4% of its variance (Fig. 2, Table 4). This seemingly contradicts other research (Charkoudian and Stachenfeld 2014) and is inconsistent with our previous findings. A subset of these results (Lei et al. 2019) showed that the OCP group had attenuated heat loss mechanisms (↑ forearm vascular resistance, ↓ forearm blood flow, local and whole body sweat rates) compared to their matched eumenorrheic counterparts, concurrent with lower concentrations of E2 (19 ± 26 vs. 78 ± 65 pg·ml−1; p < 0.01; Cohen’s d = 1.2), although these differences were insufficient to change Tcore. Furthermore, despite no change in endogenous E2 and P4, the OCP group still demonstrated a consistent and significant increase in resting and exercising Tcore during their quasi-ML compared to EF (Lei et al. 2019). Using the current analysis (and design), it is difficult to determine whether it is the intra-participant or inter-participant E2 driving this relation (or both, Table 2, Fig. 3). Similarly, what modulating effect P4 might be contributing is unclear and is probably best explored using different methods, e.g., use of progestin-only OCP or temporary suppression of the menstrual cycle with a gonadotropin releasing hormone (ant)agonist (Charkoudian and Stachenfeld 2014). A confounding factor in this analysis may be that the group with the lowest concentrations of E2 was younger and had a lower training history (Table 1). Aerobic training, independent of aerobic fitness (VO2max), has been shown to improve Tcore and heat loss responses in both men (Ravanelli et al. 2021) and women (Ichinose et al. 2009) synonymous with phenotypic heat adaptation. Clearly, further research on this topic is necessary in additional cohorts (e.g., ages and training status); nevertheless, the effect of E2 on Tpeak was still considerably less than that of starting Tcore and power output.

Fig. 3

Bivariate associations between peak core temperature (Tpeak) during exercise and core temperature at the start of the work trial (T0; top row, n = 115); between Tpeak and mean power output during the work trial (middle row, n = 115); between Tpeak and E2 concentration measured before exercise (bottom row, n = 104). Values are all common individual data points, analyzed using Pearson’s correlation coefficient (T0, power output) and Spearman’s rho (E2)

That T0 was able to explain ~ 40% of the Tcore response should reinforce for women what is already known and practiced for men with regard to heat-specific interventions; namely, trained women should prioritize interventions (e.g., aerobic training, active heat adaptation, pre-exercise cooling, fluid ingestion etc.) that effectively lower Tcore before competition, attenuate the rise in Tcore during or (perhaps) extend Tcore at the end of exercise in order to improve work output (Alhadad et al. 2019). Moreover, power output explained ~ 15% of the Tcore response, which reaffirms the contribution of metabolic heat production (Nielsen 1938; Notley et al. 2019). This highlights the role that behavioral thermoregulation (self-pacing) plays during exercise in the heat: reducing metabolic heat production improves heat exchange with the environment and decreases thermoregulatory strain, something that a fixed-intensity protocol does not permit (Schlader et al. 2011a, b, c).

Few studies have previously quantified contributors to aerobic performance during self-paced exercise in the heat; to the authors’ knowledge, this is the first study to do so in women. The single greatest contributor toward work output (performance) was a participant’s peak aerobic power (Fig. 4, Table 4). These results support those of James et al. (2017) who demonstrated that velocity at VO2max (i.e., PPO) was the strongest predictor of 5-km running performance in the heat in men. Thus, the results of the current study and James et al. (2017) concur with a recent meta-analysis (Alhadad et al. 2019) that placed aerobic training as the single greatest factor for determining endurance performance in the heat, above heat acclimation, pre-exercise cooling and fluid ingestion, something that athletes and practitioners should consider.

Fig. 4

a Bivariate associations between independent variables and mean power output on all common data points. *p < 0.05. b The percentage of explained and unexplained (residual) variance (\(\overline{R}^{2}\)) for explaining mean power output

Notable differences between our results and those reported previously (Havenith et al. 1998; Notley et al. 2019) include: (i) anthropometric factors such as body mass and AD (or composite, mass:AD) did not contribute toward variance explained in Tpeak despite significant correlations (Fig. 2); (ii) the functional factor of VO2max did not contribute toward variance explained in Tpeak (Fig. 2), and although it correlated with power output, it did not contribute toward variance explained (Fig. 4); (iii) the environmental factor of vapor pressure did not contribute toward variance explained (Figs. 2 and 4). As already mentioned, we believe these differences are likely a function of the different sample training status and protocol used (intensity and self-pacing). However, it is also acknowledged that like other retrospective analyses of existing datasets (Havenith et al. 1998; Notley et al. 2019), the current analysis has certain limits. Our primary focus was whether and by how much the Tcore response to exertional heat stress in women can be explained by accounting for the variation in ovarian hormone concentrations. To maximize predictive/explanatory power, we chose to include all factors into one model each for power output and Tpeak, i.e., by not separately grouping by vapor pressure, pre-load duration, etc. Thus, due to our partially nested design, we cannot be certain of the independent effect of these variables. Nevertheless, if we take as an example the dependent and independent variables with greatest explanatory power (Tpeak, power output, T0, RPE) and compare between vapor pressures and pre-load durations, no differences are found (all p > 0.21). Furthermore, were the factor of vapor pressure to exert an effect, then this should be evident as a positive (Tpeak) or negative (power output) correlation, which is not evident in our results (Figs. 2 and 4). Moreover, it is noteworthy that the resulting models (± 1–6%) and predictors remain largely unchanged if vapor pressure and pre-load are separated.

Considerations

The observations herein are valid only for the current sample(s), protocol(s) and condition(s), and inference of association does not imply causation. It is regrettable that autonomic thermoeffector measurements and thermodynamic data were not collected in ~ 40% of the sample, as these may have strengthened the results. Our decision to use Tpeak as our primary dependent variable was guided by the fact that (i) ethics committees and professional bodies use absolute, not relative, thresholds for Tcore in their guidelines and policies; (ii) not all participants reached their highest Tcore at the end of exercise due to the self-paced nature of the protocol. However, a posteriori re-analysis of our data for ∆ Tcore did not change any of the significant independent variables. While it may be tempting to interpret the results as E2 having a negligible influence on Tcore/Tpeak, it is worthwhile considering that as an individual factor E2 did contribute a small amount toward the variance explained for Tpeak, whereas AD:mass, a variable that has previously been shown to have one of the largest effects (Havenith et al. 1998), did not. Finally, our data should not be generalized to other OCP formulations (e.g., triphasic combination and progestin-only) or to the late-follicular/pre-ovulatory phase of a menstrual cycle.

Perspectives and significance

Women remain underrepresented in the exercise thermoregulation literature and > 70% of studies still do not report ovulatory status or menstrual phase (Hutchins et al. 2021). Ovulatory status should not inhibit inclusion into this research topic (Schaumberg et al. 2017; Zheng et al. 2021b) although, importantly, the current results support calls for measurement and consideration of ovarian hormone concentrations to become standard in future work (Elliott-Sale et al. 2021). Individualization of human thermoregulation models improves the prediction of heat strain, largely through an increase in the number of input parameters (Havenith 2001). The current results suggest an additional factor (E2) might be considered in future work, although data saturation has not been reached. Similarly, Flouris et al. (2018) have identified simple metrics that can successfully be used as screening criteria to prospectively identify individuals at greater risk of acute exertional heat stress. Flouris et al. (2018) urge health professionals and occupational management to (re)consider whether different criteria for women should be utilized on account of their unique body morphology/physiology, something the current results support.