Introduction

The hypothalamic–pituitary–adrenal (HPA) axis, regulated by the corticotrophin-releasing factor (CRF), is the major centrally regulated neuroendocrine system responsible for rapid and strong responses to stress. It acts through a cascade of brain and hormonal events that ultimately results in heightened release of glucocorticoids (Bale and Vale 2004; de Kloet et al. 2005; Harris et al. 1997; Hauger et al. 2006, 2009; Hinson et al. 2007; McEwen 2007; Sorrells and Sapolsky 2007). A key aspect of HPA axis function is the predictable diurnal pattern of cortisol release characterized by peak glucocorticoid levels in the hour after awakening and steady decline to a bedtime nadir (Linkowski et al. 1993; Stone et al. 2001). Homeostatic regulatory shifts in diurnal HPA activity appear to be essential for balancing stimulatory and inhibitory glucocorticoid actions (Dallman et al. 1995; Munck et al. 1984; Munck and Naray-Fejes-Toth 1992; Sapolsky 1996, 2000; Sapolsky et al. 1986, 2000). A challenge to cortisol research involves differentiating characteristics of typical diurnal variation of cortisol concentrations from individual variation in the diurnal cycle (Smyth et al. 1997) and from normal and abnormal activation in response to aversive stimuli (Levine 1993).

Our understanding of HPA axis responsiveness under normal and stressed conditions has evolved significantly over the past 30 years, as researchers continue to elucidate brain CRF systems and other neuroendocrine mechanisms governing HPA function. One important advance has been frequent, systematic measurement of salivary cortisol levels over long periods of time, a method that, unlike intravenous sampling, does not itself alter HPA function. The complex regulation of brain CRF systems and HPA secretion during stress challenge and adaptation implicates different regulatory systems governing the homeostatic components and responsive (reactive) functions of the HPA axis (Bale and Vale 2004; Hauger et al. 2006, 2009). For the most part, evidence supports the view that dysregulation of brain CRF systems and HPA axis function increases with age, especially with sustained exposure to major stressors, and is associated with poorer health and cognitive outcomes (Dedovic et al. 2009; Hellhammer et al. 2009; Lupien et al. 2005, 2007; McEwen 2007; Seeman et al. 1997, 2001; Seeman and Robbins 1994).

As researchers continue to identify relationships between different aspects of cortisol regulation and aging, it is important to evaluate the sources of individual variation (Dedovic et al. 2009; Hellhammer et al. 2009; Kudielka et al. 2009). Twin studies provide a powerful approach to understanding genetic and environmental influences on individual trait variation, but only a few twin studies have examined the diurnal regulation of cortisol. In a reanalysis of five earlier twin studies, Bartels et al. (2003b) estimated a maximum heritability of .62 for non-experimentally aroused (i.e., basal) cortisol levels. The remaining variance was accounted for by environmental influences specific to each individual (unique environment). However, Bartels et al. noted that many of the studies ignored the timing of samples relative to diurnal cycles, had relatively small sample sizes (e.g., the studies averaged 39 twin pairs), and collected only a few samples often from only a single day. They also found wide variation in age (range 8–82) and data collection techniques which made comparisons difficult.

Here we focus on twin studies that address the role of genetic and environmental factors on diurnal regulation of cortisol concentrations. These studies all include multiple saliva samples collected at pre-determined time points across the day relevant to diurnal variation. Because there are so few twin studies of diurnal variation, we included studies of salivary, urinary, and serum cortisol, focusing on results for free cortisol.

In general, genetic regulation of cortisol secretion is most apparent in the morning. Among the four twin studies with awakening cortisol samples (Bartels et al. 2003a, b; Kupper et al. 2005; Linkowski et al. 1993; Wüst et al. 2000), heritability of awakening cortisol ranged from a low of .22 in saliva samples of 12-year-old children on a school day (Bartels et al. 2003a) to a high of .68 in college students undergoing 24-h plasma cortisol sampling (Linkowski et al. 1993). Awakening heritability was estimated at .33 in a large sample of community-dwelling twins in their late twenties/early thirties (Kupper et al. 2005). Consistent with Bartels et al. (2003a, b), unique environmental influences accounted for the remaining variance. In the 12-year-old children, the highest daytime heritability (.60) for salivary cortisol occurred 45 min after awakening, just before the children in the study left for school (Bartels et al. 2003a). In the three twin studies reporting on specific afternoon or evening cortisol values, heritabilities ranged from about zero (Kupper et al. 2005) to approximately .70 in young adults (Federenko et al. 2004; Kirschbaum et al. 1992). Averaged morning heritabilities, which account for as much as 48% of the variance in cortisol levels (Inglis et al. 1999; Meikle et al. 1988; Wüst et al. 2000), tend to be higher than heritability estimates of cortisol using measures of mean output (24-h plasma or urine) across the entire day (Inglis et al. 1999; Linkowski et al. 1993).

Because the HPA axis is responsive to external and internal challenges, researchers have also examined the roles of genetic and environmental influences on changes in cortisol with regard to the diurnal cycle. For instance, are there individual differences in the amount of increase in cortisol during the first hour after awakening (cortisol awakening response; CAR) or in the decline from peak to nadir? What roles do genes and environment play in the daily waxing and waning of cortisol? Several twin studies examined changes in the HPA axis across portions of the day (Bartels et al. 2003a, b; Inglis et al. 1999; Kupper et al. 2005; Linkowski et al. 1993; Meikle et al. 1988; Wüst et al. 2000). In most of these studies, heritability estimates for the CAR or other slope measures were minimal. However, in one study the heritability of the CAR slope in adults was estimated at .40 (Wüst et al. 2000). Given how few studies examine cortisol slope measures, more detailed examination of the extent to which genetic and environmental factors influence cortisol regulation across the day is warranted.

In summary, earlier research found some evidence for genetic influences on individual morning cortisol measures and mean morning levels of cortisol, and found strong unique environmental influences on cortisol concentrations throughout the day, especially after mid-morning (Bartels et al. 2003a, b; Kupper et al. 2005; Linkowski et al. 1993; Wüst et al. 2000). Significant genetic influences were found more consistently when studies were conducted in laboratory conditions whether or not they involved experimental manipulations (Federenko et al. 2004; Froehlich et al. 2000; Inglis et al. 1999; Kirschbaum et al. 1992; Linkowski et al. 1993; Meikle et al. 1988). Finally, given that there are so few twin studies of cortisol, the inconsistent findings concerning genetic and environmental influences might be due to widely varying study characteristics. Most studies did not report confidence intervals, so it is difficult to estimate the precision of the estimates. The studies differed by sample size, cortisol sampling techniques and contexts, use of plasma, urinary, and salivary cortisol, the timing and number of assessments within the diurnal cycle, how saliva samples were organized during the assay process (batch effects), and characteristics of the participants such as age, gender, and health status. The majority of these twin studies were underpowered to obtain reliable estimates of genetic and common environmental influences (Bartels et al. 2003a, b).

In recent years, evidence has accrued that collecting cortisol samples at multiple time points within and across days improves assessment of stable (trait-like) characteristics of the diurnal cycle (Hellhammer et al. 2007, 2009; Kudielka et al. 2009). In the present study, we systematically collected saliva samples from 783 male twins across three non-consecutive days—two typical days at home and one test day in the research laboratory. The twins were part of a larger longitudinal study of risk and preventive factors for cognitive aging. The goal of this study was to examine the roles of genetic and environmental influences on salivary cortisol regulation at multiple time points in the diurnal cycle and in different contexts (home and laboratory).

Methods

Research participants

The present study was part of the Vietnam Era Twin Study of Aging (VETSA 1: 2002–2008); the VETSA protocol has been described in detail elsewhere (Kremen et al. 2006). VETSA twins were recruited from the Vietnam Era Twin Registry which comprises male same-sex monozygotic (MZ) and dizygotic (DZ) twin pairs who served in the United States military at some time during the Vietnam era (1965–1975). The majority of men did not serve in combat or in Vietnam. There are no female twins in the twin registry because so few women were in the military during the Vietnam era (Eisen et al. 1987; Henderson et al. 1990). VETSA twin pairs were randomly selected from a pool of 3322 VET Registry twin pairs who had participated in a study of psychological health in 1992 (Tsuang et al. 1996). Eligibility criteria were that participants had to be age 51–59 at the time of recruitment and both members of a pair had to agree to participate (though they did not have to be tested at the same time). Twins traveled either to the University of California San Diego or Boston University laboratories for a day-long series of interviews and physical and cognitive assessments. In cases in which a twin could not travel (n = 33 individuals, 2.7%) research assistants conducted assessments at a facility close to the twin’s home.

Saliva samples were obtained from all VETSA participants starting in late February 2005 (N = 795); nine eligible twins declined participation in the cortisol data collection and saliva samples from three twins were lost or spilled (final N = 783). Following detailed instructions about the timing and conditions of saliva collection, participants completed two nonconsecutive days of saliva collection at home prior to the day of testing and then sent the saliva samples by overnight mail to the University of California, Davis to be assayed under the direction of SM. Saliva samples collected on the in-laboratory day of testing were also shipped overnight to the same laboratory. All saliva collection materials used by the participants that could come into contact with saliva (e.g., vials, gum, straws) were tested in advance by SM to ensure they did not influence cortisol assay results.

IRB approval was obtained at all sites, and all participants provided signed informed consent. A combination of DNA testing, previously obtained questionnaire and blood group methods was used to determine zygosity (Eisen et al. 1989; Nichols and Bilbro 1966; Peeters et al. 1998). At the time of this manuscript, zygosity of two-thirds of the sample had been determined by analyses of 25 satellite markers based on blood samples. When DNA results were not yet available, zygosity was based on a combination of questionnaire and blood group data. Comparisons showed 95% agreement between DNA and questionnaire results.

Procedures

Saliva collection

We contacted participants 6 weeks prior to the day of testing to establish the two “typical” working days separated by 1 day (preferably Tuesday/Thursday to avoid sampling on the beginning and end of the work week) on which they would provide the at-home saliva samples. At-home sample collection took place 2 to 3 weeks before the laboratory day of testing in order to avoid the disruption of schedules that can be caused by travel. Participants were asked the time they usually woke up in the morning in order to individualize the kit information and to set times on the reminder watches. Cortisol kits were mailed to participants, and participants were called the day prior to starting sampling to ensure that the reminder watch was turned on, the instructions were understood, and the kit was placed by their bed for the morning sample. Participants were reminded to provide the awakening sample while they were still in bed and to not consume caffeine between the awakening and the awake-plus-30 min sample. If participants were sick or experiencing unexpected stress, they were asked to call us to modify their schedule. The saliva kit included all supplies: labeled 4.5 ml Cryotube spit vials, Trident original sugarless gum, straws to facilitate drooling into each vial, tissues, instructions, a daily log, pen, a reminder watch, and a storage container with a track cap.

Participants provided saliva samples when they first woke up, 30 min after wake-up, and at 1000, 1500, and 2100 (or bedtime) hours (h) military time. At the specified time, participants selected the appropriately labeled vial and spat into it until the saliva reached a line drawn at 2.25 ml. If a participant found it difficult to provide enough saliva, he chewed original Trident sugarless gum to stimulate saliva flow and removed the gum prior to spitting. After spitting, participants were instructed to close the vial tightly and place it in the storage bottle. Participants were asked to keep the samples refrigerated and were provided with an insulated bag to keep the supplies together. The reminder watch was programmed to notify participants at each scheduled time; however, the time protocol was carefully explained (verbally and in writing) to allow for typical variations in schedules (e.g., being in traffic or in a meeting when the alarm sounded). Reminder watch times were individualized so that participants with atypical wake/sleep schedules (e.g., participants who worked night shifts) provided samples at equivalent times (their own awakening time, awake-plus-30 min, awake-plus-4 h, awake-plus-9 h, and bedtime). Immediately following each saliva sample, participants completed a written log indicating their mood, food and drink intake, medications taken, alcohol use, and their activities during the previous hour; participants also rated the stressfulness and typicality of each day. Stressfulness was assessed by asking “would you describe your day today as: very stressful, moderately stressful, not at all stressful?” Typicality was assessed by asking “Compared with a usual day, was your day: a typical day, better than usual or worse than usual?”

We also collected saliva samples on the laboratory day of testing. Participants arrived the day before testing started and received their saliva kit supplies when they arrived at the hotel. On the test day, twins provided samples as soon as they woke up, then half an hour after awakening, while they were at the hotel. The 1000 and 1500 h saliva samples were collected in the laboratory; the bedtime samples were provided back at the hotel. Two additional saliva samples were taken before and after lunch in the laboratory. In order to control for the possible variation in stressfulness of different cognitive tests, samples were collected between specific tests (close to 1000 and 1500 h) rather than at exact times. Test day protocols were standardized across sites. Participants completed log entries following each sample.

Participants arrived at the lab at approximately 0800 h and departed at 1630 h, with an hour for lunch and brief breaks during the day. The protocol included: phlebotomy (immediately after consenting and 1.5 h before the first in-lab saliva sample), a medical history interview, anthropometric measures (height, weight, girth), a battery of standard cognitive tests (assessing domains of short term and long term verbal and visuospatial memory, working memory, concentration, attention, processing speed, overall cognitive ability, and executive functions), functional assessments (e.g., vision and hearing tests, walk test, grip strength, Rise from Chair, pulmonary function testing), and multiple blood pressure assessments (sitting, supine, ankle-arm index) (Kremen et al. 2006). The test protocol was organized into four counterbalanced orders; the members of a twin pair were always tested (in separate rooms) in the same test order. Cortisol concentrations did not differ on the basis of test order. In orders one and two, just prior to the 1000 h sample, participants completed a working memory task and a brief questionnaire. Before the 1500 h sample they took a different working memory test. In orders three and four, the 1000 h sample was preceded by the D-KEFS Trails (Delis et al. 2001) and a vision test; the 1500 h sample was preceded by a working memory test.

Cortisol assays

Prior to conducting the assays, samples were centrifuged at 3000 rpm for 20 min to separate the aqueous component from mucins and other suspended particles. Salivary concentrations of cortisol were estimated in duplicate using commercial radioimmunoassay kits (Siemens Medical Solutions Diagnostics, Los Angeles, CA). Assay procedures were modified to accommodate overall lower levels of cortisol in human saliva relative to plasma as follows: (1) standards were diluted to concentrations ranging from 2.76 to 345 nanomoles per liter (nmol/l); (2) sample volume was increased to 200 μl; and (3) incubation times were extended to 3 h. Serial dilution of samples indicated that the modified assay displays a linearity of 0.98 and an assay sensitivity (least detectable dose) of 1.3854 nmol/l. Intra- and inter-assay coefficients of variation were 3.962 and 5.662%, respectively. Of the 13,311 possible saliva samples from 783 participants, 149 (1%) were missing due to participant lapses or technical problems.

All samples from a participant were analyzed in the same assay; one to three individuals were included in the same assay batch. Batch numbers were retained in order to adjust for possible batch-specific effects. Cortisol assays were performed without knowledge of the zygosity of the participant. If salivary cortisol concentrations exceeded 50 nmol/l, the value was set to missing. This cut-point corresponds with research suggesting that values above 50 nmol/l are most likely outliers (Hellhammer et al. 2009); this value also corresponds with cortisol concentrations three standard deviations above the average awakening mean. Scores were imputed for missing values only if the participant had no more than one missing value on a day (80% of data present). In order to impute missing data, we first calculated the full sample’s mean cortisol change between the time point with the missing value and the adjacent time point; for all time points except awakening, we used the time point prior to the missing value. We then added (or subtracted) the mean cortisol change for those two points from the individual participant’s non-missing time point to get the imputed value for the missing time point in question. For example, if a participant was missing a cortisol value for 1500 h, the full sample’s mean change in cortisol from 1000 to 1500 h was calculated. This value was then subtracted from the participant’s 1000 h value to obtain the 1500 h value. Cortisol values were natural log transformed prior to data analysis in order to normalize the distributions. At-home cortisol concentrations were averaged at corresponding times on day one and day two to create a single value for analysis; averaging was supported by the high intercorrelations observed between daily measures of the same time point.
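The outlier rule and single-value imputation described above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the code used in the study: the time-point labels, the function names, and the choice to work on untransformed nmol/l values are assumptions. The mean change is stored here as a signed difference (later minus earlier), so adding it to the prior time point is equivalent to subtracting the mean decline as in the 1000 to 1500 h example.

```python
from statistics import mean

# Hypothetical labels for the five daily time points, in collection order.
TIME_POINTS = ["awake", "awake30", "t1000", "t1500", "bedtime"]
OUTLIER_CUTOFF = 50.0  # nmol/l; values above this are set to missing


def mask_outliers(day):
    """Return a copy of `day` with values above the cutoff set to None."""
    return {tp: (None if v is not None and v > OUTLIER_CUTOFF else v)
            for tp, v in day.items()}


def mean_change(days, from_tp, to_tp):
    """Full-sample mean of (value at to_tp - value at from_tp), complete cases only."""
    diffs = [d[to_tp] - d[from_tp] for d in days
             if d[from_tp] is not None and d[to_tp] is not None]
    return mean(diffs)


def impute_day(day, days):
    """Impute one missing value in `day`; return the day unchanged otherwise."""
    missing = [tp for tp in TIME_POINTS if day[tp] is None]
    if len(missing) != 1:
        return dict(day)  # nothing missing, or more than one value missing
    tp = missing[0]
    i = TIME_POINTS.index(tp)
    filled = dict(day)
    if i == 0:
        # Awakening has no prior time point, so work back from the next one.
        nxt = TIME_POINTS[1]
        filled[tp] = day[nxt] - mean_change(days, tp, nxt)
    else:
        prev = TIME_POINTS[i - 1]
        filled[tp] = day[prev] + mean_change(days, prev, tp)
    return filled
```

In this sketch `days` is the list of all participants' days, so the mean change is computed only from participants with both time points present.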

On the basis of previous research, we created within-day cortisol measures which included: the CAR, peak-to-nadir cortisol (the decay slope from the morning peak to bedtime), and mean daily cortisol output (average of five time points across the day). We created two CAR indicators: one (CAR mean) reflected the average of the awake and awake +30 min variables; the second CAR indicator (CAR slope) was created by subtracting the cortisol concentration at awakening from the cortisol concentration at awake +30 min.
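As a rough illustration, the within-day measures can be computed from the five time points as follows. The function and key names are hypothetical. The text does not give a formula for the peak-to-nadir measure, so the simple difference between bedtime and awake +30 min used here is an assumption, as is the choice of whether to pass raw or log-transformed values.

```python
def within_day_measures(awake, awake30, t1000, t1500, bedtime):
    """Summarize one day of cortisol values (all on the same scale)."""
    return {
        # CAR mean: average of the awake and awake +30 min values.
        "car_mean": (awake + awake30) / 2,
        # CAR slope: awake +30 min value minus awakening value.
        "car_slope": awake30 - awake,
        # Assumption: peak-to-nadir taken as bedtime minus the morning peak.
        "peak_to_nadir": bedtime - awake30,
        # Mean daily output: average of the five time points.
        "mean_output": (awake + awake30 + t1000 + t1500 + bedtime) / 5,
    }
```

A negative `car_slope` in this sketch corresponds to an awakening value higher than the awake +30 min value.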

Statistical methods

Using twin modeling, the proportion of variance or covariance due to three sources (genetic, common or shared environment, and unique or unshared environment) can be estimated. Because MZ twins share all of their genes whereas DZ twins, like other siblings, share on average 50% of their genes, the greater the difference in degree of similarity within MZ twin pairs compared to DZ pairs, the stronger the genetic influence on that characteristic. Additive genetic influences account for all genetic influences. The common environmental component represents life experiences shared between siblings and makes siblings more alike; the unique environmental component represents experiences that make siblings different (e.g., having a spouse die); unique environmental influences also include measurement error. Failure to account for extraneous sources of variation that act to artificially increase the similarity of members of a twin pair will result in biasing the variance component estimates of interest. The practical necessity of conducting multiple assays simultaneously on the same group of participants warrants an investigation of possible batch effects and how these may be estimated and controlled in the analysis of twin data. For instance, the initial observation of effects of shared environment (i.e., DZ correlations similar to those for MZ) is also consistent with significant batch effects for pairs of twins assigned to the same batch. The pattern of correlations between unrelated individuals in the same batch, and related individuals (twin pairs) in different batches allows us to estimate the random effects of differences between batches and to estimate the correlations for MZ and DZ twins while controlling for these extraneous sources of variability.
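The logic of the MZ/DZ comparison can be written out as a short derivation. This is the standard textbook decomposition for standardized variances rather than a formula quoted from this section; the symbols A, C, and E (additive genetic, common environment, unique environment) are introduced here for illustration, assuming A + C + E = 1 and no dominance or batch effects.

$$ \rho_{MZ} = A + C, \qquad \rho_{DZ} = \tfrac{1}{2}A + C $$

$$ A = 2(\rho_{MZ} - \rho_{DZ}), \qquad C = 2\rho_{DZ} - \rho_{MZ}, \qquad E = 1 - \rho_{MZ} $$

For example, hypothetical correlations of .60 for MZ pairs and .30 for DZ pairs would give A = .60, C = 0, and E = .40.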

The partial nesting of pairs within batches and batches within pairs makes estimation of the components of variance by maximum-likelihood with widely-used packages for structural modeling relatively tedious, though not impossible. By contrast, we found the simultaneous estimation of the components of variances due to batch differences and differences between and within twin pairs relatively rapid and easy within a Bayesian framework using Markov Chain Monte Carlo (MCMC) methods in the freely available package WinBUGS (Spiegelhalter et al. 2004) for Bayesian inference using Gibbs sampling. Briefly, we denote the assayed value of the jth twin of the ith pair by $Y_{ijk}$, where the subscript k indicates the batch in which the value was assayed, such that assays in the same batch have the same value of k.

We then let:

$$ Y_{ijk} = \mu + \tau_{ij} + \beta_{k} $$

The $\tau_{ij}$ represent the random differences between twins, conceivably correlated between the members of a pair, and the $\beta_{k}$ the random differences between batches. For each type of twin (MZ and DZ) we assume that the twin effects, $\tau_{ij}$, are bivariate normal with SD $\sigma_{\tau}$ and intraclass correlation $\rho_{MZ}$ for MZ and $\rho_{DZ}$ for DZ pairs, respectively. The batch effects are assumed to be $N[0, \sigma^{2}_{\beta}]$. When the MCMC algorithm converges, it yields successive samples from the posterior distribution of the parameters given the data that may be used to estimate summary statistics such as confidence intervals of model parameters to, theoretically, any desired degree of accuracy. We assumed a relatively uninformative normal prior for $\mu$. Following a suggestion of Spiegelhalter et al. (2004) we assumed broad uniform priors on $\sigma_{\tau}$ and $\sigma_{\beta}$. The twin correlations for MZ and DZ pairs were sampled initially from a uniform prior distribution over the range 0–0.9.

The program written for this application simulated random effects for each pair, $\pi_{i}$, and batch, $\beta_{k}$, using the appropriate between-pair component of variance for each twin type to simulate the $\pi_{i}$. The individual twin effects, $\tau_{ij}$, were simulated to be $N[\pi_{i}, \sigma^{2}_{\tau w}]$, where the intra-pair variance is $\sigma^{2}_{\tau w} = \sigma^{2}_{\tau}(1 - \rho)$, $\rho$ being the intraclass correlation for MZ or DZ twins as appropriate. We calculated the proportion of the phenotypic variance attributable to genetic influences (heritability: H). Ancillary summary statistics such as Holzinger’s H, $H = 2(\rho_{MZ} - \rho_{DZ})$, were computed from successive samples of $\rho_{MZ}$ and $\rho_{DZ}$, together with their confidence intervals.
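The data-generating structure of this model can be illustrated with a small forward simulation in plain Python. This is not the WinBUGS program referred to above, and it performs no Bayesian estimation. It only draws pair effects, twin effects, and batch effects in the way the text describes. The between-pair variance is taken as the total twin variance times the intraclass correlation, which is inferred from the stated intra-pair variance rather than quoted, and the random assignment of twins to batches is a simplifying assumption. All function and parameter names are hypothetical.

```python
import random


def simulate_pairs(n_pairs, rho, sigma_tau, mu=0.0,
                   sigma_beta=0.0, n_batches=1, seed=0):
    """Draw (twin 1, twin 2) values under Y = mu + tau + beta."""
    rng = random.Random(seed)
    sd_between = sigma_tau * rho ** 0.5        # SD of pair effects pi_i (inferred)
    sd_within = sigma_tau * (1 - rho) ** 0.5   # SD within pairs, sigma_tau_w
    batch_effects = [rng.gauss(0.0, sigma_beta) for _ in range(n_batches)]
    pairs = []
    for _ in range(n_pairs):
        pi_i = rng.gauss(0.0, sd_between)
        values = []
        for _ in range(2):
            tau_ij = rng.gauss(pi_i, sd_within)
            k = rng.randrange(n_batches)  # assumption: random batch assignment
            values.append(mu + tau_ij + batch_effects[k])
        pairs.append(tuple(values))
    return pairs


def pair_correlation(pairs):
    """Pearson correlation between first and second twins across pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / (sxx * syy) ** 0.5
```

With batch effects switched off (the default), the correlation across a large number of simulated pairs should be close to the value of `rho` passed in. Setting `sigma_beta` above zero with only a few batches shows how shared batches can shift the observed twin correlations.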

An estimate of the relative contribution of the shared environment, C, is given by $C = 2\rho_{DZ} - \rho_{MZ}$. Samples from the posterior distribution of C allow its confidence intervals to be estimated. Note that estimates of C may be negative if there are large non-additive genetic effects. Thus, this approach to the estimation of C avoids the biases inherent in constraining C > 0 in the more familiar ML components-of-variance approach. Estimation of C in this context can be used as a goodness-of-fit test and can indicate the presence of genetic dominance or shared environmental effects. A significant positive value indicates the presence of shared environmental effects, while a significant negative value indicates dominance effects; when the test is not significant in either direction, an AE model (one that includes only additive genetic and unshared environmental influences) is supported as the best-fitting model. The unique or non-shared environment component (E) represents the degree to which twins are dissimilar and includes measurement error. It should be noted that negative heritability estimates are possible when reporting results from WinBUGS. This occurs when the DZ correlations are higher than the MZ correlations and is most likely explained by measurement error. Unlike Mx, WinBUGS does not include a floor constraint of zero. For the sake of clarity, non-significant heritability estimates are reported as “NS” (not significantly different from zero) in the tables. A copy of the WinBUGS code, together with illustrative data structure and initial values, may be obtained from the second author.
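The way H and C are derived from successive samples of the twin correlations can be sketched as follows. The input lists stand in for posterior draws of the MZ and DZ correlations. The function names, the simple percentile interval, and the formula for E (one minus the MZ correlation, the usual convention) are assumptions for illustration and are not taken from the WinBUGS code.

```python
def summarize(draws, level=0.95):
    """Return (mean, lower, upper) using a simple percentile interval."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int((1 - level) / 2 * (n - 1))]
    upper = ordered[int((1 + level) / 2 * (n - 1))]
    return sum(ordered) / n, lower, upper


def variance_components(rho_mz_draws, rho_dz_draws):
    """Compute H, C, and E draw by draw; no floor at zero is imposed."""
    pairs = list(zip(rho_mz_draws, rho_dz_draws))
    h = [2 * (mz - dz) for mz, dz in pairs]   # H = 2(rho_MZ - rho_DZ)
    c = [2 * dz - mz for mz, dz in pairs]     # C = 2 rho_DZ - rho_MZ
    e = [1 - mz for mz, dz in pairs]          # assumption: E = 1 - rho_MZ
    return {"H": summarize(h), "C": summarize(c), "E": summarize(e)}
```

Because nothing is truncated, a draw in which the DZ correlation exceeds the MZ correlation yields a negative H, and an interval for C that lies entirely below zero would point toward dominance rather than shared environment.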

Potential “confounding” variables

Past research reports a variety of health and lifestyle factors that influence cortisol concentrations, though recent reviews report mixed results for most of these factors (Chida and Steptoe 2009; Hellhammer et al. 2009; Kudielka et al. 2009) and there is little evidence from previous twin studies that they affect cortisol heritability estimates. Such factors include age, smoking, various health conditions and medications (e.g., heart disease, diabetes, asthma, use of medications such as corticosteroids and antidepressants), disruptions of sleep/wake cycles (e.g., shift work, jet lag on the day of testing), and sleep. In the VETSA study, since all the participants were part of a larger study of risk and preventive factors for cognitive aging, for which many of these health risk factors are hypothetically salient, no participants were excluded a priori from the cortisol data collection for health reasons.

Compliance with the protocol may also influence cortisol outcomes—in particular morning concentrations when cortisol is most volatile. We had two sources of information on compliance: self-reported time of sample from the cortisol daily logs and the track cap readings. We focus here on two indicators of compliance with the first two morning measures used to assess the CAR. First, did participants follow directions to leave a half hour between the awakening measure and the second measure? Inaccuracies of timing of these measures could mean that the “awake-plus-30 minute measure” is not assessing peak cortisol concentrations. Any participants who did not take the second sample within 20–40 min of the awakening sample were viewed as non-compliant. Second, did participants comply with instructions to provide the first saliva sample immediately on awakening? A negative CAR—indicated by awakening levels that are higher than awake-plus-30 min levels—could suggest that the awakening sample was not provided immediately on awakening (Adam et al. 2006; Chida and Steptoe 2009; Pruessner et al. 1999). We used negative CAR as a second indicator of non-compliance. The effect of each of the confounders on the cortisol measures was tested.
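The two compliance indicators can be expressed as a small helper. This is a sketch with hypothetical names; times are given as minutes since midnight, and treating the 20 and 40 min bounds as inclusive is an assumption.

```python
def compliance_flags(awake_time, awake30_time, awake_cortisol, awake30_cortisol):
    """Flag the two indicators of non-compliance for the morning samples."""
    gap = awake30_time - awake_time  # minutes between the two samples
    return {
        # Second sample not taken within 20-40 min of the awakening sample.
        "timing_noncompliant": not (20 <= gap <= 40),
        # Negative CAR: awakening level higher than awake-plus-30 min level.
        "negative_car": awake_cortisol > awake30_cortisol,
    }
```

For instance, samples taken at 0600 and 0635 h would pass the timing check, whereas samples taken 10 min apart would be flagged.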

Results

Descriptive statistics

Participants in the VETSA cortisol study are predominantly non-Hispanic Caucasian (660; 86%), married (606; 78%), and working full-time (592; 76%). The average age at testing was 55.4 (2.48 SD); ages ranged from 51 to 60. Participants’ median household income was in the $70,000–79,000 range; average education was 13.8 (SD = 2.1) years. Twenty-nine participants (3.7%) worked night shifts or worked jobs with irregular wake times (e.g., truck drivers or variable shift workers). Shift workers provided samples according to their own sleep/wake cycle. The majority of participants reported diurnal cycles that typically involved awakening between 0400 and 0800 h (85% on the at home days; 98% on the laboratory day).

Health status

The health status of VETSA participants appears to be fairly consistent with that of men from the United States in the same age group in the 2004–2007 National Health Interview Survey by the Centers for Disease Control and Prevention (Schoeneborn and Heyman 2009). As shown in Table 1, 11.8% of VETSA participants reported they were in fair or poor health, as compared with 19.2% of men in this age group in the national survey. Rates of hypertension and cardiovascular disease were almost identical in the two samples, with other measures varying only slightly.

Table 1 Prevalence of health problems in VETSA men and comparisons with a national sample

We also examined the prevalence of other medical conditions in VETSA participants that are sometimes associated with cortisol dysregulation (Table 1). Overall, the typical VETSA participant’s blood pressure was in the pre-hypertension range (JNC 2004); average blood pressure (mean of four readings across the day) was 133.6 mmHg systolic and 83 mmHg diastolic. About one-third of participants (35%) took antihypertensive medications (Table 1). Approximately 9% of participants took medication for diabetes; 10% were currently taking antidepressants, and 3% were taking some form of corticosteroids. Less than one percent of the sample self-reported autoimmune diseases such as multiple chemical sensitivity, chronic fatigue, or fibromyalgia, and about 9% had asthma. There were no significant differences between MZ and DZ twins on demographic measures (e.g., marital status, income, education, occupation); demographic measures were not significantly associated with cortisol concentrations.

Stressfulness/typicality of days

Participants rated the test day as more stressful than the home days [means 2.24 versus 2.37, respectively; paired t(713) = 4.97, p < .0001]. However, given that the rating scale ranged from not at all stressful (3) to moderately stressful (2) to very stressful (1), these ratings suggest that participants found the test day only very mildly stressful. There were no significant associations between phenotypic stressfulness ratings and cortisol measures on any day. Within-person ratings of the stressfulness of the two at-home days correlated .53 (p < .0001), while the association between stressfulness of the at-home days and the test day was minimal (r = .07, p < .05 for day 1 and r = .17, p < .0001 for day 2). Cross-twin correlations indicate that there was no relationship between the twins’ ratings of the at-home days for either MZ or DZ twins (rs range from −.05 to .11). However, MZ twins experienced the day of testing as more similarly stressful (r = .46; p < .0001) than the DZ twins (r = .10, p < .19; Spearman rs). The majority of participants rated the two home days as “typical” (82% on day one and 76% on day two). Fewer than half (43%) of the participants rated the test day as “typical”; 38% described it as “better than usual” and 20% described it as “worse than usual.”

Salivary cortisol single time point concentrations

As can be seen in Table 2, mean cortisol concentrations at each time showed a typical diurnal pattern for the at-home and laboratory test day salivary cortisol concentrations involving: (1) higher awakening levels than bedtime levels; (2) peak levels at 30 min after awakening; and (3) a steady decline from the post-awakening peak across the day to a nadir at bedtime. While the standard deviations of the sampling times were significantly smaller on the test day than on the at-home days, the overall variance of the cortisol samples was similar across at-home and test days. For at-home cortisol concentrations, the similar MZ and DZ within-pair correlations suggested absent or minimal influence of genetic factors. On the laboratory day, however, MZ correlations for the three morning cortisol concentrations were substantially higher than those of the DZ pairs. Univariate genetic analyses of laboratory day cortisol indicated significant heritability for cortisol concentrations across the morning times (awake, awake-plus-30 min, and 1000 h) of .56, .48, and .42, respectively. It should be noted that in Tables 2 and 3 the heritability estimates at times exceed the MZ correlation. The estimates of heritability reported are derived from MCMC estimates of the MZ and DZ correlations. In conventional ML model fitting approaches, the heritability is constrained to lie between zero and the MZ twin correlation. Our MCMC estimates are not restricted to lie in this region, thus avoiding bias that may arise when this constraint is imposed. We note that in no case do our estimates of heritability significantly exceed the estimated MZ correlation; the confidence intervals of the MZ correlations and heritability estimates overlap to a considerable degree.
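To illustrate how an unconstrained heritability estimate can exceed the MZ correlation, the following minimal sketch uses the classical Falconer approximations rather than the MCMC procedure used in this study. The function name `falconer` and the input correlations are hypothetical and chosen only for illustration; they are not values from Table 2.

```python
def falconer(r_mz: float, r_dz: float) -> dict:
    """Falconer-style approximations from twin correlations.

    This is a textbook shortcut, not the MCMC estimation used in the study.
    """
    h2 = 2.0 * (r_mz - r_dz)  # additive genetic share (A)
    c2 = 2.0 * r_dz - r_mz    # common environment share (C); may come out negative
    e2 = 1.0 - r_mz           # unique environment share (E)
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations for illustration only (not taken from Table 2).
est = falconer(r_mz=0.50, r_dz=0.20)
print(est)
```

With these made-up inputs the DZ correlation is less than half the MZ correlation, so the unconstrained heritability (about .60) exceeds the MZ correlation (.50) and the common-environment term is negative. A constrained fit would instead hold the heritability at or below the MZ correlation.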

Table 2 Descriptive statistics, twin correlations and heritability estimates for salivary cortisol at specific time points during the at-home and laboratory test days

As can be seen in Table 2, the test for goodness of fit indicated that an AE model (a model that includes additive genetic and unique environmental influences) was the best fitting model for the data. The majority of the remaining variance in cortisol concentrations in both contexts was accounted for by environmental influences unique to each twin—most likely reflecting the influence of momentary individual-specific perturbations in cortisol. As in most adult twin studies of cortisol, there was virtually no influence of common environment on cortisol concentrations either at-home or in the laboratory.
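For readers less familiar with twin models, the standard expectations of an AE model can be written as below. The symbols a² (additive genetic variance) and e² (unique environmental variance) follow common convention and are assumed here; they are not notation taken from this article.

```latex
\begin{aligned}
\operatorname{Var}(P) &= a^2 + e^2, \\
\operatorname{Cov}_{\mathrm{MZ}} &= a^2, \qquad
\operatorname{Cov}_{\mathrm{DZ}} = \tfrac{1}{2}\,a^2, \\
h^2 &= \frac{a^2}{a^2 + e^2}.
\end{aligned}
```

Under these expectations the MZ covariance is twice the DZ covariance, and all variance not attributed to additive genetic effects is assigned to the unique environment, which also absorbs measurement error.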

Within-day salivary cortisol

On the laboratory day but not at home, MZ within-pair correlations for most composite measures were more than double the DZ within-pair correlations, with the exception of the CAR slope (Table 3). Univariate twin modeling showed that genetic influences accounted for a significant portion of the variance in the laboratory day mean CAR (.64) and the overall daily mean (.43). Neither the morning slope nor the decay measures were significantly heritable. The remaining variance was explained by unique environmental influences; common environment accounted for little variance in any measure. Additive genetic influences on the composite cortisol measures at home were low and not significant. As explained earlier, the heritability estimates at times exceed the MZ correlation because they are derived from MCMC estimates of the MZ and DZ correlations. We note that in no case do our estimates of heritability significantly exceed the estimated MZ correlation; the confidence intervals of the MZ correlations and heritability estimates overlap considerably.

Table 3 Twin correlations and heritability estimates for composite indicators of change and mean salivary cortisol concentrations on the days at home and the laboratory test day

Influence of “confounders”

Extensive analysis of potential covariates indicated that smoking was the only variable systematically associated with mean cortisol values both at home and on the day of test. Additionally, we uncovered significant jet-lag effects on the day of test values (Doane et al., in press). There was no change in heritabilities when smoking and jet lag were entered as covariates (results not shown). In addition, we systematically omitted subgroups of participants (one group at a time): those with diabetes, asthma, autoimmune disorders, or disrupted sleep (e.g., night shift or irregular shift work), and those who used antidepressants, corticosteroids, or asthma medications. We also examined the effect of compliance by systematically omitting participants with negative CARs (Ns range from 115 at home to 111 on the test day), with an awakening time that was not between 0400 and 0800 h (N = 116 at home and 19 on test day), or with non-compliant gaps between awakening and awake-plus-30 min samples (N = 58 at home, N = 62 on the test day). There was no association between age and cortisol, so age was not included as a covariate. Excluding subgroups of subjects with potential confounders did not change any of the main findings of the study.
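The one-group-at-a-time exclusion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the data are synthetic, the names `pearson` and `leave_one_subgroup_out` are hypothetical, a pair is assumed to be dropped whenever either twin belongs to the excluded subgroup, and a simple within-pair correlation stands in for the full heritability model.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def leave_one_subgroup_out(pairs, subgroups):
    """Re-estimate the within-pair correlation with each subgroup omitted in turn.

    Assumption: a pair is dropped when either twin carries the subgroup flag.
    """
    results = {"none": pearson([p["t1"] for p in pairs],
                               [p["t2"] for p in pairs])}
    for group in subgroups:
        kept = [p for p in pairs if group not in p["flags"]]
        results[group] = pearson([p["t1"] for p in kept],
                                 [p["t2"] for p in kept])
    return results


# Synthetic twin pairs: made-up cortisol values and flags, for illustration only.
pairs = [
    {"t1": 10.0, "t2": 11.0, "flags": set()},
    {"t1": 14.0, "t2": 13.0, "flags": {"diabetes"}},
    {"t1": 8.0, "t2": 9.5, "flags": set()},
    {"t1": 12.0, "t2": 12.5, "flags": {"asthma"}},
    {"t1": 16.0, "t2": 14.0, "flags": set()},
    {"t1": 9.0, "t2": 10.0, "flags": {"diabetes", "asthma"}},
]
results = leave_one_subgroup_out(pairs, ["diabetes", "asthma"])
print(results)
```

Comparing each entry with the `"none"` baseline shows whether omitting a given subgroup shifts the estimate. In a real analysis this comparison would be run separately for MZ and DZ pairs and followed by refitting the twin model.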

Discussion

The data reported herein provide evidence for significant heritability of morning salivary cortisol regulation when our large sample of monozygotic and dizygotic twins was studied under laboratory conditions. We found that the heritability of cortisol tended to be higher on the laboratory day of testing than on the days at home, particularly in the mornings and in composite within-day measures (e.g., CAR and overall daily means). Here we consider what might account for the increased heritability on the day of testing.

The more pronounced genetic effects observed for the laboratory test day may have resulted from an HPA axis response to the unfamiliar environment and tasks (cognitive testing). Because adult co-twins live and work apart, their experiences on the home days are likely to differ more from each other than their experiences on the laboratory day. Because the HPA axis responds to multiple internal and external challenges, the more similar schedule and experiences on the laboratory day may have elicited similar HPA responses. The fact that neither MZ nor DZ twins’ ratings of each day’s stressfulness were significantly correlated on the days at home, whereas MZ pairs’ ratings of stressfulness were significantly correlated on the test day, provides support for this hypothesis. In addition, none of the covariates that controlled for individual differences, in particular the compliance measures, made a difference in the heritability estimates.

These results suggest that morning cortisol concentrations are not simply reflections of the diurnal cycle but also are responsive to environmental events. From that perspective, the day of testing may be viewed as a mild challenge for the HPA axis compared to days at home. The data fit well with previous observations of a genetic impact on HPA axis responsivity for both the Trier Social Stress Test and the CAR (Kudielka et al. 2009; Wust et al. 2004, 2005). In other words, genetic effects become more visible once the HPA axis is stimulated. This is also consistent with previous twin studies in which significant morning heritabilities were found; these studies all included some element of activation or stressor, whether it was children getting ready for school (Bartels et al. 2003a), adults being monitored with 24 h ambulatory blood pressure and electrocardiogram equipment (Kupper et al. 2005), or other forms of activation (Bartels et al. 2003a, b). The types of stressful situations most likely to arouse cortisol involve some combination of novelty, unpredictability, uncontrollability, and social evaluative threat (Dickerson and Kemeny 2002). It may be that the conditions in the laboratory, where the schedule of the day and activities were not under the control of the twin and where cognitive testing can arouse feelings of threat, created mildly challenging conditions under which the effect of genetic influences on cortisol arousal was more evident.

However, as suggested by the lower standard deviations for salivary collection times on the laboratory test day, our results might also be explained, in part, by the more accurate and synchronized timing of salivary cortisol collection between twins on the test day compared to the home environment. Because the HPA axis responds both to diurnal rhythms and to stress, the more tightly orchestrated timing of saliva collections in conjunction with the parallel experiences of being tested may have reduced the “noise” on the day of testing, allowing the genetic influences to be observed. However, “cleaning” the data by omitting subgroups that had health problems or were non-compliant, in order to eliminate sources of variation, did not result in higher cross-twin correlations at home. In addition, we note that while the standard deviations of the cortisol sampling times are smaller on the day of test, there is no indication that the variance in the cortisol samples differs between the at-home and test days. This suggests that the variance in cortisol was not affected by the more precise sampling on the day of test, and that the laboratory test day samples did not necessarily contain less measurement error than the at-home samples. Finally, the correlations of the DZ twins at all time points are of the same magnitude at home and on the laboratory day, and conversely, the close synchronization of sample times and experiences on the day of testing did not increase the afternoon MZ twin correlations. Thus, although it is advisable to reduce measurement error as much as possible in studies of cortisol, addressing a variety of sources of measurement error in the present study (e.g., inclusion of various subgroups with health problems, variations in timing of samples) did not fully explain the higher heritability estimates for morning measures on the day of testing compared to at home.

Our results are consistent with findings from a number of twin studies that report significant unstimulated basal and morning heritability estimates under laboratory conditions, whether or not the studies involved experimental manipulations (Federenko et al. 2004; Froehlich et al. 2000; Inglis et al. 1999; Kirschbaum et al. 1992; Linkowski et al. 1993; Meikle et al. 1988). For the most part these other studies were not specifically examining genetic and environmental influences on the diurnal cycle. Those studies that were attentive to the effect of time of day suggest complex relationships between stressors, individual differences, and cortisol responsivity (Federenko et al. 2004; Froehlich et al. 2000; Inglis et al. 1999; Kirschbaum et al. 1992; Ouellet-Morin et al. 2008). Ouellet-Morin et al. (2009), for instance, found low heritability (.32) for salivary cortisol measured at home at awakening in 6-month-old twins. In the laboratory, however, there was a gene by environment interaction in which infants’ salivary mid-morning cortisol levels showed strong genetic influences (.69) if those infants came from backgrounds of high familial adversity. For infants from families low in adversity, laboratory-based morning cortisol levels were predominantly accounted for by unique environmental influences. A study of toddler twins, though, found high levels of salivary cortisol in response to laboratory stressors (Ouellet-Morin et al. 2008). For the high familial adversity toddlers, cortisol levels were accounted for by environmental, not genetic, factors. Genetic factors accounted for 51% of the variance in cortisol when the toddlers came from low adversity backgrounds (Ouellet-Morin et al. 2008). Although the results from the two Ouellet-Morin papers seem inconsistent, they may reflect developmental trends, as well as gene by environment interactions in determinants of cortisol reactivity. As another twist to the study of the effect of laboratory conditions on heritability, Federenko et al. (2004) found that adult participants adapted to laboratory conditions over time; on the first visit, twins’ afternoon cortisol heritability was minimal (.08) but by the third visit heritability increased dramatically to 1.00. As a group, these studies highlight the necessity of attending to developmental and contextual elements under which cortisol regulation is assessed (Kudielka et al. 2009). Exposure to stress early in life or to traumatic, unpredictable stress at any age has been shown to permanently increase an individual’s responsiveness to further stress and reduce the ability to cope with aversive events (Lupien et al. 2009). These age-dependent traumas may interact with genetic factors to influence an individual’s sensitivity to stress.

With regard to the influence of genetic and environmental factors across the day, this study, like other twin studies with multiple samples across the day, found stronger genetic influences in the morning and predominantly non-shared environmental influences later in the day (Bartels et al. 2003a, b; Kupper et al. 2005). This suggests that some aspects of cortisol secretion may be under greater genetic control than others. Higher morning values may reflect the association between cortisol diurnal patterns, circadian clock genes, and adrenocorticotropic hormone secretion (Linkowski et al. 1993; Stone et al. 2001). Although substantial research has been conducted on the types of stressors that influence cortisol concentrations and circadian rhythms (Hellhammer et al. 2009; Kudielka et al. 2009), there has been little research on the role of genetic and environmental influences on contextual and developmental conditions influencing cortisol regulation. Evidence for varying genetic and environmental control of cortisol responsivity across the day and in different contexts suggests that interventions may need to be tailored not only to individual differences in genetic and environmental vulnerability to stress (e.g., Caspi et al. 2003; Ouellet-Morin et al. 2009) but also to vulnerability to more or less adaptive diurnal patterns.

Strengths and limitations

Although middle-aged men, such as those who comprise the VETSA sample, are understudied in the salivary cortisol literature, it is still important to see whether our results generalize to women. One might also consider it a limitation that VETSA participants were not screened for exclusion criteria other than age. In some cortisol studies, numerous health and lifestyle factors have been used as exclusion criteria because they were viewed as confounds. That means that most of what is known about HPA function concerns a highly screened segment of the population, sometimes referred to as “super-normal” (Kendler 1990). Because the cortisol study was part of a larger study of risk and preventive factors for cognitive and brain aging, we did not exclude participants a priori for confounds sometimes associated with cortisol. As noted in the results section, the VETSA sample is similar to American men in terms of overall health characteristics. In this context, illnesses or injuries are regarded as additional factors contributing to the total genetic and environmental variances that influence the cortisol concentrations. When we examined the effect of excluding various subgroups on the heritability estimates, the results did not change our main findings. However, this approach does not mean that the role of specific factors in contributing to the heritability of cortisol is unimportant. Examination of those relationships requires separate analyses and careful conceptualizations of multiple relevant physiological and psychological processes that are beyond the scope of this article.

The VETSA cortisol study also has multiple strengths. Salivary cortisol data were collected in a large community-dwelling sample across multiple days, under normal and putatively stressful or challenging conditions, and at multiple time points across each day. Previous genetically-informed studies tended to use younger samples selected for good health, and many did not adequately reflect diurnal cycles in the timing of the cortisol measures. In addition, the narrow age range will be useful in planned longitudinal analyses as these individuals transition to later stages of life.