Introduction

Morphological integration is a phenomenon with complex and multivariate mechanisms that manifests as the covariation between traits and structures (Olson and Miller 1958). Integration plays an important role in both the direction and rate of phenotypic evolution through its impact on the generation of variation (Wagner and Altenberg 1996; Chernoff and Magwene 1999), and various attempts have been made to model variation in its expression (i.e., structure and magnitude) by both intrinsic and extrinsic factors (Atchley and Hall 1991; Hallgrímsson et al. 2002; Young and Hallgrímsson 2005; Zelditch et al. 2006; Hallgrímsson et al. 2007). Intrinsic factors include genetic variation in those developmental processes active within modules, such as the number of progenitor cells, their size, and the rate at which they proliferate, whereas extrinsic factors includes genetic variation in hormones, muscle-bone interactions, neuromuscular control, activity levels, diet, or even intrinsic factors of neighboring modules, all of which act indirectly or epigenetically (Atchley and Hall 1991; Zelditch et al. 2004, 2006). Integration is therefore thought to evolve via selection on genetic variation of both intrinsic and epigenetic processes. However, it is an open question as to the relative contribution of either intrinsic or extrinsic factors to observed differences in integration, and thus which processes and genes are affected by selection. To fully understand differences in the structure and magnitude of integration requires an individual accounting of the potential impact of multiple processes that may act during ontogeny on adult variance-covariance (VCV) structure.

Several recent studies have focused on interspecific comparisons of the adult limb skeleton, and in particular the role of genetic factors intrinsic to hypothesized modules (e.g., Hallgrímsson et al. 2002; Young and Hallgrímsson 2005; Reno et al. 2008). They found strong evidence for the role of shared genetic factors (e.g., Hox patterning genes) on covariation structure within limbs, among homologous elements (e.g., radius and tibia or humerus and femur), and overall. However, it remains unknown what impact epigenetic factors have on limb integration. This is in part due to the fact that contributions of individual factors to both the overall structure and variation in the magnitude of limb integration are difficult to disentangle without ontogenetic data or control over factors that may introduce their own confounding variance (e.g., phylogenetic history, genetic background). This situation is particularly problematic because, without this sort of additional information, results from comparative analyses of natural populations can be ambiguous and difficult to interpret except at the broadest level (Garland et al. 2005).

Here, we focus on one epigenetic factor that possibly impacts variation in mammalian limb integration: post-weaning level of locomotor activity. We hypothesize that selective breeding for high locomotor activity, access to the opportunity for sustained exercise regardless of selection history, and interaction between these factors leads to lower variance (canalization) and higher covariance among limb elements (integration). The rationale for this hypothesis is that patterns of muscle contraction relate to strains on bone (Herring and Teng 2000), and that this muscle-bone interaction can cause differential growth in areas closest to peak strains (Carter et al. 1998; Herring et al. 2002; Nowlan and Prendergast 2005). In addition, it is known that muscle-bone interactions during development are critical for producing normal morphology (Nowlan et al. 2007), and there are pleiotropic relationships between muscle and bone (Karasik and Kiel 2008) suggesting epigenetic effects between them. The increases in neuromuscular control or coordination of movement associated with higher activity levels may normalize strains and bone growth to directions that are preferred or optimal, thus leading to lower variance within limb elements (i.e., greater canalization) and higher covariation among elements (i.e., higher integration) (Zelditch et al. 2004, 2006).

To test these alternatives, we took an experimental evolution approach (Garland and Rose 2009). We utilized the skeletons of laboratory house mice from lines that have undergone long-term selective breeding for high levels of voluntary wheel running, as well as their non-selected control lines (Swallow et al. 1998; Kelly et al. 2006; Middleton et al. 2008a, b). After 21 generations of selection, male mice from the four replicate selected lines ran 2–3 times as many revolutions/day and were smaller (both mass and body length) as compared with the four control lines. Interestingly, there were no statistically significant differences in mean hindlimb length measurements, but males from the selected lines had significantly thicker femora and tibiafibulae, larger femoral condyles, and heavier feet (Kelly et al. 2006). In addition, a study of both sexes from generation 12 also found larger femoral condyles, in addition to reduced directional asymmetry of hindlimb bones in the selected lines (Garland and Freeman 2005). These previous results suggest that selection acts on a number of traits, including variation in behavioral and/or physiological characteristics extrinsic to the actual bones (such as hormone levels: Vaanholt et al. 2007; Malisch 2008), and so serves as a useful test of the impact of epigenetic factors on the structure and magnitude of limb-length integration. Using this experimental model, we assess the impact of both selection for high activity, access to the opportunity for sustained exercise on a wheel, and the interaction of these effects on the structure and magnitude of limb integration within a single species. With this model there is greater statistical control than is usually possible with natural populations—an important point because testing these competing hypotheses is difficult without control over the numerous intrinsic and extrinsic factors that could potentially introduce variance and covariance into quantitative traits.

Methods

Data

The sample used here is the same as that described in Kelly et al. (2006), although individuals with the mini-muscle phenotype (Hartmann et al. 2008; Middleton et al. 2008a, b) were excluded to limit the effect of sample heterogeneity on measures of variance. The original progenitors of this experimental sample were outbred and genetically variable house mice (Mus domesticus) of the Hsd:ICR strain (Harlan-Sprague-Dawley, Indianapolis, Indiana, USA). Mice were randomly mated for two generations, paired, and assigned to either “Selected” lines (S) (i.e., the highest-running male and female were chosen from each family as breeders) or “Control” lines (C) (i.e., a male and a female were randomly chosen from each family). Within all lines, the chosen breeders were randomly paired, except that sibling matings were disallowed [full details of the selection experiment are provided in Swallow et al. (1998)].

For the present sample of mice, at 21 days of age, two male pups from each of five families within each of the eight lines were weaned. At 25–28 days of age, these mice were housed individually in standard cages, half of which were attached to a running wheel (1.12 m circumference), yielding four groups in total: (1) mice from Control lines housed without wheels (Sedentary [CS], N = 20); (2) mice from Control lines housed with wheels (Active [CA], N = 20); (3) mice from Selected lines housed without wheels ([SS] N = 16); and (4) mice from Selected lines housed with wheels ([SA] N = 16). After an additional 8–9 weeks under these conditions, mice were sacrificed via CO2 asphyxiation and skeletonized with a colony of dermestid beetles maintained by the University of Wisconsin Zoological Museum (Kelly et al. 2006).

Skeletonized and disarticulated limb elements (femur, humerus, tibia, radius, ulna, metacarpal III, and metatarsal III) were placed on a flatbed scanner and imaged in 24-bit color at 1,200 dots per inch (TIFF format without file compression). This method has similar characteristics to camera-based two-dimensional imaging systems, but with the added benefit of ease of setup, portability, and low equipment cost. Although parallax is an issue with all optical systems, this error is most prevalent in flatbed scanners when objects extend a large distance from the image sensor (Schubert 2000). Our analyses indicate parallax errors are negligible here due to the very small size of the bones and limited depth (data not shown). To further minimize the possibility of scanner error, we imaged limb elements on two separate occasions after moving and rotating them an arbitrary amount. Two-dimensional landmark data were collected twice from each of the two scanned images (Fig. 1: Landmarks). Landmarks represented the endpoints of maximum length measurements for individual elements as calculated from SigmaScan Pro (Systat Corporation, Richmond, California). Limb length data were subsequently averaged across scans and trials.

Fig. 1
figure 1

Illustration showing a typical example of scan quality used in the data collection process. Length (dashed line) was calculated as the maximal proximo-distal distance from the landmarks (red circles) shown on the schematic outlines (size magnified). (Color figure online)

Statistical Analyses

Limb lengths were modeled using the GLM (General Linear Model) function in R (R Development Core Team 2008) and a cross-nested, two-way ANCOVA with activity and linetype as grouping factors and body mass as a covariate (note that in this analysis we ignored potential variation among the replicate selected and control lines because of the relatively small sample sizes involved). We tested for significant effects of linetype, activity, body mass, and interactions on limb lengths, and residuals were used as the input data for the tests of differences in variance and integration among groups, as outlined below.

To test for differences in canalization, Levene’s test was used to compare variance in limb length residuals. This test differs from standard measures of disparity in that absolute values of deviations from the within-group mean measure are compared rather than raw data (Van Valen 2005). A standard ANOVA was subsequently used to compare absolute values among groups. The prediction was that selective breeding for high activity and access to a running wheel would be negatively associated with variance.

Based on the previous discussion, we predicted that groups would have a similar correlation/covariation structure but differ in the strength of integration due to differences in activity level. Morphological integration was assessed as effects of linetype, activity, and linetype/activity interaction by comparing correlations and covariance of limb-length residuals (Wagner and Altenberg 1996; Chernoff and Magwene 1999).

We measured repeatability of individual correlation and VCV matrices by resampling the original dataset with replacement (10,0000 replicates), calculating a new correlation or VCV matrix, and comparing this to the observed matrix using either a matrix correlation or the random skewers method, respectively (Marroig and Cheverud 2001). Repeatability was estimated as the mean of the resampled matrix correlations or the correlation in response vectors. Similarity in overall correlation structure among groups was assessed by matrix correlation (r m ) and similarity in VCV structure was assessed via random skewers (r rv ). These values were adjusted by dividing the observed r m or r rv by an estimate of the maximum correlation (r max  = (t a * t b )0.5 where t a and t b are the repeatabilities of the matrices being compared) (Marroig and Cheverud 2001). To further examine the structural differences among group correlation matrices, average Fisher-transformed correlations were calculated for functional groupings of limb elements and compared between groups (Young and Hallgrímsson 2005). Standard error of correlations and Fisher-transformed correlations was calculated by resampling the original dataset (10,000 replicates).

The strength of morphological integration was calculated from the variance of the eigenvalues (σ 2λ or EV) of both correlation and VCV matrices (Wagner 1984, 1990; Van Valen 2005). Eigenvalue variance measures whether the total variance (the sum of individual trait variances or the trace of the correlation matrix) can be explained by a small number of factors (ellipsoid or high EV), or whether it is more evenly distributed across principal components with similar explanatory power (spherical or low EV). In the analysis of correlation structure, there are seven traits each with a variance of one, so the maximal EV is seven (perfect integration), and the minimal variance is zero (no integration). Higher EV would be considered more integrated and lower EV would be less integrated. Comparison of EV from VCV matrices requires adjustment due to unequal total variance among groups. EV was normalized for comparison by dividing the observed EV by the total variance of the group (the sum of individual trait variances or the trace of the VCV) (Young 2006). In both cases the standard error of the EV estimates was calculated by resampling with replacement from the residual data (N = 10,000 replicates), and significance was calculated as the number of times the observed EV exceeded the resampled distribution. Resampling and EV calculations were performed in Poptools (Hood 2008).

Results

As in Kelly et al. (2006), there was no significant effect of linetype, activity or the linetype/activity interaction on hindlimb or forelimb lengths, whereas body mass was a significant positive predictor of lengths of both limb (Table 1: ANCOVA results). There was a weakly significant relationship between linetype and autopod length: mice from the selected lines had longer metacarpals and metatarsal. Residuals from these generalized linear models were used as input data for the analyses of covariation outlined below.

Table 1 P-values (bold indicate significant, * < 0.05, ** < 0.0001) from the two-way cross-nested analysis of covariance

There was no significant difference in canalization of limb lengths as determined by comparing variance between linetypes, activity levels, or the interaction between linetype and activity interaction (Table 2: Variance results). The exception to this pattern was in the radius/ulna and tibiafibula of the Selected relative to the Control lines, and SA relative to other [linetype/activity] groups. In these comparisons the Selected and SA stylopod elements (tibiafibula and radius-ulna) were significantly less variable than other groups.

Table 2 Limb-length variances (for residuals) and P-values (from Levene’s tests) for associated comparisons (CA: control/active, CS: control/sedentary, SS: selected/sedentary, SA: selected/active)

Individual correlation and VCV matrices are shown in Tables 311 (Tables 311: Matrices). Correlation matrices varied in repeatability from 0.825–0.962 while VCV matrices varied from 0.936–0.987 (Table 12: Correlation and Covariation matrix repeatability and similarity). Correlation matrices were significantly correlated among linetype, activity and linetype/activity interaction groups (raw average r m  = 0.668 (P = 0.001–0.053) for linetype/activity interaction, r m  = 0.865 (P = 0.002) between Control and Selected, and raw r m  = 0.753 (P = 0.001) between Sedentary and Active groups). Significance values for matrix correlations are measured by a Mantel’s test, and indicate when matrices do not significantly differ in their overall structure. This test supports an interpretation that there is no significant difference in limb length correlation structure among groups. Random skewers analysis of the VCV matrices indicates even greater support for structural similarity, with significant correlations for all values (Table 12). The general exception is the SA group, with reported r m values as low as 0.507, and r rv values as low as 0.806.

Table 3 Trait correlations (below diagonal) and covariances (above diagonal) computed from residuals on body size for the entire dataset
Table 4 Trait correlations (below diagonal) and covariances (above diagonal) computed from residuals on body size for the control-active dataset
Table 5 Trait correlations (below diagonal) and covariances (above diagonal) computed from residuals on body size for the control-sedentary dataset
Table 6 : Trait correlations (below diagonal) and covariances (above diagonal) computed from residuals on body size for the selected-active dataset
Table 7 Trait correlations (below diagonal) and covariances (above diagonal) computed from residuals on body size for the selected-sedentary dataset
Table 8 Trait correlations (below diagonal) and covariances (above diagonal) computed from residuals on body size for the control dataset
Table 9 Trait correlations (below diagonal) and covariances (above diagonal) computed from residuals on body size for the selected dataset
Table 10 Trait correlations (below diagonal) and covariances (above diagonal) computed from residuals on body size for the active dataset
Table 11 Trait correlations (below diagonal) and covariances (above diagonal) computed from residuals on body size for the sedentary dataset
Table 12 Matrix correlation and covariance matrix similarity between pairwise groups, repeatabilities for each group (t1 and t2), p-values, and observed and adjusted matrix correlations (r m and r adj ) and covariance matrix similarity (r rv and r rvadj )

Closer examination of the correlation matrices suggests that similarity in structure is primarily driven by high correlations between zeugopod and stylopod elements. The autopod was typically less correlated with other limb elements, with the strongest correlations between autopod elements (metacarpal-metatarsal) and autopod to zeugopod. This overall pattern is similar to that found in previous analyses of mammalian and murine limb covariation structure (Hallgrímsson et al. 2002; Young and Hallgrímsson 2005), and to that found in birds (although excluding autopod) (Magwene 2001), in which there is modularity both within limbs and among homologous elements.

Examination of individual correlations or averages of sets of correlations reveals some structural differences among group matrices (Table 13: Fisher’s-z correlation averages). For example, the SA and SS groups are the most weakly correlated matrices. These groups primarily differ in the strength of correlation between stylopod and hindlimb correlations (SA is lower) and to some degree autopods (SA is higher). As previously noted, the SA group also exhibits significantly lower stylopod variance, suggesting that differences in correlation structure between SA and SS groups are partially attributable to this factor. In addition, Active groups (Active, CA and SA) primarily differ in having stronger average autopod correlations relative to Sedentary groups (Sedentary, CS and SS), Control groups (Control CA, CS) exhibit higher average autopod correlations relative to Selected groups (Selected, SS) (although SA also exhibits a similar average autopod correlation to Control groups), and Selected and Sedentary groups exhibit slightly higher average hindlimb correlations.

Table 13 Matrix correlation and covariance matrix similarity between pairwise groups, repeatabilities for each group (t1 and t2), p-values, and observed and adjusted matrix correlations (rm and radj) and covariance matrix similarity (rrv and rrvadj)

Comparison of resampled distributions indicates that there are no significant differences in the magnitude of EV between any of the subgroups (Table 14: EV results, Figs. 2 and 3: Integration results). In fact, the direction of differences in integration is opposite to those predicted. In correlation matrix EV, both selected (EVselected = 2.45) and active (EVactive = 2.48) groups have lower EV scores than control (EVcontrol = 2.84) or sedentary (EVsedentary = 2.91) groups, although the differences are not statistically significant. CS (EVcs = 2.93), CA (EVca = 2.99), and SS (EVss = 3.15) groups are more integrated compared to the SA group (EVsa = 2.08), although again the difference in observed EV is not significant. This general pattern is repeated in the VCV matrix-based EV results.

Table 14 Observed eigenvalue variance
Fig. 2
figure 2

Distributions of resampled eigenvalue variance (10,000 replicates) for linetype (white = all, blue = control, yellow = selected), activity level (red = sedentary, green = active), and linetype/activity interaction (CA: control/active, CS: control/sedentary, SS: selected/sedentary, SA: selected/active) as computed from group correlation matrices. Error bars indicate the 95% confidence limits, while horizontal bars are the mean of the resampled EV. Observed mean is shown. (Color figure online)

Fig. 3
figure 3

Distributions of resampled eigenvalue variance (10,000 replicates) for linetype (white = all, blue = control, yellow = selected), activity level (red = sedentary, green = active), and linetype/activity interaction (CA: control/active, CS: control/sedentary, SS: selected/sedentary, SA: selected/active) as computed from group variance/covariance matrices. Error bars indicate the 95% confidence limits, while horizontal bars are the mean of the resampled EV. Observed mean is shown. (Color figure online)

Discussion

The goal of this study was to test the effects of selective breeding for high locomotor activity, access to the opportunity for sustained exercise (a running wheel), and the interaction of these factors on both variance and the structure and magnitude of integration in the limbs of mice. It was previously suggested that coordinated neuromuscular activity might function to remove variance, thereby canalizing phenotypes and increasing integration (Zelditch et al. 2006). If this were the case, then we hypothesized that in a sample of mice bred for 21 generations for high locomotor activity, and given access to an exercise wheel, we would find more canalized and integrated limb phenotypes when compared to sedentary control mice. Contrary to our predictions, there was only weak evidence for significant reductions in variance in the stylopod (i.e., radius, ulna and tibiafibula) of the Selected and SA groups, and correlation and covariation structure is very similar among all groups. In addition, there was no significant difference in the magnitude of limb integration associated with activity level, selection for activity, or interactions between them. In fact, a trend toward less integrated phenotypes was associated with lower observed variances in the SA group.

We predicted that there should be greater canalization of limb elements associated with selection for activity and access to a running wheel. The majority of comparisons of population variances did not support this hypothesis. However, there is a significant or near significant decrease in the SA group stylopod variance compared to all other groups. However, the interpretation of this difference in the absence of alterations to variance in other limb elements is unclear. One possibility is that the epigenetic effect of activity level is confined to the stylopod, and that increase in activity has helped canalize lengths of these fore- and hindlimb elements. But in the absence of similar changes across other limb elements, the functional implication of this difference is unknown. The alternative is that selection may have had the unintended consequence of reducing sample heterogeneity in the stylopod. Further investigation of this question is warranted with larger sample sizes.

Although correlation/covariation structure did not significantly change as we had predicted, it is important to note that any interpretation of similarity in correlation/covariation structure based on a Mantel’s test or random skewers alone does not necessarily imply that matrices are identical or that they have not changed under these experimental conditions. In fact, the magnitudes of the correlations reported here are in some cases lower than those observed among species (Ackermann 2005; Marroig and Cheverud 2001; Ackermann and Cheverud 2000), and one correlation is mildly insignificant (SA, P = 0.053). Although these previous studies have generally concluded that significant correlations implied unchanged correlation structure or integration, the magnitude of differences in structure reported here might actually be more pronounced than a Mantel’s test would suggest, and thus a closer examination of correlation structure is warranted.

Average correlations across subsets of these matrices indicate that the general pattern of correlations and covariance is both quantitatively and qualitatively similar among groups, whereas the actual magnitudes of correlations are more variable. The primary exception appears to be reflected in the correlation matrices of Selected and SA groups, both of which have low correlations for zeugopod elements, and Selected and Sedentary groups, which have lower average autopod correlations. Interestingly, these elements also exhibit significantly significantly lower variances or higher means, which suggests that the magnitude of the correlations, and thus of matrix similarity and integration, are in part driven by the effect that reductions in zeugopod ranges have on these estimates.

A more telling statistic is that the SA group has the lowest repeatabilities, indicating some degree of estimation error for the correlation/covariance structure of this group. This interpretation is not altogether surprising given the small sample sizes used here and the potential effect of the selection regime on reducing population heterogeneity. As variance is a necessary prerequisite for covariance, and in the SA case the VCV measures are low or unstable, lower integration and a mildly altered correlation structure are a predictable outcome. However, given the overall similarities, small effects, and the potential for estimation error, we lean toward the conservative interpretation that there is no meaningful change in correlation structure in the SA group relative to the others. This conclusion is further supported by the strong correlations in response vectors for all VCV matrices. The alternative interpretation, that reduced variance in these elements and consequent changes in measured covariance reflect a real biological effect due to either selection or activity is plausible, but not convincing, given the other evidence. Only a larger sample size under these experimental conditions can adequately discriminate these alternatives.

The effects measured here when comparing Active and Sedentary groups were active over ontogenetic time (i.e., they occurred from weaning until adulthood), but the measurements only reflect an average of these effects over time, and so any conclusion about their impact on earlier timepoints is speculative. That said, one could argue that the similarity among groups in correlation/covariance structure at the time of collection implies constancy at earlier postnatal timepoints. The alternative scenario is that correlation/covariation structure fluctuates over ontogeny, but at a later timepoint a similar covariance structure is converged upon despite the effect of different selection regimes and activity patterns. Although we prefer the former scenario from a pure parsimony standpoint, the latter cannot be ruled out with this dataset. Additionally, a number of studies have documented fluctuations in integration over ontogeny (Cheverud et al. 1983; Cheverud and Leamy 1985; Zelditch 1988; Zelditch and Carmichael 1989; Cane 1993; Ackermann 2005; Ivanovic et al. 2005), so this question remains open. For example, if these epigenetic factors do not have an effect on integration, then other interactions that do not vary in these groups may have driven the convergence in integrative pattern. In either case, the role of activity level as an epigenetic factor in establishing this integrative pattern appears to be minimal.

That said, even though the effect of increased activity and selective breeding on canalization and integration is negligible in the present analysis, this result does not necessarily imply that postnatal activity level does not play a role in structuring trait variation, correlation or covariance. Although epigenetic interactions associated with locomotor activity level might not radically alter variance or covariance, they may play a role in maintaining established patterns. For example, one might hypothesize that there is a baseline level of postnatal activity that is necessary to maintain the “normal” patterns of variance and covariance found in control mice, but increased levels of activity do not further contribute to these measures. Means and variances may also be unaffected due to a low magnitude of effect on the bones, e.g., if the limb bones of mice were relatively overdesigned for loads and thus resistant to the effects of muscular interactions. If this were the case, then investigations of earlier timepoints and/or use of models in which neuromuscular coordination is impaired or locomotor activity is restricted (e.g., via hindlimb suspension: Hauschka et al. 1988; Park and Schultz 1993) would be necessary to determine to what degree activity and associated muscular movement is necessary and/or contributes to the overall signal of variance in the limbs. For example, in mice with defects to neuromuscular control or in muscle morphology, is canalization reduced (i.e., variance increases) and does integration become weaker (i.e., reduced covariation) or break down (i.e., changes in correlation structure)?

In conclusion, we did not find strong evidence for changes in canalization or integration associated with postnatal activity level, either through access to exercise, selection for high locomotor activity, or a combination of the two. With some exceptions, these results generally support a model of variability in which adult variance and covariance structure are either unaffected by or maintained by these particular postnatal epigenetic interactions. As such, epigenetic interactions that are predicted to increase with postnatal activity level may not play a formative role in terms of adult morphological canalization or integration, but rather may contribute to maintaining morphological patterns generated earlier in development or by other integration-contributing factors. Further analyses explicitly focused on the ontogeny of epigenetic effects impacting integration are needed to test these hypotheses.