1 Introduction

The Atlantic subpolar gyre (SPG) has received much attention in recent years because of its role in decadal climate variability in the North Atlantic region. Observations suggest that its dynamics modulate the salinity of the inflow into the Nordic Seas (Hátún et al. 2005) and that this signal can be traced along the Norwegian coast to the Arctic Ocean (Holliday et al. 2008). Variability in the SPG is also associated with a reorganization of the current system in the North Atlantic (Häkkinen and Rhines 2009). Numerical modeling suggested that the SPG has considerable impact on Arctic sea ice (Yoshimori et al. 2010; Renold et al. 2010) and on the Atlantic meridional overturning circulation (AMOC) (Delworth et al. 1993; Lohmann et al. 2009). Through this latter relation, the SPG is associated with the Atlantic Multidecadal Oscillation, arguably the dominant mode of decadal variability in the Northern Hemisphere (Schlesinger and Ramankutty 1994; Knight et al. 2006; Mann et al. 2009). Thus, a thorough understanding of the SPG dynamics and its representation in coupled climate models is essential for a better simulation of decadal variability and therefore a requirement to improve decadal climate predictions.

Despite its importance, the dynamics of the SPG in coupled climate models are poorly understood and have not been investigated systematically. A number of different physical mechanisms and feedbacks has been suggested from various studies, ranging from decades to centuries (Yoshimori et al. 2010). However, their respective importance remains unclear.

A strong influence on the SPG strength and variability from surface wind stress is suggested by both observational and model studies (Curry et al. 1998; Böning et al. 2006; Häkkinen et al. 2011). However, a number of studies emphasize the importance of the density structure on the gyre transport (Mellor et al. 1982; Greatbatch et al. 1991; Myers et al. 1996; Penduff et al. 2000; Eden and Willebrand 2001; Montoya et al. 2011). Eden and Jung (2001) reconcile this conflict with the conclusion that interdecadal ocean variability is primarily driven by buoyancy forcing, while the effect of variations in wind stress is limited to shorter time scales.

In this study, we investigate the importance of baroclinicity for the SPG, and in particular the impact of a recently proposed positive feedback mechanism (Levermann and Born 2007; Born and Mignot 2011). A stronger SPG transports more saline subtropical water into the subpolar North Atlantic, mostly by advection. This causes stronger deep convection in the center of the SPG, more heat loss and therefore denser waters throughout the water column. The circulation of relatively light waters around this dense core intensifies and so does the strength of the SPG. This concept has been tested in model simulations (Born et al. 2010a, b, 2011; Born and Levermann 2010) and successfully explains paleo observations (Thornalley et al. 2009). In the absence of atmospheric variability, the advective-convective feedback mechanism is strong enough to cause a bistability of the SPG (Levermann and Born 2007). In the presence of a variable atmosphere, the threshold between the two stable modes is frequently crossed and the feedback mechanism increases the amplitude of variability (Born and Mignot 2011; Mengel et al. 2012).

Since the positive feedback mechanism exclusively depends on large-scale processes that are common to all climate models, it appears reasonable to assume that the feedback is ubiquitous as well. Therefore, the working hypothesis for the present study is that the SPG is potentially bistable in all coupled climate models, but high-frequency atmospheric variability (noise) disturbs the ocean enough as to inhibit the full development of one of the two stable circulation modes (Fig. 1). We investigate this bistability in pre-industrial control simulations of 19 different comprehensive coupled climate models and quantify to what extent it is distinguishable from red noise.

Fig. 1
figure 1

Schematic illustration of the SPG potential U(SPG), the testable hypothesis of this study. The potential well is bistable with two local minima. In the presence of climate variability, the energy thresholds E 1 and E 2 that separate the two stable equilibria are crossed frequently so that no stable modes can develop. However, the longer residence times in the local minima are theoretically detectable by analyzing the corresponding time series

The study is organized as follows. The data used and the method of analysis are presented in Sects. 2 and 3 discusses the results of the analysis, by giving a general overview of the simulated circulation, presenting the statistical analysis of two different indices for the SPG circulation and regional differences between models, and relating the bistability to a physical mechanism. Finally, conclusions are given in Sect. 4.

2 Data and method

We use data from 14 different preindustrial control experiments of the Coupled Model Intercomparison Project 3 (CMIP3) and five additional, more recent experiments (Table 1). The selection was limited to models that provide the depth-integrated stream function over a simulation period of at least 200 years. These data are not yet available in the CMIP5 database. In one case, HadCM3, the analyzed data corresponds to the same simulation as in the CMIP3 archive, but extended to 1,000 years. In this model, the stream function shows a strong checkerboard pattern that may indicate a numerical issue. In this study, we will use a stream function calculated from the three-dimensional field of meridional velocities. Although this does not guarantee the elimination of numerical modes if they are present in the barotropic solver of the model, the result is a reasonable representation of the flow field without the pattern mentioned above.

Table 1 Coupled models used in this study, length of simulation, filter width (2 σ of Gauss filter, Sect. 3.2), time-average SPG strength defined as spatial maximum of the circulation and average over 60–0°W and 45−64°N, horizontal resolution of ocean component and key references
Table 2 Composites as defined by the minima of the reconstructed potential well (SPG AVE Fig. 6)

We define two SPG circulation indices which are analyzed in all models. Usually, the SPG index is defined as the local minimum of the depth-integrated stream function in the subpolar North Atlantic. Due to the cyclonic circulation, the stream function is negative and the minimum represents the strongest cyclonic transport within the SPG (SPG MAX ). However, the minimum is not located in the same area in all models and potentially represents different small-scale recirculations inside the SPG that are not representative of the large-scale circulation. In addition, the minimum might move between different local minima in time. Therefore, the SPG minimum is not robust for intercomparison purposes and will not be used here. We use the average of the stream function between 60 and 0°W, and 45 and 64°N (SPG AVE ). A shortcoming of this index averaged over a relatively large region is that it overemphasizes the region with weak variability and relatively weak circulation. Therefore, a second index is defined as the stream function weighted by its decadal variance (SPG VAR ): The barotropic stream function is time-filtered with a running average over 11 years before calculating the variance. The result is averaged over the same region as before. Both indices are normalized.

In order to assess whether the SPG indices represent a potentially bistable system, we employ the following approach. The dynamics of the SPG is assumed as a stochastically driven motion in a one dimensional potential well U,

$$ \frac{\partial}{\partial t} SPG= - \frac{\partial}{\partial SPG} U (SPG) + \sigma \eta(t), $$
(1)

where η(t) denotes Gaussian noise with unit variance and σ is the standard deviation of the stochastic forcing. The stationary solution of the corresponding Fokker-Planck equation provides an estimate for U based on the empirical distribution density p d (Gardiner 1985) with

$$ U = - \frac{\sigma^2}{2} log\; p_d. $$
(2)

The reconstructed potential U is then analyzed for multiple equilibria. This method has been successfully tested on artificial time series data (Livina et al. 2012) and applied to Greenland ice core data (Ditlevsen 1999; Kwasniok and Lohmann 2009; Livina et al. 2010) and other paleo proxy data (Livina et al. 2011). The stochastic part of variability of the SPG might change with the average circulation strength which is not accounted for in Eq. (1). Therefore, state-dependent noise is implicitly considered to be part of the signal.

As an addition to the algorithm used in previous studies, we attempt to quantify the probability for the SPG index to be the result of a bistable potential. This is done by comparing the reconstructed potential to 500 equally processed realizations of red noise that match the autoregressive coefficients of the original time series. A first-order autoregressive process is used to model the red noise. In most cases the SPG potential is dominated by noise and fits well within the range defined by the random time series. This is not unexpected since the noise is chosen to resemble the original SPG time series. However, additional information can be deduced from its shape. Deviations of this shape that are inconsistent with red noise are detected by calculating the difference between the SPG potential and each of the 500 random potentials. A fourth order polynomial is fitted on the residual, weighted by the original probability density to obtain the best fit where most of the original data is available. The percentage of fitted polynomials with positive fourth- and negative second-order coefficients represents a measure for the residual potential to have a shape consistent with a double well. This tendency must also be present in the original reconstructed potential and suggests an underlying system with two modes. We therefore define the percentage of matching fits as the bistability detection ratio. Note that this method specifically tests for two potential wells, ignoring the possibility of additional circulation modes.

The robustness of this method is tested with an artificial time series of alternating pieces of white noise of 30 points duration. Every piece is offset by a signal that is scaled as a fraction of the standard deviation of the white noise so that the two maxima of its distribution density are increasingly separated. After that, the entire time series is normalized and filtered with a Gaussian filter of half-width 5, in the same way as the SPG indices. The above method is applied to estimate the potential (Fig. 2a). For the first 1,000 realizations, the signal width is zero in order to quantify the confidence interval. In 95 % of these cases, detection ratios for this null signal are below 54.3 %. This value will be used as minimum level confidence in the following. Above a signal of 0.6σ, the bistability detection yields ratios above the significance level in most cases. More than 95 % of the time series are correctly identified as bimodal for a signal larger than 1 σ. Most importantly, the method is robust as small signals are not accidentally misinterpreted. Distributions broader than the normal distribution (platykurtic) might erroneously be identified as bimodal in limited cases (Fig. 2b). Thus, the result of the detection method needs to be validated by the reconstructed potential and the original time series.

Fig. 2
figure 2

a Bistability detection ratio for 3,000 artificial time series of alternating pieces of white noise. The step change between individual pieces, the signal, is kept at 0 for the first 1,000 time series, followed by a gradual increase to two standard deviations of the noise. The average is shown as solid black line, defined as a moving average of 201 points width above 0. 5 and 95 % quantiles are shown as dashed lines, also calculated over a 201 point window above 0. b Bistability detection ratio for 2,000 artificial time series of increasing kurtosis k. The normal distribution has a kurtosis of 3. The 201-year running average is shown as solid black line, 5 and 95 % quantiles are shown as dashed lines. Inset: Sample distributions for normal (thick solid), platykurtic (solid thin, k=1.87) and leptokurtic (dashed, k=10.71) distributions

Since the signals are masked by strong noise, filtering is essential (Kwasniok and Lohmann 2009). The model time series are subject to a simple Gaussian filter of varying width because the optimal filter width is model dependent and a priori unknown. It depends, among other factors, on the time scale of the simulated variability. Therefore, the bistability detection score is calculated repeatedly for a range of filter widths, and the optimal filter width is selected afterwards as the one returning the highest bistability detection score. The filter width, half of the width of the Gaussian bell curve, is limited to above 5 years to eliminate interannual variability. The upper limit is chosen to be 50 years or less in case it would degenerate the degrees of freedom of the original time series to below 10. As an example, for a time series of 200 years length, the filter width must not be more than 10 years because the approximate width of the full Gaussian curve (2σ) is 20 years of which only 10 individual pieces fit into the signal time series.

3 Results

3.1 Simulated circulation and variability

The large scale circulation is reasonably well represented in most models (Fig. 3). In all but one exception, INMCM 3.0, both the SPG and the subtropical gyre (STG) are separated between 45 and 50°N. While the strength of the STG is comparable in all models, considerable variations exist in the shape of the SPG and its strength with extremes reaching from below 10 to more than 50 Sv. Models with lower resolution in its ocean component simulate a more eastern center of the circulation potentially impacting the air-sea coupling and the simulated variability. Observations report a circulation strength of between 26 and 33 Sv for the boundary current in the Irminger Basin (Clarke 1984; Reid 1994; Bacon 1997). However, this estimate might be too low due to difficulties to resolve the narrow boundary current with measurements. An ensemble of regional ocean models yields values between 25 and 60 Sv in the Labrador Sea (Treguier et al. 2005).

Fig. 3
figure 3

Depth integrated stream function (1Sv = 106 m3/s) of the North Atlantic in different comprehensive models

The simulated standard deviation of the depth-integrated stream function shows large differences between models, both in pattern and amplitude (Fig. 4). This is partly due to the large differences in time-average strength, because a strong absolute circulation also tends to show a larger amplitude of variations. Common features include a higher variability along the path of the North Atlantic Current and in the SPG. In addition, three models simulate large decadal variability in the STG, INMCM 3.0, ECHO-G and HadCM3. MIROC 3.2 medres shows a large variability everywhere which is an artifact of the calculation of the stream function (not shown). It was calculated by meridionally integrating the zonal barotropic velocity, instead of zonally integrating the meridional velocity as in other models. By this choice, variations of the stream function in the North Atlantic represent not only changes of the local circulation, the North Atlantic SPG and STG, but variations along the entire integration path from the coast of Antarctica. This leads to a time-dependent, zonally heterogeneous offset that is dominated by variability in the Antarctic Circumpolar Current and that can not be corrected for accurately. An approximate correction has been attempted for Figs. 3 and 4. The time-average stream function was normalized to zero at the American coast. The standard deviation has been normalized to zero at the equator. However, since we can not rule out that the SPG indices are contaminated with remote variability, we will not analyze MIROC 3.2 medres further.

Fig. 4
figure 4

Logarithm of standard deviation of the depth integrated stream function in different models. A Gaussian filter with 5 years half-width has been applied before calculating the variance to attenuate interannual variations. In many models, most variability is found in the path of the North Atlantic Current and in the SPG region. Some models simulate high variability in the North Atlantic subtropical gyre

The large variability in HadCM3 appears to be related to topography, especially in the western basin of the North Atlantic. While this potentially represents a numerical mode of variability, it is not unlikely that it is the result of circulation changes in the deep ocean that would also resemble topography.

FGOALS 1.0 shows a strong asymptotic decline in the SPG circulation in all three ensemble experiments (not shown) that also causes high values of variance in the stream function. Since the same weakening is seen also in other circulation indices, this simulation is considered unsuitable for the following analysis.

3.2 Bistability

Prior to the analysis for bistability, the two SPG indices, SPG AVE and SPG VAR , are filtered by a Gaussian filter (Fig. 5). The filter width is different for each model to adapt to the differences in the simulated time scale of variability. The optimal width is chosen after the calculation of the bistability detection ratio for SPG AVE to yield the maximum value (Sect. 2). The same filter width is then applied to SPG VAR as well. Generally, the optimal filter width could be different for SPG VAR , but would also emphasize a different time scale of variability and therefore potentially a different physical mechanism. However, most models show only small differences in optimal filter width between the two indices.

Fig. 5
figure 5

Time series for SPG AVE (upper), and SPG VAR (lower), smoothed with a Gaussian filter. The optimal filter width for each model is noted in the upper left of each panel (2σ, in years). Below, the amplitude of one standard deviation is given for each model in SPG AVE (in Sv). The linear trend of the unfiltered time series for SPG AVE is shown on the lower left (in mSv/yr)

Potentials are reconstructed from the two SPG indices by calculating the negative logarithm of the empirical distribution density (Eq. 2, Fig. 6). Most of the reconstructed potentials lie within the range defined by the 5 and 95 % quantiles of the red noise ensemble, which is not unexpected. This does not imply, however, that the SPG potentials are indistinguishable from red noise, because the individual shape of the potentials contains additional information. As an example, the potential of SPG AVE in BCM2.0 is wider than expected for a random process. With 88 % probability, this potential represents a bimodal distribution masked by strong noise.

Fig. 6
figure 6

Reconstructed potentials (black) for SPG AVE (upper) and SPG VAR (lower), for 17 different climate models. 5/50/95 quantiles for red noise are shown in red. The bistability detection ratio is inset for each model. Many reconstructed potentials lie mostly within the range of the red noise ensemble. However, this does not preclude that its individual shape is distinguishable from the random noise, which is expressed in the bistability detection ratio. Ratios significantly above the confidence interval are highlighted with bold axes

Detection ratios are different for SPG AVE and SPG VAR . Seven models yield ratios above the confidence interval of 54.3 % for both indices, eleven for SPG AVE and nine for SPG VAR . High detection ratios are plausible from the comparison of the corresponding time series (Fig. 5) since many of these models show rapid transitions. The time scale, with which these transitions are repeated, however, does not appear to be the same in all models. Six of the seven models with significant detection ratios will be analyzed further below. We discard results of INMCM 3.0 primarily for having the largest linear drift in the SPG AVE index of all models (Fig. 5) and its unrealistic stream function (Fig. 3). Although CCSM4 2deg has a comparably high linear tendency in SPG AVE , this is readily explained by the large but temporary circulation anomaly at the beginning of the time series.

The most evident double well potential is found for CCSM4 at 2° × 2° resolution (CCSM4 2deg), which is due to a single event of approximately one century duration. This spontaneous onset and demise of a stronger SPG circulation will be discussed in more detail below. Note that the same ocean model coupled to the same atmosphere model with 1° × 1° resolution does not produce this signal. This highlights one shortcoming of the analysis, namely that the preindustrial control simulations used here cover only a small part of the physically possible phase space.

The bistability detection ratio depends on the definition of the SPG index, as shown above. Variations in the SPG circulation as captured by the SPG AVE index do not necessarily extend to the entire subpolar region. They might represent variability in a sub-region, while the rest remains unchanged. Regionally refined circulation indices could be used to investigate this further but are hampered by dissimilarities between models, e.g. the geographical location of local recirculation gyres or deep convection regions. Therefore, we employ the bistability detection method on the depth-integrated stream function at the model resolution level, at each individual grid point, and identify regions of bistability afterwards, for the six models with highest detection ratios in the SPG indices (Fig. 7).

Fig. 7
figure 7

Bistability detection ratio for every grid point of six different climate models. The filter width is the same as for the SPG indices (Fig. 5). Values below the level of confidence (54.3 %) are not shown. Contours show the time-average depth-integrated stream function (5 Sv spacing)

Different models do not agree in the regions of high detection ratios, but certain similarities can be identified. Higher detection ratios are mostly found in the western part of the SPG, specifically the Labrador Current and the intergyre region. The recirculation around the center of the SPG, the circulation maximum, shows significant ratios in IPSL CM4, representing the non-continuous advection pathway described by Born and Mignot (2011), in the high resolution version of the same model, IPSL CM4 hires. These recirculation centers in Labrador Sea and Irminger Basin also yield highest ratios in CNRM CM3. MPI-ESM-LR shows only small regions of significant detection ratio, mostly concerning the flow across the Mid-Atlantic ridge and the intergyre region, as well as along the boundary currents in Irminger Basin and Labrador Sea. Detection ratios are above 95 % throughout the subpolar North Atlantic in CCSM4 2deg, representing an intensification of the entire SPG.

3.3 Physical mechanism causing bistability

From the statistical analysis, a bistability is most evident in CCSM4 2deg with two clearly different modes of circulation in the estimated potential (Fig. 6) that affect a large region (Fig. 7). The physical mechanism causing this behavior is investigated in this section, comparing composite fields corresponding to the two potential minima in the SPG AVE index (Fig. 6). For consistency with the previous analysis, both the index and the fields are filtered with a Gaussian filter of model-adjusted width (Table 1).

The stronger circulation of the SPG is associated with higher surface salinities in the subpolar North Atlantic (Fig. 8), establishing a third region of deep convection instead of only two with weak SPG circulation (Fig. 9). Stronger deep convection in the center of the SPG causes more heat being lost to the atmosphere and the entire water column to become denser down to the maximum depth of convection. The cyclonic flow around this core of anomalously dense water strengthens, transporting even more salt into the western subpolar North Atlantic. These are the key elements of a positive advective-convective feedback loop that has been described for the first time by Levermann and Born (2007) and was later revised by Born and Mignot (2011) (Fig. 10). This previous work concerned SPG variability on decadal time scale and identified delays of approximately 10 years associated with salt accumulation in the SPG center and the cooling of the water column. Note that the discussion of time lag is omitted in the present study primarily because the model data is time filtered with a decadal time scale which renders the expected time lag negligible. All of the six models with significant bistability detection ratios but CNRM CM3 show variations on centennial time scale (Fig. 5). Note also, that Born and Mignot (2011) address variability of another SPG index, representative of the western recirculation within the SPG, leading to different anomaly patterns.

Fig. 8
figure 8

Differences of strong–weak SPG AVE composites (± 1 σ) in CCSM4 2deg. a Anomaly of salinity (colors) and anomalous depth-integrated stream function (contours, spacing 2 Sv, negative dashed). b As before, but for average density of the upper 500 m

Fig. 9
figure 9

Composite mixed layer depth (shading) and depth-integrated stream function (contours, spacing 5 Sv) in CCSM4 2deg. a for weak SPG AVE . b for strong SPG AVE

Fig. 10
figure 10

Positive feedback mechanism causing the nonlinear behavior of the SPG. A stronger SPG advects more salt into the subpolar North Atlantic, enhances deep convection and intensifies deep convection. The subsequent cooling and densification of the SPG center strengthens the baroclinic circulation around the dense core, closing the feedback loop

The composite analysis is repeated for all six models with high detection ratios to investigate if the mechanism identified in CCSM4 2deg can be generalized. The composites are defined by the two local minima of the reconstructed potential well for each model (Fig. 6, Table 2). Only differences of composites for strong - weak SPG AVE are shown (Fig. 11).

Fig. 11
figure 11

Differences of strong–weak SPG AVE composites for the six models with significant detection ratios for both indices (Fig. 6, Table 2). Models generally agree that a stronger SPG is associated with more vigorous flow in the STG as well. The stronger SPG goes along with a salinification in the subpolar North Atlantic and consequently stronger deep convection in all models but CNRM CM3. Stronger convection causes higher densities of the upper 500 m that is co-located with the largest increase in cyclonic circulation in the SPG. Thus, the four models CCSM4 2deg, IPSL CM4 and HadCM3 suggest that a common mechanism causes SPG variability, while variability in CNRM CM3 is the result of a different process

The shape of the anomalous SPG circulation varies between models. Five of the six models show a pattern in which both gyres in the North Atlantic, SPG and STG, are enhanced. This is not the case in HadCM3, where the negative anomaly in the stream function extends far to the south, indicating a weakening of the STG. A similar signal is observed in CNRM CM3. This suggests that the minima of the reconstructed potential correspond to different dynamics.

Composite patterns in IPSL CM4 ,MPI-ESM-LR and CGCM3.1 T47 qualitatively match those of CCSM4 2deg. In IPSL CM4, the anomalies do not extend into the Labrador Sea which is explained by a freshwater bias in this model version (Swingedouw et al. 2007).

CGCM3.1 T47 shows strong increases in sea surface salinity and 0-500 m average density throughout the Atlantic basin and along the eastern Nordic Seas. This is the result of a general intensification of the AMOC that coincides with a stronger SPG. A positive correlation between SPG and AMOC, associated with the increase in deep convection, is common to many coupled climate models (Delworth et al. 1993; Eden and Willebrand 2001; Spall 2008; Yoshimori et al. 2010; Born and Mignot 2011). Although the respective contributions of SPG and AMOC to the simulated anomaly patterns is difficult and will not be investigated in this study, we note that both fields are consistent with the proposed feedback and show higher anomalies in the western subpolar North Atlantic.

Anomalies in HadCM3 and CNRM CM3 are less conclusive and spatially inconsistent. A causal connection with the advective-convective feedback is doubtful in these models. This does not imply that this feedback mechanism does not exist in CNRM CM3, which is unlikely because it employs the same ocean component as IPSL CM4. The coupling to a different atmosphere model, implying a different average climate, apparently favors another mode of variability in the region that also possess two preferred modes of circulation and are therefore identified by the bistability detection algorithm. Close inspection reveals several differences in CNRM CM3: The time-average stream function has a different shape and other centers of recirculation (Fig. 3), the regions of variability are shifted (Fig. 4) and the amplitude of variability is half of that in IPSL CM4 (Fig. 5), and the anomalous stream function does not resemble an intensification of the SPG, but rather an anomalous circulation in the western basin that spans both the SPG and the STG (Fig. 11). In addition, the available data for CNRM CM3 is limited to 200 years, the shortest simulation time of all models in this study (Table 1).

In HadCM3, the bistability detection algorithm appears to capture a qualitatively different kind of variability as well. Sea surface salinity and mixed layer depth show only negligible anomalies. A sizable signal in upper ocean density is seen in the Nordic Seas, indicating that the observed changes in the depth-integrated stream function might be related to Arctic and Subarctic circulation changes and the flow across the Greenland Scotland ridge. We speculate that this leads to changes in the route and vigor of the Deep Western Boundary Current which would explain the observed anomaly in the depth-integrated stream function. Note, however, that HadCM3 stands out as the model with strongest variability (Fig. 4 and that we can not completely rule out numerical issues (see Sect. 2).

4 Discussion and conclusions

We analyzed the SPG circulation in 19 coupled general circulation climate models. Of the six models that significantly show two SPG circulation modes, four models agree on the advective-convective positive feedback mechanism. Note that while this number appears to be relatively small, it mainly reflects fundamental difficulties in detecting distinct circulation states in noisy time series. Where these states can be determined, the proposed feedback mechanism is found to be consistent in four out of the six cases. We conclude that these findings are consistent with the hypothesis that the advective-convective feedback is present in comprehensive coupled climate models. It also implies that the SPG is bistable in this class of models and that rapid changes in its circulation are possible with a change in forcing.

The amplitude of variability in the SPG is above the average in the models with two circulation modes, supporting the finding that the advective-convective feedback is required to realistically simulate the observed climate variability in the region (Mengel et al. 2012). The strongest signal is found in CCSM4 2deg, showing a spontaneous and rapid strengthening of the SPG that persists for approximately 100 years before it weakens again. CCSM4 is the third climate model for which this has been demonstrated directly, after CLIM BER—3 α (Levermann and Born 2007) and IPSL CM4 hires (Born and Mignot, 2011).

The importance of salt advection to the SPG center remains controversial since several studies highlight the importance of eddy exchanges between the convective center of the gyre and the boundary current (Lilly et al. 1999; Prater 2002; Spall 2004; Straneo 2006a; Deshayes et al. 2009). While this observations-based view is certainly correct, it does not conflict with the notion that the advective salt transport in the Irminger Current has an important impact on the dynamics, because the two processes act on different spatial scales. As shown here, a stronger SPG redistributes salt between east and west on a basin-wide scale, independent from local changes in eddy salt flux occurring between the Irminger Current and the SPG center. Even without changes in the efficiency of eddy salt flux between boundary current and center, the higher salinity in the Irminger Current reaches the SPG center. The model showing the most prominent bistability, CCSM4, employs a comprehensive set of mesoscale and sub-mesoscale parametrizations as well as a scheme to improve the representation of sub-grid overflow processes that control the exchange across the Greenland Scotland ridge (Danabasoglu et al. 2010, 2012).

IPSL CM4 hires received low detection scores for a potential bistability in this study, which does not contradict the findings of Born and Mignot (2011). The earlier study’s conclusion that the advective-convective feedback plays a major role in the amplification of decadal variability does not imply that the resulting time series is detectably bimodal. Similarly, it does not follow from the low detection ratios found here that the feedback does not impact the circulation and variability. Indeed, the importance of the feedback has been confirmed for this model system in its low-resolution version. Note that the only difference between the two versions is the resolution of the atmosphere model. Therefore, the same dynamics must operate in both ocean models, but only one version is detected as potentially bistable. The same applies to CNRM CM3, that also uses the same ocean component and does show a bistability of the circulation in the region, but with a weaker amplitude, a different pattern and apparently related to a different dynamical mechanism. An additional example are the two versions of CCSM4 that share the same ocean model coupled to the same atmosphere model with different resolution, as well as the two versions of CGCM3.1. One possible explanation is that during the short simulation time and without external forcing, the preindustrial control simulations used here do not reach every physically possible mode of circulation and a potential bistability remains unnoticed. In addition, a different resolution in the atmosphere likely changes the average climate in certain regions as well as the simulated variability. This has a potentially large effect on the advective-convective feedback through deep convection.

The present study does not address the completeness of the proposed feedback mechanism, that is raised by the above reasoning that atmospheric processes might play a role. The feedback depends on large-scale processes that are common to all models, but it appears to have only minor impact in most. It is important to stress that due to the aforementioned limitations, the method employed here is not sufficient to rule out a significant contribution to the dynamics by the advective-convective feedback. However, it is plausible that differences in sub-grid parametrizations, the representation of boundary currents, deep-ocean overflows and sea ice, the parametrization of deep convection processes, and the simulated variability of the North Atlantic atmospheric circulation all impact the effectiveness of the feedback and thus cause differences between models. Some of these processes have been shown to affect the SPG in some way (Straneo 2006b; Iovino et al. 2008; Born and Levermann 2010; Häkkinen et al. 2011). However, the analysis of these details goes beyond the scope of this study and must be addressed by specific sensitivity tests in future work.

Evidence for bistable dynamics of the SPG from a modeling perspective converges with a theory based on the analysis and simulation of paleo climate archives. During the earliest part of the present interglacial until approximately 8,200 years before present, the SPG was in a predominantly weak circulation mode (Born and Levermann 2010). After that, it changed into a predominantly strong circulation mode, triggered by a large meltwater event during the final stages of the deglaciation. However, this transition was likely not a singular event, but a modulation of the relatively frequent transitions between the two circulation modes presented in this study and in Born and Mignot (2011). In this view a predominantly strong circulation implies a shift toward a longer retention time in the strong mode at the expense of the weak mode and vice versa. Several smaller-amplitude shifts have been documented in paleo proxy data throughout the last 8,000 years (Thornalley et al. 2009). The present study suggests that the same physical mechanism describes these long-term paleo events and higher-frequency (multi-)decadal climate variability, providing the basis for future investigations of the topic.