1 Introduction

The Labrador Sea (LS) is one of the few sites where in wintertime ocean deep convection occurs. It forms a dense water mass, the Labrador Sea Water (LSW), which spreads across the northwest Atlantic at mid-depths (Talley and McCartney 1982). In this basin the oceanic winter heat loss due to intense atmosphere–ocean interactions results in convective mixing both in its interior (Marshall and Schott 1999) and over its shelves (Pickart et al. 2002), and the convective plumes act as windows through which atmospheric gases enter the deep waters.

The convective activity in the Labrador basin undergoes significant interannual-to-decadal variability (Lazier et al. 2002; Lazier 1980; Yashayaev 2007). Variations in LS deep convection eventually propagate throughout the North Atlantic (Talley and McCartney 1982; Curry et al. 1998) and impact the Atlantic Meridional Overturning Circulation (AMOC) by modifying the poleward heat transport (Eden and Willebrand 2001). For example, the formation of a fresh Labrador Sea Water layer during the 1990s was a major contributor to the marked freshening of the deep North Atlantic subpolar gyre in the same decade (Curry and Mauritzen 2005), and the decadal variability in LSW thickness correlates well with transport by the North Brazilian Current (Zhang et al. 2011).

More recently, it was highlighted that at least in the Community Earth System Model (CESM) buoyancy forcing over the LS is key in controlling AMOC decadal variability (Yeager and Danabasoglu 2014; Yeager et al. 2021), and that the variability in LS convection is in turn linked to the cumulative North Atlantic Oscillation (Yashayaev and Loder 2017; Rhein et al. 2017). OSNAP (Overturning in the Subpolar North Atlantic Program) observations, however, do not support a significant correlation between AMOC and LSW formation (Lozier et al. 2019), at least at interannual scales. Biases in ocean models and their tendency to overestimate the volume of water with LSW characteristics have been indicated as possible causes of the discrepancy. Li et al. (2019), for example, found that among the ocean-only models that participated in the Coordinated Ocean-Ice Reference Experiment (CORE) and originated from the three USA national laboratories, GFDL, NASA-GISS and NCAR, the overestimation was at least 60%. (Li et al. 2019). The models, however, did not locate the formation of LSW at the observed sites. Using a regional ocean model, Tagklis et al. (2020a) attributed this overestimation tendency by and large to model horizontal resolution. They found that resolving mesoscale and submesoscale transport from the west coast of Greenland to the Labrador Sea interior impacts the representation of LSW formation, with a 50% reduction in the modeled volume of convected waters when mesoscale advection is included (5 km horizontal resolution), and over 80% reduction if submesoscale processes are partially accounted for (1 km horizontal resolution), compared to a case that does not resolve the Rossby deformation radius of the basin which is of about 13 km. This overestimation may not be of the same magnitude across models, depends on the strength of convection, and is only one of the significant biases found in this area.

The convective patterns and variability are also very different, especially in coupled models where biases in both ocean circulation and atmospheric forcing hamper the representation of the dynamics in a highly variable region (Tagklis et al. 2017). An important source of bias, for example, is the input of tropical waters to the subpolar gyre of the North Atlantic via the North Atlantic Current which is then transferred to the Labrador Sea (Treguier et al. 2005; Talandier et al. 2014; Marzocchi et al. 2015; Tagklis et al. 2020b). While coupled models introduce new challenges, they may also provide the important stabilizing contribution of the atmosphere–ocean coupling, as recently shown through a suite of sensitivity experiments with the ECHAM6 model (Martin & Biastoch 2023). These biases and inter-model differences continue to limit the reliability of the latest simulations included in the Coupled Model Intercomparison Project Phase 6 (CMIP6), even if these models the overturning in the Labrador Sea is generally smaller than the mean overturning to the east (Jackson & Petit 2023), as in the observations. At the same time, the differences in AMOC representation across models shape the model projections of climate change into the future (Bellomo et al. 2021).

Here, we explore these differences separating oceanic and atmospheric contributions. We recognize that the resolution of the ocean model component remains a major limitation, and explore if raising it to a value achievable by the next CMIP iteration can help reducing significantly the current bias. We focus on 7 Earth System Models (ESMs), 5 from the CMIP Phase 5 (CMIP5) catalog and 2 from CMIP6, covering a range of model behaviors, and use separately their oceanic or atmospheric fields to force the regional ocean model adopted in Tagklis et al. (2020a) configured over the subpolar Atlantic gyre at 15 km horizontal resolution. Our objective is to establish linkages between the (modeled) LS circulation, the atmospheric forcing fields, the oceanic boundary conditions and their variability, and model representation. This is done with the understanding that future CMIP simulations will have greater resolution, but not enough to resolve, within the next decade, the ocean Rossby deformation radius, ocean bathymetry, topographic features and weather systems at high latitudes.

2 Background

2.1 The North Atlantic and Labrador Sea circulation

The surface circulation in the North Atlantic subpolar gyre and LS is cyclonic and intensified along the boundaries. In the LS, two relatively fast, narrow currents, the West Greenland Current (WGC) and the Labrador Current flow along the continental slopes and shelves, northward and southward, respectively. The WGC transports fresh and cold water from the Nordic Seas into the LS basin following the Greenland coast, while the Labrador Current carries cold and fresh water from Baffin Bay towards Nova Scotia along the coast of Labrador. Underneath and offshore of the WGC, the Irminger Current (IC) carries the warmer and saltier Irminger Sea Water (ISW) (Fig. 1). The IC plays a significant role in the restratification processes in the subpolar gyre and in the LS (Cuny et al. 2002) where is transported offshore by mesoscale eddies (Lilly et al. 2003). The IC maintains the LS warm enough to prevent ice formation in its central portion, in contrast with the shelf along the Labrador coast that is ice covered for several months of the year.

Fig. 1
figure 1

Etopo-2 topography and schematic representation of the Subpolar North Atlantic and Labrador Sea Circulation. Main currents include the West Greenland Current (WGC), East Greenland Current (EGC), Labrador Current (LC), Irminger Current (IC), and North Atlantic Current (NAC). The transect across the Labrador Sea used in several figures of this paper is also shown

Within the portion of the North Atlantic shown in Fig. 1, ocean convection occurs to the south of the Greenland tip where is driven by buoyancy fluxes and wind forcing, and especially in the central LS, where the formation of LSW is controlled by local surface buoyancy loss, and is modulated by variability in the atmospheric heat fluxes (Luo et al. 2014) and by changes in local stratification. Mesoscale coherent eddies modify the heat and salt budgets of the gyre interior, where convection occurs, transporting waters from the boundaries into the convective area (Lilly et al. 2003; Straneo 2006). Other factors contributing to the subpolar North Atlantic hydrography and its variability include freshwater inputs from melting of the Arctic and Greenland Ice Sheet (GrIs), continental runoff and precipitation. The GrIS mass losses accelerated in the past two decades, resulting in increased freshwater inputs into the adjacent seas, and future projections suggest a further exponential acceleration (Hanna et al. 2008; Fettweis et al. 2013). The analysis of global climate model experiments indicated that these freshwater anomalies may have the potential to weaken the AMOC and the LS convection (Rahmstorf et al. 2015; Boning et al. 2016; Caesar et al. 2018). These models, however, unrealistically place too much fresh water in center of the LS and in the subpolar gyre due to their limited resolution. Regionally focused ocean-only experiments at kilometer-scale horizontal resolution have shown that the GrIS-induced salinity anomalies have not had a significant impact on the LSW formation to date (Luo et al. 2016; Schulze Chretien and Frajka-Williams 2018; Tagklis et al. 2020a).

2.2 The ESMs

We focus on 5 ESMs used for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5), previously analyzed in Tagklis et al. (2017), augmented by 2 ESMs from the CMIP6 catalog, as summarized in Table 1. The CMIP5 models include two versions of the Geophysical Fluid Dynamics Laboratory (GFDL) Earth System Model, GFDL-ESM2G and GFDL-ESM2M (Dunne et al. 2012; Dunne et al. 2013) which have the same atmospheric component and differ in their ocean module and vertical coordinate system (ESM2M uses depth-based coordinates and the Modular Ocean Model (MOM) and ESM2G uses isopycnal coordinates and the Generalized Ocean Layered Model (GOLD) (Adcroft and Hallberg 2006); the Community Earth System Model, CESM1-BGC (Long et al. 2013; Moore et al. 2013), indicated as CESM1 in the rest of the paper; the Institute Pierre Simon Laplace model in its IPSL-CM5A-LR version (Dufresne et al. 2013) and the Max Plank Institute MPI-ESM-LR simulation (Giorgetta et al. 2013). The CMIP6 ESM are CESM2 (Danabasoglu et al. 2020) and CanESM5 (Swart et al. 2019), chosen because they differ substantially in their representation of AMOC, which is slightly stronger than observed in CESM2 and half of the observed mean value in CanESM5 (Zhang et al. 2023 submitted). We limit our attention to 21 years in the historical period, from January 1985 to December 2005.

Table 1 CMIP5 models used in this study

2.3 The observational data

As observational counterpart we use the monthly objective analyses of the temperature and salinity (0–700 m of depth) from the EN4 dataset (Good et al. 2013). The uncertainty in the objective analysis is quite large in coastal areas and below 1000 m depth, and the convective area is not well captured. For quantities extending to depths deeper than 1000 m, such as the winter mixed-layer depth used here as proxy for the strength of convective events, we consider two regional simulations performed using the Regional Ocean Modeling System (ROMS) (Shchepetkin and McWilliams 2005) in its Coastal and Regional Ocean COmmunity model (CROCO) version (Debreu et al. 2008, 2012) that differ in their horizontal resolution as described below.

2.4 CROCO simulations

The CROCO simulations are performed with the model set-up described in Tagklis et al. (2020a). These runs are also used as benchmarks for our sensitivity integrations and provide a quasi-realistic representation of deep convective location and variability from subseasonal to interannual scales (Tagklis et al. 2020a). We adopt the non-local K-Profile Parametrization (KPP) scheme (Large et al. 1994) to parameterize unresolved vertical mixing and vertical turbulent fluxes, and upstream bias advection of velocity and tracer, without explicit horizontal smoothing. The regional ocean model domain covers the subpolar gyre (SPG) and extends from 45° N to 66.5° N and from 20° W to 68° W. The horizontal resolution is 15 km (SPG15) or 5 km (SPG5), with 40 vertical levels in both cases. Velocity, temperature and salinity fields are nudged at the open boundaries to the Simple Ocean Data Assimilation (SODA) reanalysis version 3.4.2 (Carton and Giese 2008). The topography is derived from ETOPO2 (Smith and Sandwell 1997) and is smoothed using the logarithmic interpolation method introduced by Penven et al., (2008) with a maximum slope parameter rmax = 0.4 where rmax is defined as rmax = Δh/hmean. Δh is the maximum difference between adjacent grid cell depths and hmean the mean depth at that point. CROCO is forced by daily momentum stresses and heat fluxes from ERA-Interim (Dee et al. 2011). The CROCO outputs are saved every 3 days for the SPG15 case and 10 days for SPG5. More details and validation plots can be found in Tagklis et al. (2020a).

2.5 The hybrid ESM-CROCO simulations

To quantify the relative role of the atmospheric and oceanic biases on the representation of convective processes within the study area, we performed two sets of numerical experiments, referred to as AT and OC, respectively. In all cases, CROCO was configured as in the SPG15 case, with a horizontal resolution of 15 km and 40 sigma layers in the vertical.

In the AT experiments we run the CROCO model forced by monthly fields provided by the ESMs, including wind stress, heat fluxes and evaporation minus precipitation, applied as net fluxes (without using any bulk formula), along with monthly oceanic boundary and initial conditions from SODA, therefore correcting the ocean stratification representation. The OC runs, on the other hand, were forced by ERA-Interim surface fluxes, with the ESM outputs used as ocean boundary and initial conditions. Given the low sensitivity found to the freshwater fluxes from Greenland by Tagklis et al. (2020a) also at 15 km resolution, we did not include any meltwater runoff.

Both AT and OC runs were initiated in 1980, with a spin-up period of five years which guarantees convergence of the eddy kinetic energy in the regional set-up adopted. Outputs were saved every 6 days starting from January 1985 until the end of 2005. We chose not to apply additional corrections to the heat or freshwater fluxes from different sources of oceanic and atmospheric data to highlight the differences arising from the various forcing fields.

3 Results

3.1 Climatology and interannual variability in the CROCO runs and in the ESM

We first introduce a basic validation of the yearly climatology of temperature and salinity averaged over the upper 700 m of the water column (Figs. 2 and 3), and of the mixed-layer depth (MLD) with its variability during the convective-season (January to May) (Fig. 4) in the CROCO runs. In this work the MLD is calculated as the depth at which the density difference from the surface reaches 0.008 kg/m3 (Tagklis et al. 2020a). The convective patch is then defined by the area where the MLD is deeper than 1000 m during the convective season, and the convective volume as the volume of water contained within the convective patch from the surface to the base of the mixed-layer. For a more in-depth validation of the SPG5 and SPG15 simulations on a later period the reader is referred to Tagklis et al. (2020a) and for an earlier, comparable version, at intermediate resolution over the same time to Luo et al. (2014).

Fig. 2
figure 2

Labrador Sea climatological mean state between 1985 and 2005 in temperature (top) and salinity (bottom is PSU) in the upper 700 m of the water column. The plots are derived from the EN4 analysis (OBS) and CROCO runs (SPG15 and SPG5)

Fig. 3
figure 3

a Time series of temperature and salinity anomalies (PSU unit) averaged over the central Labrador Sea (50°–60° W, 55°–62.5° N) and the upper 700 m of the water column in the two CROCO runs (SPG15 and SPG5) and in the EN4 analysis (OBS). In the bottom two plots a twelve-months running mean was applied. Mean values are indicated in legends. b Evolution of potential temperature (T, in °C) in the central Labrador Sea in SPG15 (top), SPG5 (middle) and in the EN4 dataset (bottom) during the ARGO period from July 2002 to December 2005. The model output was saved every 3-days averages in the SPG15 case and as 10-days averages for SPG5, therefore the different resolution. The EN4 profiles are averaged over 5 days to smooth the noise of the Argo profiles which are nonuniform in space and time

Fig. 4
figure 4

Mean mixed layer depth (MLD) in the 15 km (a) and 5 km (b) CROCO SPG runs calculated averaging over the 1985–2005 period during the convective season (January to May), with the black contour surrounding the area where the MLD is deeper than 1000 m. c Time evolution of the convective volume for the 21 years considered (c). d and e are Hovmöller diagrams of MLD at 58° N

The model is warmer and saltier than the EN4 analysis along the Irminger Current path and around the Newfoundland and Labrador Peninsula. The absence of the Irminger Current signature in the low-resolution EN4 dataset is a known bias due to its spatial smoothing, and in-situ transects display a good agreement with location and anomaly amplitude of the current in CROCO (e.g. Hall et al. 2013). The bias around Newfoundland and Labrador, on the other hand, is due to a rudimental representation and underestimation of sea-ice formation and melting in CROCO. The SPG runs are also slightly warmer and saltier than EN4 in the interior of the basin. It should be noted, however, that only few and sparse observations contribute to EN4 before 2002, when ARGO became available, especially during the late fall to early spring seasons.

The temperature and salinity evolution in the central LS (50°-60° W, 55°-62.5° N) in the CROCO runs and in the EN4 objective analysis are shown in Fig. 3. In (a) temperature and salinity time series are depth averaged between the surface and 700 m for the whole period considered and in (b) the area average is plotted across the water column from July 2002, when the ARGO data became available, to the end of the simulation.

Figure 4 shows the mean and time evolution of the mixed-layer depth in the convective season in the twenty years considered in the CROCO runs, highlighting how the area where convection occurs depends on resolution, as analyzed in Tagklis et al. (2020a). This area has a comparable shape in the two runs, but is larger in the 15 km case. The convective area would be further reduced by another ~ 25% if the horizontal resolution was finer than 5 km. Tagklis et al. (2020a) found that the resolution dependency scales linearly with the mean vorticity over the model domain, which increases when mesoscale and submesoscale dynamics are resolved. Better resolved eddies imply greater transport of Irminger water into the interior, in turn limiting depth and extension of convective activity. The variability, on the other hand, compares well with the observed one and is similar in the two CROCO realizations, being mostly controlled by changes in heat fluxes (Luo et al. 2014). This can be seen comparing the time-series of convective volume and the Hovmöller diagrams of MLD. Noticeably, resolution differences are smaller in the first period of strong convection (1986–1995) than in the following decade. This strong-to-weak convection change cannot be explained only in terms of strong versus weak winter heat fluxes, and results in part from the warming trend in the Irminger Current (Luo et al. 2014). The Irminger warming affects the representation of winter convection only if the resolution is sufficient to capture the contribution of the Irminger Rings. In SPG15, the Irminger Rings barely form because the Rossby deformation radius of the LS is not well resolved, while they form in SPG5 (Tagklis et al. 2020a). Nonetheless, SPG15 produces a realistic distribution of temperature and salinity and a realistic variability of convective activity (Fig. 3). In the remaining of this work, we will adapt the SPG15 configuration and re-run CROCO with forcing fields from CMIP5/CMIP6 models, and we will compare the outcome to SPG15.

The climatological analysis is repeated for the 7 ESMs. Figure 5 summarizes the ESMs climatology, Fig. 6 their biases and Fig. 7 the temperature and salinity profiles, and density biases of the ESMs with respect to SPG15 over the central LS. In Fig. 7 should be noticed that the profiles indicated as OBS and calculated using the EN4 analysis are based on very sparse observations that sample the area less than 3 times/year at any time before July 2002 and never in fall or winter (see Fig. 4e in Luo et al. 2012).

Fig. 5
figure 5

From top to bottom: Upper 700 m yearly climatology (1985–2005) for temperature (top), salinity (middle) in the upper 700 m of the water column, and convective season MLD (bottom) in the seven ESMs considered

Fig. 6
figure 6

Model biases in the upper 700 m yearly climatological temperature (top), salinity (middle in PSU) and convective season MLD (bottom) of the seven ESMs with respect to the SPG15 model

Fig. 7
figure 7

Profiles of year-round climatological (1985–2005) (a) temperature (°C), (b) salinity (PSU) and (c) density difference (kg/m3) with respect to SPG15 in the CROCO runs and the ESMs over the central Labrador Sea (box 50°–60° W, 55°–62.5° N). In (a) and (b) the EN4 analysis is used in lieu of observational proxy (OBS). Observations however are very sparse and mostly collected in spring and summer before July 2002. The o in panels (a) and (b) indicate the depth of model layers

The ESMs considered are warmer (CESM1 and CESM2, MPI-ESM-LR) or colder (GFDL-ESM2G, IPSL-CM5A-LR, CanESM5 along the boundaries) than SPG15, saltier (CESM1 and CESM2) or fresher (GFDL-ESM2G, IPSL-CM5A-LR and CanESM5), but independently of their climatology they underestimate the MLD in the central Labrador Sea and therefore poorly represent convection activity (Tagklis et al. 2020b). CESM1, CESM2 and CanESM5 overestimate the MLD along the coast of Greenland, where these models extend winter convection. The upper water column in the ESM is generally less dense than in CROCO, with the greater stratification limiting convective activity, CESM1 and CESM2 being the closest to CROCO (Fig. 7). In CESM, though, the ice coverage in winter is too extensive over the LS, and convection in the central portion of the basin does not occur, but is compensated by excessive deep‐water formation to the east of Greenland and in the Irminger Sea (Danabasoglu et al. 2020). The water in the upper ~ 300 m of the water column in the central Labrador Sea between 50°–60° W and 55°–62.5° N is too cold in all models but the two CESM versions and MPI-ESM-LR, and too fresh in all but CESM. On the other hand, all models are too warm and generally too salty in waters deeper than 500–600 m. The Hovmöller diagrams of MLD at 58° N do not show convective activity for most of the models considered (Suppl. Mat. Fig. 1).

3.2 Hybrid simulations

3.2.1 The AT simulations and the role of the ocean stratification

The analysis above is repeated for the CROCO-Hybrid runs at 15 km horizontal resolution. In the AT case, the atmospheric forcing is from the ESMs but the oceanic stratification and velocity fields at the ocean open boundaries are from SODA. For these runs it is important to keep in mind that they are uncoupled, and the convective response will differ from those found in the ESMs not only because of the different ocean conditions but also for absence of air-sea coupling. Four of the models (CESM1 AT, CESM2 AT, GFDL-ESM2M AT and MPI-ESM-LR AT) have deep convective mixing in winter in the Labrador Sea and to the east of Greenland, with different relative strengths (Fig. 8 and Suppl. Mat. Fig. 2). GFDL-ESM2G AT displays convective activity only on the eastern side of Greenland, and the two remaining models, IPSL-CM5A-LR AT and CanEMS5 AT, which in their coupled configuration share the NEMO ocean module, continue to have limited MLD deepening in the Labrador Sea. In the IPSL-CM5A-LR AT case, changing the ocean stratification makes only a small difference in the MLD representation. The temperature and salinity biases in the upper 700 m decrease significantly, as to be expected, but a warming bias remains in GFDL-ESM2G AT, IPSL-CM5A-LR AT and CanEMS5 AT due to the heat flux forcing, and these AT runs are too warm year around and therefore too stratified in the Labrador Sea (Suppl. Mat. Fig. 3).

Fig. 8
figure 8

As in Fig. 6 but for the AT integrations

The first noticeable feature of the AT runs is the much-improved and very similar representation of the salinity field integrated over the upper 700 m. The ocean boundary and initial conditions control this field, given that the AT integrations, which account for differences in the surface evaporation minus precipitation fluxes across the ESMs (Suppl. Figure 4), have all similar salinity biases. At the surface, differences between the AT runs and the ESM realizations that underestimate surface salinity (all ESMs but CESM, Fig. 7) are to ascribe to the ocean circulation and to their representation of GrIS-melting. GrIS-melting in the EMSs is often too strong (see surface temperature bias in Fig. 7 and discussion of heat fluxes later in this section) and the resulting freshwater extends too far offshore, instead of being confined along the coastline of Greenland (Tagklis et al. 2020a). In the CESM versions and especially CESM2, on the other hand, the ice coverage in the Arctic region and subpolar gyre is too extensive, and the upper water column too salty.

The temperature representation is also improved, but biases remain in the representation of the MLD. The simulations CESM1 AT and CESM2 AT, despite a better temperature and salinity year-round climatology, have a positive MLD bias in the Labrador Sea during the convective season that extends to the east of Greenland. The bias to the east of Greenland is common to the AT runs with atmospheric forcing from the two GFDL versions and MPI-ESM-LR. In the Labrador Sea proper, the MPI-ESM-LR AT has the best representation of convective mixing, followed by GFDL-ESM2M AT. The remaining models continue to underestimate deep mixing in winter in the central portion of the basin.

The AT biases in MLD can be explained considering the ESMs representation of the surface heat fluxes in the central Labrador Sea and to the east of Greenland in winter and spring, shown in Fig. 9. The atmospheric cooling over the Labrador Sea is too strong in CESM. This bias likely results from the model having the warmest ocean (and more so in the CESM2 version), and consequently having the greatest heat loss to the atmosphere. When this heat flux is applied to an ocean model with unbiased stratification, it results in a deeper MLD.

Fig. 9
figure 9

January-to-May climatology of the atmospheric heat fluxes into the ocean in ERA-Interim and in the ESMs over the 1985–2005 period

On the other hand, the atmospheric cooling in winter over the LS is extremely weak in GFDL-ESM2G, CanESM5 and IPS-CM5A-LR and weaker than in ERA-Interim in GFDL-ESM2M, and always too strong in the eastern portion of the subpolar gyre. These four models have an atmospheric and land mask resolution coarser than 2° and cannot resolve the topographic flow distortion that is key to the localization of the air-sea turbulent heat flux maximum over the LS (Moore et al. 2014).

The heat flux bias is not limited, in most models, to the convective season. Figure 10 shows the seasonal cycle of the atmospheric heat fluxes over the central Labrador Sea (50°–60° W, 55°–62.5° N) and in the eastern portion of our domain (30°–40° W, 55°–62.5° N). The linear trend in each dataset has been removed before calculating the annual cycle and is indicated in the legend of the figure.

Fig. 10
figure 10

Seasonal cycle (after linear detrending) of the heat fluxes into the ocean over the central Labrador Sea (box: 50°–60° W, 55°–62.5° N) and to the east of Greenland (box: 30°–40°W, 55°–62.5°N) for ERA-Interim and the ESMs. Linear trends are indicated in legend

Over the Labrador Sea, the CESM models are too cold in winter and spring, but partially compensate in the yearly climatology by being too warm in summer and fall. Their warming trend, though, is more than twice than that of ERA-Interim. The CanESM5 and GFDL-ESM2G have weak heat fluxes year-round, with maximum warming and cooling of the ocean too late (August) and too early (October–December). They also display a cooling trend in the 20 years considered, which is opposite to observed. The three remaining models underestimate the seasonal cycling, but to a lesser degree. Their linear trends are positive, even if larger than observed but for IPSL-CM5A-LR. Over the eastern side of Greenland all models have a more accurate representation of the atmospheric heating and cooling, but tend to be too cold in January and February, often with a warming trend much stronger than observed. In terms of trend, the two CESM realizations provide the representation closest to ERA-Interim. Overall, there is larger uncertainty in the model representation of the LS circulation, and better agreement, among ESMs and with the observations, over the Irminger Sea.

3.2.2 The OC simulations

In the OC runs the boundary and initial conditions are from the ocean components of the ESMs and the atmospheric forcing is replaced by ERA-Interim. The ocean model resolution is improved to 15 km, and the physical component is CROCO in all cases. Figure 11 shows the biases of the OC simulations in the upper 700 m for temperature and salinity, and Fig. 12 focuses on the density representation across the LS transect shown in Fig. 1.

Fig. 11
figure 11

As in Fig. 6 but for the OC integrations

Fig. 12
figure 12

Scatter plot of mean stratification (quantified by the Brunt–Väisälä frequency) at 75 m (circles for OC runs and diamonds for AT runs) and 700 m depth (stars for OC and squares for AT, not shown in legend for clarity) versus mixed layer depth in the central Labrador Sea (box 50°–60° W, 55°–62.5° N). a year-around; b convective season January–May only. Best least-square fits are also indicated. B1, B2, C1 and C2 are constants

The upper 700 m of the water column in all OC cases is warmer than in the original configuration (Fig. 11; to be compared to Fig. 5). The temperature bias at the ocean surface is always positive with respect to SPG15: the OC simulations are too warm. The resulting surface density bias shown in Suppl. Mat. Fig. 5 is negative even without the contribution from the freshwater fluxes from the GrIS by construction of the runs. Below the surface, on the other hand, the warm temperatures drive the stratification bias in most of the models, CESM1 OC and IPSL-CM5A-LR OC being the exceptions (see Suppl. Mat. Fig. 5 for the Brunt–Väisälä frequency plot across the transect indicated in Fig. 1). However, it is not possible, by the design of these runs, to fully separate the role of the intrinsic biases of each ocean model versus the contribution of stratification alone.

The CESM1 OC is too warm and salty in the upper 700 m of the water column, yet it overshoots the MLD compared to the SPG15 case both in the southern portion of the basin and in the central Labrador Sea. This is because it is not sea-ice covered anymore in winter and the density structure across the water column is too homogeneous and continuously maintained through the open boundaries. The MLD bias is strongly positive in the Labrador Sea also in the IPSL-CM5A-LM OC and CanEMS5 OC runs, due to the increase in winter cooling when correcting the ESM heat fluxes, and therefore convective mixing. In CanEMS5 OC the summer warming also increases, due to the more intense (positive) heat fluxes into the ocean in the warm months. The GFDL-ESM runs significantly underestimate the atmospheric warming (Fig. 10) and when their water masses are forced by ERA-Interim in CROCO their stratification becomes too strong (Suppl. Figure 5), and their MLD in turn too shallow compared to SPG15. The correction of the atmospheric fluxes causes a stronger winter mixing in MPI-ESM-LR OC but also a warming of the water column, especially in summer.

Results so far are summarized in Fig. 12, which shows the relationship between stratification, measured by the Brunt–Väisälä frequency at 75 m and 700 m of depth and the MLD in the central LS during the convective season and year-around in all the hybrid SPG runs. All runs but CESM2 AT are too stratified near the surface year-around compared to SPG15, and the OC simulations more so that the AT ones, as to be expected. The same problem can be seen at 700 m, but to a lesser extent. Convection depends strongly on stratification, and the surface bias persists in winter as well (b). Both year-around and convective season Brunt–Väisälä frequency at 700 m depth are great indicators of convective strength in the Labrador Sea with an exponential relationship connecting the two.

3.3 Variability in AT and OC runs

Figures 13 and 14 visualize the variability of deep convection in the AT and OC integrations by displaying the convective regions (Fig. 13) and the evolution of the convective volume in the 21 years considered (Fig. 14). Atmospheric and oceanic boundary conditions in the ESMs do not reproduce the phases of natural climate variability, such as the observed temporal evolution of the North Atlantic Oscillation. In the AT runs, the atmospheric forcing deviates from the observed temporal variability but the SODA boundary conditions indirectly contain information about the atmospheric/oceanic natural variability through the data assimilation process. Similarly, oceanic boundary conditions in the OC runs do not follow the observed natural variability, but the model is still informed by the observed atmospheric forcing through the reanalysis heat and momentum fluxes. Comparing AT and OC runs against the high resolution hindcast of SPG15 can help identify the main factors that regulate the variability of deep convection.

Fig. 13
figure 13

Contours surrounding the convective areas where the MLD is deeper than 1000 m for the OC and AT integrations with forcing fields from (a) CESM1-BGC and CESM2, (b) GFDL-ESM2G and GFDL-ESM2M, and (c) CanEMS5, IPSL-CM5A-LR and MPI-ESM-LR

Fig. 14
figure 14

Time evolution of the yearly maximum convective volume recorded for each of the 21 years considered. In the legend, numbers indicate the correlation coefficient between each time-series and the SPG15 one

CESM convects in the correct location but too much in its two versions and in both the AT and OC runs. The excessive convective volume is due, in the OC integrations, to the stratification of its water column, which is too weak compared to the observed, and in the AT runs, to the intense, stronger than observed, atmospheric cooling in most of the winters considered. Both versions of the CESM OC runs do not reproduce the relatively weak convective activity during the mid 1990s, despite the heat fluxes being from the reanalysis and this period being characterized by relatively mild winters. However, the AT runs of both versions capture this feature. This behavior suggests that for this ESM the oceanic conditions around the LS and the background stratification in the LS basin are key to reproduce the correct response to the atmospheric fluxes and may dominate the interannual variability signal. In the observational record Petit et al. (2021) found this to be the case for the formation of subpolar mode water in the Iceland basin, while an observational and model analysis jointly indicate that atmospheric fluxes have played the leading role in the LS over the 20 years considered (Yashayaev and Loder 2009; Luo et al. 2012), with the stratification being only mildly affected by changes in the Irminger Current. Indeed, the correlation coefficients of the time series of maximum convective volume in the CESM hybrid runs and in the SPG15 case increase from 0.36, and 0.32 for CESM1 OC and CESM2 OC, to 0.50 and 0.41 for CESM1 AT and CESM2 AT, respectively.

For GFDL, most of the convective activity occurs outside the LS, and this impacts the representation of interannual variability. The ESM2G version always underestimates convective mixing in the central Labrador Sea. The lack of convection is due in the AT runs to the bias in the heat fluxes, which strongly underestimate cooling in winter in the LS. In the OC case the heat flux bias is corrected, but the upper ocean is too stratified in the LS and convective mixing cannot take place. Both stratification and heat flux biases in the LS are reduced in the ESM2M version, therefore the better representation of the overall volume of convection in both AT and OC cases, with the time-average volume of convective mixing closer to that found in CROCO. The OC run, however, continues to place deep convection exclusively to the east of Greenland, where the largest cooling occurs.

For the last three models, all the OC runs and the MPI-ESM-LR AT case have convective mixing in the Labrador Sea, confirming that for many simulations the lack of cooling over the basin is the main bias limiting their performance. CanESM5 has variability and amount of convective mixing close to CROCO in the OC case. MPI-ESM-LR OC and IPSL-CM5A OC, on the other hand, overestimate mixing, due to the contribution of the east of Greenland portion. MPI-ESM-LR AT has a good representation of both convective amount and overall interannual variability.

Both CanESM5 OC and IPSL-CM5A OC achieve a substantially better representation of the interannual variability than their AT versions, despite the overly stratified water column in the central LS in both ESMs. The behavior is opposite to MPI-ESM or CESM, and suggests that stratification at the boundaries of the subpolar North Atlantic may also play an important role.

4 Discussion and conclusions

The subpolar North Atlantic is a region where ESMs have significant biases (Weaver et al. 2012; Cheng et al. 2013; Weijer et al. 2020). At the same time, the representation of AMOC is a key driver of inter-model uncertainty in the response of the whole climate system to global warming (Tagklis et al. 2020b; Bellomo et al. 2021). The Labrador Sea has the longest observational record of winter convective mixing, with at least one transect across the convective region per year since the 1980s but for 1989 (Yashayaev 2007). While the LSW may not be a proxy for AMOC intensity or its overall variability (Lozier et al. 2019), its formation and changes over time remain a critical target for validating models. It is generally believed that the resolution of the ocean component limits their performances, especially in the LS, where baroclinic instability is responsible for an energetic eddy field which is not resolved at current global resolutions that pre-condition the basin stratification (Lilly et al. 2003; Bracco et al. 2008). Tagklis et al. (2020a) showed that resolving the eddy field modulates the amplitude of deep convection with a linear relationship between resolution (or basin average absolute surface relative vorticity) and convective volume. The existence of such relation offers the possibility to locally parameterize convection in models with coarse resolution, and to account for the underestimation of mesoscale variability. ESMs, however, have biases beyond the ocean resolution problem. Here we investigated the relative role of the oceanic and atmospheric biases in the subpolar North Atlantic gyre in 7 ESMs chosen among the CMIP5/6 catalog. Our objective was to explore commonalities and differences, and possibly suggest a pathway forward to improve the representation of this region which is fundamental for climate variability and change.

We summarize our findings as follows:

  • The ESMs analyzed present distinct behaviors, without any obvious grouping, but they are all but CESM2 too stratified in the Labrador Sea at least in the upper portion of the water column, and too warm (warmer than observed) below 500 m depth. The stratification bias prevents deep convection from occurring in the central Labrador Sea as observed.

  • Most models (all but CESM) underestimate winter cooling and summer warming over the Labrador Sea, while all ESM considered represent reasonably in all seasons the heat fluxes to the east of Greenland. As a result, several models tend to convect in the Irminger Sea, especially whenever their ocean stratification is corrected. CESM1 and CESM2, on the other hand, have a warm ocean bias and overestimate the winter surface heat loss in the LS basin, which contributes to its excessive winter sea-ice coverage. The three models with the greatest underestimation in the LS winter heat loss and convection—CanESM, IPSL-CM5A-LR and GFDL-ESM2G—are among those with the lowest resolution in the atmospheric component.

  • The AT runs, through the correction of the oceanic boundary conditions, display large improvements in the salinity field throughout the water column, but continue to have significant biases in temperature. For models with a coarser than 2° resolution atmosphere component, the representation of the surface heat fluxes changes the ocean stratification and MLD in the western North Atlantic very rapidly, and the atmospheric bias must be corrected to ensure a closer representation of the LSW contribution to AMOC (see Fig. 7).

  • In the OC runs, both salinity and temperature fields remain significantly biased in most models even though observed heat and E-P fluxes are used. This is directly caused by the ESMs’ temperature and salinity fields, introduced to the regional ocean model domain via the oceanic boundary conditions. In addition, while the surface of the LS is too fresh in most of the ESMs (Fig. 6), in response to excessive GrIS melting, the OS simulations overcorrect for this bias by not considering the GrIS contribution at all. This causes an excess salinity in the water column in all cases which, however, would induce only a limited change (a few percent, see Tagklis et al. 2020a) in the amount of LSW formation if the melting was comparable to the observed.

Through this exercise, we have shown that improving the ocean model resolution, through regionally focused efforts, for example with climate models capable of zooming in areas of large mesoscale variability such as the LS, will not solve the major biases in the ESMs in the North Atlantic. This is because the representation of the dynamics is overwhelmed by biases in the transport of different water masses in the region, as shown by the OS outputs. Increasing resolution globally or anyway in the whole Atlantic basin may help (e.g. Marzocchi et al. 2015), but regionally zoomed-in experiments will not suffice. The atmospheric heat flux bias in ESMs with resolution coarser than 2° in the atmosphere/land components is likely to improve with better resolution in landmasses and topographic flow distortion (Moore et al. 2014), but the greatest priority remains to improve the representation of ocean stratification. As noted already for the Southern Ocean (Bourgeois et al. 2022), also in the North Atlantic Ocean the representation of stratification is linked to the simulated winter mixed layer depths and is key in determining the amount of heat, carbon and oxygen that the ocean absorbs under current and future climate. Improving globally the representation of eddy-induced diffusion may help in bringing the simulated stratification closer to the observed (Kuhlbrodt and Gregory 2012), but from our investigation, the problem appears to be tightly coupled across both ocean and atmosphere (and likely the cryosphere). Improved resolution globally, in both the ocean and the atmosphere, is needed, but it may not be enough and is not immediately achievable. Given the urgency of the problem, alternative routes to bias-correct the ocean stratification field should be explored, and we plan to use physically constrained machine learning techniques to this end in the near future (e.g. Falasca and Bracco 2022; Reinbold et al. 2021).