1 Introduction

The article by Svalgaard and Schatten (2016) contains a new sunspot-group number composite. The method used to compile this data series involves combining data from available observers into segments that the authors call “backbones”, which are then joined together by linear regressions. We here call this the “backbone” sunspot-group number [\(R_{\mathrm{BB}}\)] to distinguish it from other estimates of the sunspot-group number. What is different about the \(R_{\mathrm{BB}}\) composite is that instead of the recent grand maximum being the first since the Maunder minimum (circa 1650 – 1710), as it is in other sunspot data series, it is the third; there being one maximum of approximately the same magnitude in each century since the Maunder minimum. In itself, this is not such a fundamental change as it arises only from early values of \(R_{\mathrm{BB}}\) being somewhat higher than for the previous sunspot number or sunspot-group number records. However, the new series does suggest a flipping between two states rather than a more sustained rise from the Maunder minimum to the recent grand maximum, with implications for solar-dynamo theory and for reconstructed parameters, such as total and UV solar irradiances. We note the \(R_{\mathrm{BB}}\) data composite is quite similar to the second version of the composite of Wolf/Zürich/International sunspot number [\(R_{\text{ISNv2}}\)], recently generated by the Solar Influences Data Centre (SIDC) of the Solar Physics Research department of the Royal Observatory of Belgium. Specifically, annual \(R_{\text{ISNv2}}\) values also show three longer-term maxima since the Maunder minimum that are more equal in magnitude than for earlier series such as \(R_{\mathrm{G}}\), \(R_{\mathrm{C}}\), and RISNv1, but not as equal as they are for \(R_{\mathrm{BB}}\) and, unlike \(R_{\mathrm{BB}}\), the most recent of those three maxima remains the largest in \(R_{\text{ISNv2}}\). \(R_{\mathrm{BB}}\) is extreme in that it is the only sequence for which the highest solar-cycle means are not for Cycle 19 (uniquely, being larger for Solar Cycles 0 and 3 than for Cycle 19), whereas the highest annual value is during Cycle 19 for all data series, including \(R_{\mathrm{BB}}\).

The standard approach to calibrating historic sunspot data is “daisy-chaining”, whereby the calibration is passed from one data series (be it a backbone or the data from an individual observer) to an adjacent one, usually using linear regression over a period of overlap between the two. Svalgaard and Schatten (2016) claim that daisy-chaining was not used in compiling \(R_{\mathrm{BB}}\), positing that the backbone method is an alternative method to daisy-chaining rather than a different form of it. However, avoiding daisy-chaining requires deployment of a method to calibrate early sunspot data, relative to modern data, without comparing both to data taken in the interim: because no such method is presented in the description of the compilation of \(R_{\mathrm{BB}}\), it is evident that daisy-chaining was employed. Another new sunspot-group number data series has recently been published by Usoskin et al. (2016): these authors describe and employ a method that genuinely does avoid daisy-chaining because all data are calibrated by direct comparison with a single reference data set, independent of the calibration of any other data.

As discussed in Article 3 (Lockwood et al. 2016b), there are major concerns about the use of daisy-chaining. Firstly, rigorous testing of all regressions used is essential, and Lockwood et al. (2016b) show that the assumptions about linearity and proportionality of data series made by Svalgaard and Schatten (2016) when compiling \(R_{\mathrm{BB}}\) cause both random and systematic errors. The use of daisy-chaining means that these errors accumulate over the duration of the data series. Another problem has been addressed by Usoskin et al. (2016) and Willis, Wild, and Warburton (2016), namely that the day-to-day variability of sunspot-group data make it vital only to compare data from two observers that were taken on the same day. Hence the use of annual means by Svalgaard and Schatten (2016) is another potential source of error.

Other sunspot-data composites are also compiled using daisy-chaining, such as the original sunspot-group number [\(R_{\mathrm{G}}\)] generated by Hoyt, Schatten, and Nesme-Ribes (1994) and Hoyt and Schatten (1998); versions 1 and 2 of the composite of the Wolf/Zürich/International sunspot number [\(R_{\text{ISNv}1}\) and \(R_{\text{ISNv2}}\)], and the corrected \(R_{\text{ISNv}1}\) series [\(R_{\mathrm{C}}\)], proposed by Lockwood, Owens, and Barnard (2014a, 2014b). Some of these series also employ linear regressions of annual data. Hence these data series, like \(R_{\mathrm{BB}}\), have not been compiled with the optimum and most rigorous procedures and so also require critical evaluation. We note that, as for \(R_{\mathrm{BB}}\), there are specific concerns about \(R_{\mathrm{G}}\) and \(R_{\mathrm{C}}\) that have been expressed in the literature and which, as for \(R_{\mathrm{BB}}\), arise for the use of daisy-chaining. For example, Cliver and Ling (2016) find an error in the earliest RGO data, and the daisy-chaining construction of \(R_{\mathrm{G}}\) means that all values of \(R_{\mathrm{G}}\) before 1874 would be too low. Similarly, the intercalibration of the datasets of Schwabe and Wolf that was used in the construction \(R_{\mathrm{C}}\) has been questioned (e.g., Clette and Lefèvre 2016), and with daisy-chaining this would mean that all values of \(R_{\mathrm{C}}\) before 1850 would be too low.

These problems give the potential for calibration drifts and systematic errors, which means that uncertainties (relative to modern values) necessarily increase in magnitude as one goes back in time. By comparing with early ionospheric data, Article 1 (Lockwood et al. 2016a) finds evidence that such calibration drift is present in \(R_{\mathrm{BB}}\) as late as Solar Cycle 17, raising concerns that there are even larger drifts at earlier times.

It is undesirable to calibrate sunspot data using other correlated solar-terrestrial parameters because the regression may well vary due to a factor, or factors, that were not detected above the noise in the study that determined the regression. Such factors could introduce spurious long-term drift into the sunspot calibration. In addition, the independence of the two data series is lost in any such calibration, which takes away the validity of a variety of studies that assume (explicitly or implicitly) that the two datasets are independent. Article 1 (Lockwood et al. 2016a) discusses this point further and presents some examples. On the other hand, sunspots are useful primarily because they are proxy indicators of the correlated solar-terrestrial parameters and phenomena. Hence if the centennial-scale drift in any one sunspot number does not match that in a basket of solar-terrestrial activity indicators, this would mean that either i) there is calibration drift in the sunspot-number data or ii) sunspot numbers are not a good metric of solar-terrestrial influence on centennial timescales. From the above, we do not advocate using ionospheric, geomagnetic, auroral, and cosmogenic isotope data to calibrate sunspot data but note that a sequence is most successful, as a way of parameterising and predicting the terrestrial parameters, if it does reproduce their long-term drift. In this article we study the consistency of four sunspot-number sequences with geomagnetic and auroral data. The sunspot data sequences used here are i) the original composite of Wolf/Zürich/International sunspot number generated by SIDC [\(R_{\text{ISNv}1}\)]; ii) the group sunspot number [\(R_{\mathrm{G}}\)] of Hoyt, Schatten, and Nesme-Ribes (1994) and Hoyt and Schatten (1998); iii) the new “backbone” sunspot-group number [\(R_{\mathrm{BB}}\)] proposed by Svalgaard and Schatten (2016); and iv) the “corrected” sunspot number [\(R_{\mathrm{C}}\)] proposed by Lockwood, Owens, and Barnard (2014a). Figure 1 shows annual means of these data: it also shows (in black) the variation of the new version of the composite of Wolf/Zürich/International sunspot number recently issued by SIDC [\(R_{\text{ISNv2}}\)], which uses some, but not all, of the re-calibrations of the original data that were derived to generate \(R_{\mathrm{BB}}\). We note that to aid comparison, \(R_{\mathrm{BB}}\) is here scaled by a constant factor of \(\alpha_{\mathrm{BB}}=12.6\), which makes the mean values of \(\alpha_{\mathrm{BB}}\) \(R_{\mathrm{BB}}\) and \(R_{\text{ISNv}1}\) (and hence by its definition \(R_{\mathrm{C}}\)) the same over the calibration interval of 1982 – 2012 that is used here. The designated factor of 0.6 is used in the case of \(R_{\text{ISNv2}}\).

Figure 1
figure 1

Sunspot number series used in this article. (Orange) the original SIDC composite of Wolf/Zürich/International sunspot number [\(R_{\text{ISNv}1}\)]; (blue) the “corrected” sunspot number [\(R_{\mathrm{C}}\)] proposed by Lockwood, Owens, and Barnard (2014a); (green) the sunspot-group number [\(R_{\mathrm{G}}\)]; (mauve) the new “backbone” sunspot-group number [\(R_{\mathrm{BB}}\)] proposed by Svalgaard and Schatten (2016), here multiplied by a normalising factor of \(\alpha_{\mathrm{BB}}=12.6\) that makes the averages of \(\alpha_{\mathrm{BB}} R_{\mathrm{BB}}\) and \(R_{\text{ISNv}1}\) (and hence \(R_{\mathrm{C}}\)) the same over the calibration interval adopted here (1982 – 2014); (black) the new (version 2) SIDC composite of Wolf/Zürich/International sunspot number [\(R_{\text{ISNv2}}\)], here multiplied by the designated 0.6 scaling factor. Background white and grey bands denote even and odd sunspot cycles (minimum to minimum), respectively, which are numbered near the top of the plot. The light-cyan band marks the Maunder minimum.

There are two major concerns in relation to the different behaviour of \(R_{\mathrm{BB}}\) evident in Figure 1. The first is the stability of the calibration of each backbone over the interval it covers, and the second is the regression fits used to daisy-chain the backbones. Even for very highly correlated data segments, the best-fit regression can depend on the regression procedure used (see Article 3; Lockwood et al. 2016b), and it is vital to ensure that the most appropriate procedure is employed (Ryan 2008). Options include median least squares, Bayesian least squares, minimum-distance estimation, non-linear fits, and the ordinary least squares (OLS) that was used to generate \(R_{\mathrm{BB}}\). Even the OLS fits can be carried out in different ways in that they can either minimise the sum of the squares of the verticals (appropriate when the \(x\)-parameter is fixed or of small uncertainty such that the dominant uncertainty is in the \(y\)-parameter) or they can minimise the sum of the squares of the perpendiculars (usually more appropriate when there are uncertainties of comparable magnitude in both \(x\) and \(y\)). It is very important to test that fits are robust and the data do not violate the assumptions of OLS least-squares fitting procedure: Q–Q plots can be used to check the residuals are normally distributed, the Cook-D leverage parameter can test for data points that are having undue influence on the overall fit, and the fit residuals can be checked to ensure they are “homoscedastic” (i.e. that the dependent variable exhibits similar variance across the range of values for the other variable). All of these can invalidate a fit because the data are violating one or more of the assumptions of the regression technique used (Lockwood et al. 2006). Any daisy-chaining used to generate a long-term sunspot number sequence is of particular concern because if the random fractional uncertainty of the ith intercalibration is \(\delta_{\mathrm{i}}\), then the total fractional uncertainty will be \((\Sigma^{\mathrm{n}}_{\mathrm{i}=1} \delta_{\mathrm{i}}^{2})^{1/2}\), where \(n\) is the number of intercalibrations (provided the uncertainties \(\delta_{i}\), are uncorrelated). Even more significantly, systematic fractional errors at each intercalibration \(\varepsilon_{\mathrm{i}}\) will lead to a total systematic fractional error of \(\Pi^{\mathrm{n}}_{\mathrm{i}=1} \varepsilon_{\mathrm{i}}\). Both will inevitably grow larger as one goes back in time. Hence considerable uncertainties and systemic deviations are both possible for the earliest data compared to the modern data for any sunspot-number sequence compiled by daisy-chaining. The ability for these uncertainties to become amplified as one goes back in time makes it vital to check that the regressions are not influenced by an inappropriate fit procedure. None of the compilers of daisy-chained data series have investigated these potential effects, for example by using a variety of regression procedures, and instead implicitly trust the one procedure that they adopt. In the absence of tests against other procedures, comparison with other solar-terrestrial parameters becomes important as a check that the daisy-chained calibrations have not led to a false drift in the sunspot calibration.

2 Analysis

In this article, we compare the long-term drifts inherent in sunspot-data series with indices derived from terrestrial measurements that have been devised to vary in a manner that is as close to linear as possible with sunspot numbers over a 30-year “training” interval of 1982 – 2012. Linearity between the test metric and sunspot number is important because non-linearity would generate a difference in their long-term trends, especially for periods such as the Dalton and Maunder minima when values are outside the range seen during the training interval. Because of the concerns about the compounding effect of uncertainties in daisy-chained regressions and the potential differences between the results of different regression techniques, we here try to avoid using regression in making this comparison. Where regression techniques have to be used, they are used only in the training interval and the coefficients derived are then applied uniformly to the whole interval (1845 – 2013), such that 1845 – 1982 forms a fully independent test period. A probability \(p\)-value for every combination of fitted values is quantified and used in uncertainty analysis.

Because we are interested in long-term drifts, we here average all data series over full solar cycles (from minimum for minimum), ensuring successive data points are fully independent. To normalise the data we then divided these means by the value for Cycle 19. This cycle was chosen because it is the largest in the series and because much of the interest in the new sunspot series [\(R_{\mathrm{BB}}\) and \(R_{\text{ISNv2}}\)] is in the relative sizes of the peaks in the secular variation and, in particular, the relationship of earlier peaks to Cycle 19. Figure 2 shows the results for \(R_{\text{ISNv}1}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), \(R_{\mathrm{BB}}\), and \(R_{\text{ISNv2}}\). It can be seen that as we move to earlier times, from Cycle 19 back to Cycle 14, \(R_{\text{ISNv}1}\) decreases most rapidly whereas \(R_{\mathrm{G}}\) and \(R_{\mathrm{BB}}\) decrease the least rapidly. It is noticeable that \(R_{\mathrm{G}}\) and \(R_{\mathrm{BB}}\) are both group numbers and so the definitions may have something to do with the difference in behaviour. This interval (Cycles 14 – 18) includes the Waldmeier discontinuity (see Articles 1 and 4), which influences both Wolf numbers and group numbers generated in Zürich, but not necessarily in the same way. It is an allowance for this discontinuity that gives the difference in behaviour between \(R_{\text{ISNv}1}\) and \(R_{\mathrm{C}}\). Moving to yet earlier times, the difference between \(R_{\mathrm{BB}}\) or \(R_{\text{ISNv2}}\) and the other estimates (with the exception of \(R_{\mathrm{G}}\)) remains roughly the same over Cycles 13, 12, and 11, but grows considerably over Cycle 10. Obviously, the choice of which solar cycle to normalise with will influence the appearance of Figure 2 and of subsequent corresponding figures. It has no effect on the waveform of the long-term variation of each series, however, which is what should be compared. Because of the normalisation (which is necessary to compare sunspot numbers and sunspot-group numbers), the values of \(\langle R\rangle _{\text{Cn}}/\langle R\rangle _{19}\) are not absolute values, and we are therefore not concerned about the differences between different series for any one solar cycle, but how those differences change with time.

Figure 2
figure 2

Example of the comparison method adopted in this article. Solar-cycle averages (minimum-to-minimum) of the various sunspot sequences are shown. To facilitate comparison, each has been normalised to its value for Solar Cycle 19 (i.e. \(\langle R\rangle _{\text{Cn}}/\langle R\rangle _{19}\) is shown, where \(\langle R\rangle _{\text{Cn}}\) is the mean of cycle number \(C_{\mathrm{n}}\)), which has the advantage that scale factors (such as the 0.6 for \(R_{\text{ISNv2}}\) and \(\alpha_{\mathrm{BB}}\) for \(R_{\mathrm{BB}}\)) cancel out. Background white and grey bands denote even- and odd-numbered sunspot cycles (minimum to minimum), respectively, which are numbered near the top of the plot.

In this article, we apply the same analysis as in Figure 2 to indices derived from terrestrial measurements that have been designed, or found, to vary monotonically, and as closely as possible to linearly, with sunspot numbers. This enables us to compare like-with-like when we assess the long-term variations. We used the IDV (Svalgaard and Cliver 2005) and IDV(1d) (Lockwood et al. 2013a, 2013b; 2014a, 2014b) geomagnetic indices. One application of these geomagnetic indices exploited here is an empirical, statistical property (one that varies with the phase of the solar cycle, therefore allowance must be made for that factor) (Lockwood, Owens, and Barnard 2014b). More satisfactory are comparisons that employ the open solar flux (OSF) reconstruction of Lockwood et al. (2014b) (derived from the combination of four different pairings of geomagnetic indices) using two different theoretical formulations of the physical OSF continuity equation to relate OSF to sunspot numbers. A recent graphic demonstration of why these reconstructions of sunspot numbers from geomagnetic activity are valid and valuable has been presented by Owens et al. (2016). These authors showed that both the statistical and theoretical relationships between the geomagnetic-activity indices and sunspot numbers mean that the sunspot numbers and the geomagnetic-activity indices, including both IDV and IDV(1d), give reconstructions of the near-Earth interplanetary magnetic field that are almost identical. Lastly, we look at the annual occurrence of low-latitude aurorae [\(N_{\mathrm{A}}\)] compiled by Legrand and Simon (1987). In this case we have no quantitative theoretical relationship to exploit, although we do have a good qualitative understanding (Lockwood and Barnard 2015), and simply compare the variations in the normalised averages of \(N_{\mathrm{A}}\) and sunspot numbers.

2.1 Tests Using the IDV(1d) and IDV Geomagnetic Indices

The IDV and IDV(1d) indices are both based on Bartels’ \(u\)-index (Bartels 1932), which employs the difference between successive daily values of the horizontal or vertical component of the geomagnetic field (whichever yields the higher value). There are differences in the construction of these two indices. IDV employs the hourly means (or spot values) that are closest to local midnight for the station in question and uses as many stations as are available (the number of which therefore declines as one goes back in time) (Svalgaard and Cliver 2005). The IDV(1d) index uses the \(u\)-values as defined by Bartels (i.e. the differences in daily means) from just one station at any one time. Only three specified and intercalibrated stations are used with allowance for the effect of the secular drift in their geomagnetic latitude on the \(u\)-values (Lockwood 2013; Lockwood, Owens, and Barnard 2014a, 2014b). The stations were selected to make the IDV(1d) composite as long and as homogeneous as possible, but with the minimum number of intercalibrations, and each gave the smallest root-mean-square deviation from the data from all other available sub-auroral stations. To cover the full time interval, three different stations are required, but the calibration of these is not done by daisy-chained regressions. Instead the values are all normalised to the Eskdalemuir station in the year 2000 using the results of a survey of the dependence of \(u\) on geomagnetic latitude along with paleomagnetic and empirical model predictions of the variation of each station’s geomagnetic latitude (Lockwood 2013). Eskdalemuir was chosen because it provided the most stable long-term data (giving the lowest fit residuals with the data from the other 49 available sub-auroral stations) and the year 2000 as a convenient and memorable date in modern times. Regressions are used to then check the intercalibrations, but were not used to derive them. Because it is homogeneous in its construction and does not depend on daisy-chained calibrations, we here show results for IDV(1d), but results were very similar indeed if IDV was used.

Lockwood, Owens, and Barnard (2014b) analysed the known correlations between the IDV and IDV(1d) geomagnetic indices and the square root of the sunspot number [\(R\)]. This arises from the approximate correlation between the near-Earth IMF, \(B\), and \(R^{1/2}\), when averaged over the solar cycle that was noted by Wang, Lean, and Sheeley (2005). Because the IDV and IDV(1d) indices depend primarily on \(B\) (Svalgaard and Cliver 2005; Lockwood 2013; Lockwood, Owens, and Barnard 2014b), correlations over a solar cycle with \(R^{n}\) (with \(n \approx 0.5\)) are also expected. This correlation is found in annual-mean data, but Lockwood, Owens, and Barnard (2014b) have shown that this relationship is more complex than it first appears because it depends on the phase of the solar cycle. A different manifestation of the same property was found by Owens et al. (2016), who showed that the best fit over the solar cycle is different from that on centennial timescales. Lockwood, Owens, and Barnard (2014b) showed that the scatter in the relationship between \(\mathit{IDV}(1\mathrm{d})^{1/n}\) and sunspot number (they used \(R_{\mathrm{C}}\)) is much larger than that in a plot of \(\mathit{IDV}(1\mathrm{d})^{1/n}\) against \(R_{\mathrm{C}} / F(\Phi)\), where \(F(\Phi)\) is the function derived numerically (see their Figure 2) and \(\Phi\) is the phase of the solar cycle (defined linearly from \(\Phi = 0\) and \(\Phi = 2\pi\) at successive minima in five-point running means of monthly sunspot numbers). From the linear regression coefficients [slope \(s\) and intercept \(c\)], an estimate of the sunspot number from IDV(1d), \(R_{\text{RIDV(1d)}}\), can be made using

$$ R_{\mathit{IDV}(1\mathrm{d})} = F(\Phi) [s\, \mathit{IDV}(\text{1d})^{1/n} + c]. $$
(1)

\(R_{\mathit{IDV}(1\mathrm{d})}\) has been designed to vary linearly with sunspot number, and so its long-term variation can be compared to that in sunspot number. Lockwood, Owens, and Barnard (2014b) used all of the data in the IDV(1d) data series (since 1845) to derive the required coefficients \(s\), \(c\), and \(n\) and the function \(F(\Phi)\). This is repeated in the present article, but using only a 30-year “training” interval of 1982 – 2012. The coefficients obtained are then applied to the whole IDV(1d) data sequence to derive \(R_{\mathit{IDV}(1\mathrm{d})}\). The “training” is the evaluation of \(s\), \(c\), and \(n\) and \(F(\Phi)\), and this is here done separately using the sunspot numbers \(R_{\mathrm{C}}\), \(R_{\text{ISNv2}}\), \(R_{\mathrm{G}}\), and \(R_{\mathrm{BB}}\) (note that \(R_{\mathrm{C}}\) and \(R_{\text{ISNv}1}\) are, by the definition of \(R_{\mathrm{C}}\), identical over the training interval used). Figure 3 corresponds to Figure 3 of Lockwood, Owens, and Barnard (2014b) but is for 1982 – 2012 only: panel (a) shows \(\mathit{IDV}(1\mathrm{d})^{1/n}\) against \(\alpha_{\mathrm{BB}}R_{\mathrm{BB}}/F(\Phi)\) for the training interval, with data points coloured by the phase of the solar cycle. This fit yields \(n = 0.69\), \(s = 0.9925\), and \(c = -3.9732\) for the same function \(F(\Phi)\) as shown in Figure 2 of Lockwood, Owens, and Barnard (2014b). (These values are very close to the values of \(n = 0.69\), \(s = 1.000\), and \(c = -3.966\) obtained by Lockwood, Owens, and Barnard (2014b) for \(1845 - 2012\) and using \(R_{\mathrm{C}}\)). We note that the method is here demonstrated using the new index \(R_{\mathrm{BB}}\), but almost identical plots are obtained using \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), and \(R_{\text{ISNv2}}\). (Note that \(R_{\text{ISNv}1}\) is identical to \(R_{\mathrm{C}}\) over the training interval). Figure 3b shows the values of \(R_{\mathit{IDV}(1\mathrm{d})}\) derived from IDV(1d) using Equation (1) with the above coefficients derived from Figure 3a. The black line shows the best-fit linear regression that minimises the mean square of the perpendicular deviations of the points from the line [\(\langle d_{\bot}^{2} \rangle\)]. Fits were made for the range of slopes \(s\) between 0.50 and 2.00 in steps of 0.01 and the range of intercepts \(c\) between −20 and \(+20\) in steps of 0.1. For each fit the, \(\langle d_{\bot}^{2} \rangle\) was evaluated and a probability \(p\)-value computed using Student’s \(t\)-test. The grey area shows the range of fits for which \(\langle d_{\bot}^{2} \rangle\) is larger than this minimum value by an amount smaller than the one-\(\sigma\) level. For each fit (of known \(p\)-value), a full sequence of \(R_{\mathit{IDV}(1\mathrm{d})}\) over the full interval of the IDV(1d) index (1845 – 2013) was generated.

Figure 3
figure 3

The derivation of the estimate of sunspot numbers from the IDV(1d) index, using the algorithm trained using \(R_{\mathrm{BB}}\) [\(R_{\mathit{IDV}(1\mathrm{d}),\text{BB}}\)]. (a) The scatter plot of annual \(\alpha_{\mathrm{BB}}R_{\mathrm{BB}}/F(\Phi)\) as a function of \(\mathit{IDV}(1\mathrm{d})^{1/n}\) for the training interval (1982 – 2012) using the best-fit \(n\) of 0.69 and \(F(\Phi)\), the function of solar-cycle phase \(\Phi\), used by Lockwood, Owens, and Barnard (2014b). Points are coloured by the phase of the solar cycle according to the colour scale at the top of the figure. From the slope \(s = 0.9925\) and intercept \(c = -3.9732\) of this plot, Equation (1) is used to compute \(R_{\mathit{IDV}(1\mathrm{d}),\text{BB}}\). (b) The normalised backbone sunspot number [\(\alpha_{\mathrm{BB}}R_{\mathrm{BB}}\)] as a function of \(R_{\mathit{IDV}(1\mathrm{d}),\text{BB}}\) for the same interval, with points again coloured by the phase of the solar cycle according to the colour scale at the top of the figure. The correlation coefficient between \(R_{\mathrm{BB}}\) and \(R_{\mathit{IDV}(1\mathrm{d}),\text{BB}}\) is \(r = 0.979\). The black line shows the best-fit linear regression that minimises the mean square of the perpendicular deviations of the points from the line [\(\langle d_{\bot}^{2} \rangle\)]. The grey area shows the range of fits for which \(\langle d_{\bot}^{2} \rangle \) is larger than this minimum value by an amount smaller than the one-\(\sigma\) level, as determined using the Student’s \(t\)-test.

The above procedure for training the algorithm to generate \(R_{\mathit{IDV}(1\mathrm{d})}\) using \(R_{\mathrm{BB}}\) over the interval 1982 – 2012 was repeated using \(R_{\text{ISNv2}}\), \(R_{\mathrm{G}}\), and \(R_{\mathrm{BB}}\) (not \(R_{\text{ISNv}1}\) because \(R_{\mathrm{C}}\) and \(R_{\text{ISNv}1}\) are the same over the training interval used). Figure 4 shows that the results are almost independent of the sunspot-number series used to train the algorithm and hence the variation depends almost exclusively on the IDV(1d) data and not on the training procedure. The difference between the various lines in Figure 4 is usually smaller than the plot line width, and the mauve line is the most visible because it was plotted last (and so is on top of the others). The correlation coefficients \([r]\) (and their significance levels [\(S\)] evaluated against the autoregressive AR1 red-noise model) are given in the first column of Table 1. The correlation using \(R_{\mathrm{C}}\) is very slightly lower than the others, and Figure 4 shows that \(R_{\mathit{IDV}(1\mathrm{d}),\text{C}}\) (the notation used means that this is sunspot number derived from IDV(1d) using \(R_{\mathrm{C}}\) during the training interval) tends to very slightly overestimate the values at each sunspot minimum. This is almost certainly due to the fact that \(R_{\mathrm{C}}\) has not been corrected for the effects of the recently revealed calibration drift in the Locarno sunspot data (Clette et al. 2016): After 1981, the international sunspot number was compiled using data from the Specola Solare Ticinese Observatory in Locarno, which took over from Zürich in 1981 as the main reference station. The calibration drift is between \(-15~\%\) and \(+15\%\) in modern-day \(R_{\text{ISNv}1}\)-values and has been corrected for in \(R_{\mathrm{BB}}\) and \(R_{\text{ISNv2}}\) but not in \(R_{\mathrm{C}}\) and is not relevant to \(R_{\mathrm{G}}\) (which is based on different data and only extends to 1979). The best evidence for this drift is the large number of observing stations that show the same variation relative to the Locarno station, but we note that Article 1 (Lockwood et al. 2016a) provides independent support for this correction as the ionosonde data agree better with \(R_{\mathrm{BB}}\) than \(R_{\mathrm{C}}\) over the training interval. It can be seen from Table 1 that the correlations are all high and comparable, and we assign the four sets of fits equal weight. The same procedure as was used by Lockwood, Owens, and Barnard (2014b) was then employed to combine these four sets of results into an optimum \(R_{\mathit{IDV}(1\mathrm{d})}\) reconstruction with two-\(\sigma\) uncertainties. Specifically, the \(p\)-value distributions were generated from each of the four estimates of \(R\) in any one year. From the correlation of the proxy used with \(R\), we can evaluate the \(p\)-value distribution of the \(R\)-values derived from an annual mean of that proxy. This was repeated for all four sunspot number estimates (\(R_{\mathrm{BB}}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), and \(R_{\text{ISNv2}}\)) and, making the simplest assumption that the resulting four \(p\)-value distributions are independent, this allows them to be combined into a single distribution by multiplying them together. The optimum value is the peak of this combined distribution and the error limits are taken to be the \(\pm2\sigma\) points. The \(\pm2\sigma\) uncertainty values determined this way delineate the grey area shown in Figure 4.

Figure 4
figure 4

Sunspot series derived from the IDV(1d) geomagnetic index using Equation (1) with the coefficients \(n\), \(s\), and \(c\) derived from the calibration interval using: \(R_{\mathrm{BB}}\) (gives \(R_{\mathit{IDV}(1\mathrm{d}),\text{BB}}\) shown by the mauve line); \(R_{\mathrm{G}}\) (gives \(R_{\mathit{IDV}(1\mathrm{d}),\text{G}}\) shown by the green line); \(R_{\mathrm{C}}\) (gives \(R_{\mathit{IDV}(1\mathrm{d}),\text{C}}\) shown by the blue line) and \(R_{\text{ISNv2}}\) (gives \(R_{\mathit{IDV}(1\mathrm{d}),\text{ISNv2}}\) shown by the black line). The grey band shows the \({\pm}\,2\sigma\) band for the combination of these four records and allows for the uncertainties in the fitted \(n\), \(s\), and \(c\) in each case. Note that for most years the series agree to within less than a line width, and so only the line plotted last (the mauve one) can be seen.

Table 1 Correlation coefficients [\(r\)] between sunspot numbers inferred from geomagnetic activity and the various sunspot-number sequences [\(R_{\text{ISNv}1}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), \(R_{\mathrm{BB}}\), and \(R_{\text{ISNv2}}\)] over the training period of 1982 – 2012. (a) \(R_{\mathit{IDV}(1\mathrm{d})}\), (b) \(R_{\text{OSF}1}\), and (c) \(R_{\text{OSF}2}\) are generated (a) from the IDV(1d) index, (b) using the Solanki, Schüssler, and Fligge (2000) OSF model with the geomagnetic OSF reconstruction of Lockwood, Owens, and Barnard (2014b), and (c) by the Owens and Lockwood (2012) OSF model using the same OSF reconstruction. Training of the algorithms employs the same sunspot-number sequence with which the \(R_{\mathit{IDV}(1\mathrm{d})}\), \(R_{\text{OSF}1}\), and \(R_{\text{OSF}2}\) sequences are then correlated. The significance level of each correlation evaluated against the AR1 red noise model [\(S\)], is given in brackets in each case. Note that \(R_{\mathrm{G}}\) is only available up to 1995 and the training period therefore is 1982 – 1995 (resulting in lower \(S\)-values) and that \(R_{\text{ISNv}1}\) and \(R_{\mathrm{C}}\) are identical over the training interval.

The black line in Figure 5a shows the cycle means of the optimum values of \(R_{\mathit{IDV}(1\mathrm{d})}\), normalised to the value for Cycle 19 as in Figure 2, as a function of the cycle number. The grey area is bounded by the same variations for the maximum \(R_{\mathit{IDV}(1\mathrm{d})}\) (at the \(+1\sigma\) level) and the minimum \(R_{\mathit{IDV}(1\mathrm{d})}\) (at the \(-1\sigma\) level). Also shown in Figure 5(a) are the corresponding variations for \(R_{\text{ISNv}1}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), and \(R_{\mathrm{BB}}\), as in Figure 2. A comparison of these variations is described and discussed below.

Figure 5
figure 5

Solar-cycle means (minimum-to-minimum) of various sunspot-number estimates [\(R\)] as a function of the cycle number, [\(C_{\mathrm{n}}\)] normalised to the value for Solar Cycle 19 [\(\langle R\rangle _{\text{Cn}}/ \langle R\rangle _{19}\)]. In each panel the orange, blue, green, and mauve lines are for \(R\) of \(R_{\text{ISNv}1}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), and \(R_{\mathrm{BB}}\), respectively. The black lines are the optimum (highest \(p\)-value) values of (a) \(R_{\mathit{IDV}(1\mathrm{d})}\), (b) \(R_{\text{OSF}1}\), (c) \(R_{\text{OSF}2}\) from a combination of four fits made using \(R_{\mathrm{C}}\), \(R_{\text{ISNv2}}\), \(R_{\mathrm{G}}\), and \(R_{\mathrm{BB}}\) over the training interval 1982 – 2012, and (d) \(N_{\mathrm{A}}\), the annual number of low-latitude aurorae in the catalogue of Legrand and Simon (1987). The grey areas indicate the variations for the optimum values \(\pm1\sigma\) uncertainties. (See text for details).

We need to present a very important caveat about this test. It is based on a purely empirical relationship between IDV(1d), sunspot number, and the phase of the solar cycle. The relationship appears to work well for the interval for which we have IDV(1d) data (1845 – present) and over which we here apply the test. However, because it is a purely empirical relationship, this does not mean that it would necessarily work well for other intervals (and the Maunder minimum in particular). The same is equally true for any application of the purely empirical relationships between the IMF \(B\) and \(R^{{n}}\) and the IDV index and \(R^{{n}}\). We note that the test presented here was also carried out using the IDV index (not shown), and the results were the same on all important points.

2.2 Test Using OSF Derived from Geomagnetic Indices and a Continuity Model

The open solar flux (OSF) [\(F_{\mathrm{S}}\)] is the total “signed” flux (i.e. we here define it as of one toward/away magnetic polarity) threading a nominal source surface in the solar corona. It was first reconstructed from historic geomagnetic activity data by Lockwood, Stamper, and Wild (1999). OSF is a much more satisfactory parameter on which to base a comparison with sunspot numbers because it is, like the sunspot number, a global property of the Sun rather than a local parameter specific to near-Earth space (such as the near-Earth IMF or near-Earth solar-wind speed or geomagnetic indices, including IDV(1d), which depend on these near-Earth interplanetary conditions). OSF has been reconstructed for 1845 – 2014 by Lockwood, Owens, and Barnard (2014b) using four pairings of geomagnetic indices: aaC and IDV, \(aa_{\mathrm{C}}\) and IDV(1d), IHV and IDV, and IHV and IDV(1d), where \(aa_{\mathrm{C}}\) is a version of the aa-index that has been corrected using the \(A\)p-index (and extended back to 1845 using comparable range data from the Helsinki observatory) (Lockwood, Owens, and Barnard 2014b) and IHV is the Inter-Hour Variability index introduced by Svalgaard, Cliver, and Le Sager (2004) and developed by Svalgaard and Cliver (2007). The IDV and IDV(1d) indices were discussed in the last section. We note that recently Holappa and Mursula (2015) have suggested that errors in the geomagnetic data make the Lockwood, Owens, and Barnard (2014b) reconstructions greatly in error. However, Lockwood, Owens, and Barnard (2016) point out that Holappa and Mursula introduced errors by calibration against unreliable data, which they then exacerbated by using a less sophisticated and robust reconstruction procedure than that used by Lockwood et al. To relate OSF to sunspot numbers, in this section we use the model of Solanki, Schüssler, and Fligge (2000), based on the continuity equation for OSF:

$$ \mathrm{d}F_{\mathrm{S}}/\mathrm{dt} = S - L, $$
(2)

where \(S\) is the global OSF source rate and \(L\) is its global loss rate. In the first application of this model by Solanki, Schüssler, and Fligge (2000), the loss rate was assumed to be linear so that \(L = F_{\mathrm{S}} /\tau\), where \(\tau\) is the loss time constant. From Equation (2),

$$ S_{\mathrm{S}} = \mathrm{d}F_{\mathrm{S}}/\mathrm{dt} + F_{\mathrm{S}} /\tau, $$
(3)

where \(S_{\mathrm{S}}\) is the OSF source term derived using the Solanki, Schüssler, and Fligge (2000) formulation of \(L\). \(S_{\mathrm{S}}\) can be estimated using Equation (3) for 1845 – 2014 using an OSF reconstruction from geomagnetic activity. We here use the most accurate and robust reconstruction, which is by Lockwood et al. (2014b). In the Solanki et al. formulation, the OSF production rate [\(S_{\mathrm{S}}\)] is related to sunspot number [\(R\)] by

$$ S_{\mathrm{S}}/{c} = (1+A_{\mathrm{f}} /A_{\mathrm{s}})R = 22R + 24.35 - 0.061R^{2}, $$
(4)

where \(A_{\mathrm{f}}\) and \(A_{\mathrm{s}}\) are the areas of faculae and sunspots on the solar surface, the ratio of which is given by a polynomial in \(R\) (their Equation 3), which is incorporated into Equation (4) above. The constants \(\tau\) and \(c\) are here evaluated using each of the sunspot number series for the training period (1982 – 2012) giving the time series \(S_{\mathrm{S}}/{c}\) from the OSF geomagnetic reconstruction. Solving the quadratic Equation (4) for each year gives a sunspot-number estimate based on the OSF reconstruction and the Solanki, Schüssler, and Fligge (2000) model [\(R_{\text{OSF}1}\)]. Figure 6a shows a scatter plot of \(R_{\text{OSF1},\text{BB}}\) against \(\alpha_{\mathrm{BB}}R_{\mathrm{BB}}\) in the same format as Figure 3. A \(p\)-value for each combination of \(\tau\) and \(c\) is computed from the mean-square deviation, and the results from the four different training sunspot series [\(R_{\mathrm{C}}\), \(R_{\text{ISNv2}}\), \(R_{\mathrm{G}}\), and \(R_{\mathrm{BB}}\)] are then combined in the same way as for \(R_{\mathit{IDV}(1\mathrm{d})}\) in the previous section.

Figure 6
figure 6

Scatter plots of \(\alpha_{\mathrm{BB}}R_{\mathrm{BB}}\) as a function of (a) \(R_{\text{OSF}1,\text{BB}}\) and (b) \(R_{\text{OSF}2,\text{BB}}\), the estimates of \(R\) from the OSF reconstruction of Lockwood et al. (2014b) using OSF continuity and the OSF loss-rate formulations of Solanki, Schüssler, and Fligge (2000), and Owens and Lockwood (2012), respectively. The points are for the training interval and are coloured by the phase of the solar cycle according to the colour scale at the top of the figure. The black line shows the best-fit linear regression that minimises the mean square of the perpendicular deviations of the points from the line [\(\langle d_{\bot}^{2} \rangle\)]. The grey area shows the range of fits for which \(\langle d_{\bot}^{2} \rangle\) is larger that this minimum values by an amount smaller than the one-\(\sigma\) level, determined by the Student’s \(t\)-test.

The black line in Figure 5b gives the optimum value of normalised cycle averages of \(R_{\text{OSF}1}\) and the grey area around it the \(\pm1\sigma\) uncertainty band, in the same format and derived in the same way as for \(R_{\mathit{IDV}(1\mathrm{d})}\) in the previous section.

2.3 Test Using OSF Derived from Geomagnetic Indices and a Second Continuity Model

The correlation between \(R_{\mathrm{BB}}\) and \(R_{\text{OSF1},\text{BB}}\) [\(R_{\text{OSF}1}\) derived from the training using \(R_{\mathrm{BB}}\)] shown in Figure 6a is 0.907. The method employed means that a regression fit in Figure 6a is never used; however, it is not ideal that the scatter is not homoscedastic and larger at higher values. There is also a suggestion of some non-linearity in the dependence. Hence in this section we investigate a second version of the OSF continuity model by Owens and Lockwood (2012). This model is also based on the continuity Equation (2), but uses different formulations of the production and loss terms. A key element of this model is that the fractional loss rate is a function of the warping of the heliospheric current sheet and hence of the phase of the solar cycle, as predicted theoretically by Owens, Crooker, and Lockwood (2011),

$$ S_{\text{OL}} = \mathrm{d}F_{\mathrm{S}}/\mathrm{d}t + F_{\mathrm{S}} k_{1} I_{\text{HCS}} = \mathrm{d}F_{\mathrm{S}}/\mathrm{d}t + F_{\mathrm{S}} k_{2} f(\Phi), $$
(5)

where \(I_{\text{HCS}}\) is the current-sheet tilt index, the fraction of longitudinally adjacent pixels of opposite-field polarity on the coronal source surface (defined from magnetograph observations mapped up to the coronal source surface using the potential-field source surface method). \(I_{\text{HCS}}\) has regular variation with solar-cycle phase [\(\Phi\)] and \(f(\Phi)\) is the best-fit function to \(I_{\text{HCS}}\): \(k_{1}\) and \(k_{2}\) are constants. Owens and Lockwood (2012) showed that this loss rate gave good fits to reconstructed OSF for a simple linear relationship between \(S_{\text{OL}}\) and sunspot number, but optimum fits were obtained by Lockwood and Owens (2014) using the more complex form given by their Equation (8). This was originally based on the idea that much OSF emerges through the source surface as a result of CME eruptions and \(F\) was the estimated from the average OSF enhancement associated with each event. However, Wang, Lean, and Sheeley (2015) have recently pointed out that the rapid rise in OSF in the second half of 2014 does not appear to have been accompanied by a corresponding rise in CME occurrence (although the possibility of a smaller number of CMEs each causing unusually large emergence cannot be discounted). The requirement here is to equate the OSF emergence rate to sunspot numbers, and the 2014 rise in OSF did indeed follow a rise in sunspot numbers. A readily invertible equation for \(S_{\text{OL}}\) that gives higher correlations with observed annual OSF data (including 2014) is

$$ S_{\text{OL}} = F[0.234(R+2.67)^{0.540} - 0.00153], $$
(6)

where \(F = 2.1\times10^{1 4}\ \mbox{Wb}\). Inverting Equation (6) yields a sunspot-number estimate that we here call \(R_{\text{OSF}2}\). This is then processed in exactly the same way as was \(R_{\text{OSF}1}\) in the previous section. Figure 6b shows the scatter plot of \(R_{\text{OSF}2}\), derived using \(R_{\mathrm{BB}}\) for the training interval, as a function of \(R_{\mathrm{BB}}\), and Figure 5c shows the variation of cycle averages derived using all four independent sunspot data series (\(R_{\mathrm{C}}\), \(R_{\text{ISNv2}}\), \(R_{\mathrm{G}}\), and \(R_{\mathrm{BB}}\)) in the same way as was done for \(R_{\mathit{IDV}(1\mathrm{d})}\) and \(R_{\text{OSF}1}\). We note that the difference between the results using the different training sunspot number series is always smaller than the one-\(\sigma\) uncertainties that are set by the procedure used to extrapolate to times before the training interval. This is true for both \(R_{\text{OSF}1}\) and \(R_{\text{OSF}2}\) as well as for \(R_{\mathit{IDV}(1\mathrm{d})}\) (see Figure 4).

2.4 Tests using Occurrence Frequency of Low-Latitude Aurora

The last comparison made here is much simpler. The long-term variation of the occurrence frequency of low-latitude aurora has been studied by many authors using many sources (see reviews by Silverman 1992; Lockwood and Barnard 2015; and Vázquez et al. 2016). The number of auroral nights at low geomagnetic latitudes [\(N_{\mathrm{A}}\)] has long been known to vary with sunspot number, but the correlation in annual means is not high. This is largely because low-latitude aurorae can be generated after the Earth intersects both coronal mass ejections (CMEs) and co-rotating interaction regions (CIRs), and so \(N_{\mathrm{A}}\) peaks at sunspot maximum because of the effects of CMEs, but can be almost as high during the declining phase because of CIRs, despite the lower sunspot numbers. This solar-cycle variation in the relationship between \(N_{\mathrm{A}}\) and sunspot numbers lowers the correlation in annual means but is averaged out when solar-cycle means are taken. (Table 2, discussed below, shows that correlations for annual means are in the range 0.64 – 0.68, whereas for solar-cycle means they are 0.88 – 0.96). In addition, we have no quantitative theory to elucidate the connection between \(N_{\mathrm{A}}\) and sunspot number despite our good qualitative understanding of the link (see Lockwood and Barnard 2015). We here make use of the variation of \(N_{\mathrm{A}}\) from the auroral catalogue by Legrand and Simon (1987). Figure 4d shows the variation of solar-cycle means of \(N_{\mathrm{A}}\) (normalised to the value for Cycle 19) and compares them to corresponding variations for various sunspot-number sequences.

Table 2 Correlation coefficients [\(r\)] between low-latitude auroral activity, quantified by the number of auroral nights per year at geomagnetic latitudes below 55° [\(N_{\mathrm{A}}\)] and the various sunspot-number sequences [\(R_{\text{ISNv}1}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), \(R_{\mathrm{BB}}\), and \(R_{\text{ISNv2}}\)] over the whole interval of the auroral data (1770 – 1980). The significance level of each correlation evaluated against the AR1 noise model [\(S\)] is given in brackets. The columns are for different data subsets determined by the phase of the solar cycle [\(\Phi\)]: all the data [\(0 \leq \Phi < 2\pi\)], the half of the cycle around solar maximum [\(\pi/2 \leq \Phi < 3\pi/2\)], and the half of the cycle around solar minimum [\(\Phi < \pi/2\) or \(\Phi \geq 3\pi/2\)]. (a) is for annual means, (b) is for solar-cycle averages. Note that there are only 18 data points for the cycle means (panel b), which is too few to compute meaningful significance levels.

Figure 7 shows a scatter plot of annual values of \(N_{\mathrm{A}}\) against \(R_{\mathrm{C}}\) and reveals that although there is a general trend for \(N_{\mathrm{A}}\) to increase with \(R_{\mathrm{C}}\), there is considerable scatter. The first column of panel a of Table 2 gives the correlation coefficients (and their statistical significances) of the various sunspot-number series, evaluated over the entire interval of the \(N_{\mathrm{A}}\) record (1780 – 1980). For annual means (Table 2a) they are in the range 0.64 – 0.68, and the large scatter means that the significances are generally quite low. Certainly no significance can be attached to the differences between the various correlation coefficients. Figure 7 separates the two halves of the sunspot cycle by plotting points in red where the phase of the solar cycle [\(\Phi\)] of the mid-point of the year is in the range \(\pi/2 \leq \Phi < 3\pi/2\) (the half of the cycle containing the sunspot maximum, as \(\Phi = 0\) and \(\Phi = 2\pi\) are defined at successive minima in five-point running means of monthly sunspot numbers). The blue dots are all data points not meeting this criterion and so are in the half of the solar cycle that is centred on the sunspot minimum. It can be seen that the relationship between \(N_{\mathrm{A}}\) and \(R_{\mathrm{C}}\) depends on the phase of the solar cycle. The second and third columns of Table 2 show that dividing the data into these two phase bins does not, in general, significantly alter the correlation coefficients. We note that although the scatter in both the red and blue dots in Figure 7 is still large, there is no suggestion of a non-monotonic relationship, and it is therefore reasonable to compare the long-term variations of \(N_{\mathrm{A}}\) and sunspot numbers.

Figure 7
figure 7

Scatter plot of the corrected sunspot number [\(R_{\mathrm{C}}\)] as a function of the number of low-latitude auroral nights per year [\(N_{\mathrm{A}}\)]. Points are colour-coded according to the phase of the solar cycle [\(\Phi\)], with red dots for the halves of the cycle around solar maximum (\(\pi/2 \leq \Phi < 3\pi/2\)) and the blue dots for the halves of the cycle around solar minimum (\(\Phi < \pi/2\) or \(\Phi \geq 3\pi/2\)).

Figure 8 repeats the comparison of Figure 5d for these two halves of the solar cycle separately. We note that the cycle means of all the sunspot numbers have also been averaged over the half of the solar cycle around solar maximum and minimum in Figures 8a and 8b, respectively. Table 2b gives the associated correlation coefficients or the solar-cycle means: there are just 18 pairs of data points and the autocorrelation function at lag 1 is high (high data persistence) for both data series, therefore significance levels against the AR1 red-noise model cannot be computed.

Figure 8
figure 8

Solar cycle means (minimum-to-minimum) of various sunspot number estimates [\(R\)] as a function of the cycle number [\(C_{\mathrm{n}}\)] normalised to the value for Solar Cycle 19 \([\langle R\rangle _{\text{Cn}}/ \langle R\rangle _{19}]\). In each panel the orange, blue, green, and mauve lines are for \(R\) of \(R_{\text{ISNv}1}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), and \(R_{\mathrm{BB}}\), respectively This figure is the same as Figure 5d, comparing the normalised cycle means of the various sunspot-number sequences with those of the number of low-latitude auroral nights [\(N_{\mathrm{A}}\)] but (a) is for the halves of the cycle around solar maximum [\(\pi/2 \leq \Phi < 3\pi/2\)] and (b) for the halves of the cycle around solar minimum [\(\Phi < \pi/2\) or \(\Phi \geq 3\pi/2\)].

3 Discussion

Figure 5 shows that \(R_{\mathrm{BB}}\) becomes increasingly larger than the other sunspot-number estimates as one goes back in time. None of the series derived here from geomagnetic or auroral activity [\(R_{\mathit{IDV}(1\mathrm{d})}\), \(R_{\text{OSF}1}\), \(R_{\text{OSF}2}\), and \(N_{\mathrm{A}}\)] reproduce this behaviour. In each case, extrapolating back in time from the algorithm training period (1982 – 2012) gives a time-series that lies closest to the variations for \(R_{\mathrm{C}}\) and \(R_{\text{ISNv}1}\). In each case, \(R_{\mathrm{BB}}\) lies above the extrapolation in almost all years by an amount that exceeds the two-\(\sigma\) uncertainty (the grey bands). This trend is seen for all series back to the start of the geomagnetic-activity data in 1845 and is consistent with the findings for Cycle 17 in Article 1 (Lockwood et al. 2016a).

The scatter plots for the training interval indicate that the best proxy sunspot number, in terms of the correlation coefficient, is \(R_{\mathit{IDV}(1\mathrm{d})}\). However, this is a purely empirical relationship. It is useful to compare with the results from the proxies \(R_{\text{OSF}1}\) and \(R_{\text{OSF}2}\), which are based on the physical continuity equation for OSF. \(R_{\text{OSF}1}\) and \(R_{\text{OSF}2}\), like \(R_{\mathit{IDV}(1\mathrm{d})}\), both depend on empirical fit parameters, but the use of the continuity equations means that the fits are more constrained than is the case for \(R_{\mathit{IDV}(1\mathrm{d})}\). In addition, OSF is more satisfactory because it is a global solar parameter, like the sunspot number, whereas IDV(1d), and hence \(R_{\mathit{IDV}(1\mathrm{d})}\), are local parameters related to the near-Earth heliosphere.

In addition, whereas using \(R_{\mathit{IDV}(1\mathrm{d})}\) means that we have to assume that the IDV(1d) geomagnetic index depends only on the simultaneous sunspot number, \(R_{\text{OSF}1}\) and \(R_{\text{OSF}2}\) both allow for the effect of persistence in the data series (see Lockwood et al. 2011; Lockwood 2013), whereby the current value also depends upon recent history, to a degree that is defined by the best-fit parameters. For the training period the correlation of all sunspot numbers with \(R_{\text{OSF}1}\) is consistently slightly lower than with \(R_{\text{OSF}2}\) (Table 1) and \(R_{\text{OSF}2}\) reveals lower scatter and heteroscedasticity (shown in Figure 6b for the comparison of \(R_{\text{OSF}2,\text{BB}}\) with \(R_{\mathrm{BB}}\), but this is also true for all other series tested). Hence \(R_{\text{OSF}2}\) provides the most satisfactory test, which is shown in Figure 5c. We note that the training procedures for \(R_{\text{OSF}2}\), \(R_{\text{OSF}1}\), and \(R_{\mathit{IDV}(1\mathrm{d})}\) all employed four sunspot number series [\(R_{\mathrm{BB}}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), and \(R_{\text{ISNv2}}\)] that give almost identical variations. All are here given equal statistical weight.

The auroral data show that the same tendency extends back to 1780, which means that it covers the Dalton minimum (around Solar Cycle 6) and before. Dividing these data by solar cycle phase reveals an interesting feature of the data (Figure 8): for both the solar-maximum and solar-minimum data the long-term variation in \(N_{\mathrm{A}}\) is closer to those in \(R_{\mathrm{C}}\) and \(R_{\text{ISNv}1}\), while \(R_{\mathrm{BB}}\) is consistently larger. It is noticeable that the variations for sunspot-minimum and sunspot maximum have similar forms for \(R_{\mathrm{C}}\), \(R_{\text{ISNv}1}\), \(R_{\mathrm{G}}\), and \(N_{\mathrm{A}}\). However, \(R_{\mathrm{BB}}\) is different. For cycles before the Dalton minimum (Solar Cycles 5 and before) the sunspot minimum values exceed those seen in modern times (the normalised cycle-average values frequently exceed unity, whereas the same cycles are giving values near unity for solar maximum). Thus the drift to higher values in \(R_{\mathrm{BB}}\) is greater in the solar-minimum values than it is in the solar-maximum values. This implies that the cause of the drift in \(R_{\mathrm{BB}}\) is more than the effect of the calibration observer \(k\)-factors as they would influence the values around solar minimum and around solar maximum to the same fractional extent.

We note that, self-evidently, if we normalised using another cycle, then all values would be the same for that cycle and values for Cycle 19 would be different. But we recall that the point of Figure 5 is to evaluate for each tested and test data series how earlier solar cycles compare in amplitude with Cycle 19, i.e. to study the ratio \(\langle R\rangle_{\text{Cn}} /\langle R\rangle_{19}\), as is shown.

The consistency with which the geomagnetic and auroral data give lower values (normalised to modern values) than \(R_{\mathrm{BB}}\) and the way that the difference grows as one goes back in time strongly suggests that there may be calibration drift in the values of \(R_{\mathrm{BB}}\). In particular, this calls for a check on the compilation of \(R_{\mathrm{BB}}\). This could be done by repeating it with different regression procedures because the necessary daisy-chaining of calibrations means that both systematic and random errors will be amplified as one goes back in time. Article 3 is this series (Lockwood et al. 2016b) shows that the inflation of \(R_{\mathrm{BB}}\) as one goes back in time is consistent with the effect of regressions and the assumptions made by Svalgaard and Schatten (2016), in particular that the sunspot group counts by different observers are proportional. This assumption of proportionality was initially made by Wolf (1861) when he devised sunspot numbers because he envisaged the \(k\)-factors as being a constant for each combination of observer and observing instrument. However, in 1872 he realised that this was an invalid assumption (Wolf 1873), and thereafter observer \(k\)-factors were computed either quarterly or annually (using daily data) at the Zürich observatory: Wolf also re-calculated all prior calibrations the same way (see Friedli 2016). It is also important to recognise that the common practice of taking ratios of different sunspot numbers or sunspot-group numbers either to make or to test calibrations of sunspot observers inherently assumes proportionality and will also give misleading values.