1 Introduction

A Standard Model (SM)-like Higgs boson with mass of \(\sim 125\) GeV has been discovered by the ATLAS and CMS experiments at the Large Hadron Collider (LHC) [1, 2]. Recently, the decay mode of the Higgs boson to bottom quarks \(H \rightarrow b\overline{b}\) has been observed at the LHC [3, 4], as well as the \(t\overline{t}H\) production process [5, 6], both being consistent with the SM prediction. However, there are several important questions to which the SM does not offer an answer: it neither explains the hierarchy problem, nor does it address the nature of dark matter, the origin of cosmic inflation, or the baryon-antibaryon asymmetry in the universe. Most fundamentally, it does not include gravity. The Higgs boson could be a portal towards the solution of many of the outstanding questions which are not addressed by the SM. Thus, it is very important to measure the Higgs boson in as many channels as possible. In the SM, the Yukawa coupling between matter fermions and the Higgs boson is proportional to the fermion’s mass. If any deviation from this proportionality is observed, it is an indication of new physics beyond the SM. The size of the deviation from the SM depends on the model, but for a large variety of models, it is estimated to be at the level of a few percent [7]. To observe such a small deviation, very precise measurements of the properties of the Higgs boson are required. The International Linear Collider (ILC) [8,9,10,11,12,13] is one of the future \(e^{+} e^{-}\) colliders proposed to deliver this precision with the least possible dependency on models. The interaction of electrons and positrons will provide a cleaner environment than the proton-proton collisions at the LHC. Besides, the beams would be longitudinally polarised: \(80{\%}\) for the electron beam and \(30{\%}\) for the positron one. By controlling the polarisation, one can study the chiral structure of the SM interactions and potentially new physics [14].

In this paper, we focus on the channel of the Higgs boson decays to a pair of muons \(H \rightarrow \mu ^{+}\mu ^{-}\) at the ILC. This channel is important because it provides an opportunity to measure the Yukawa coupling between the Higgs boson and a second-generation fermion directly. However, this is a very challenging analysis, because in the SM the branching fraction of \(H \rightarrow \mu ^{+} \mu ^{-}\) is predicted to be tiny: \(2.2 \times 10^{-4}\) for the mass of the Higgs boson of 125 GeV [15].

At the LHC, the \(H \rightarrow \mu ^{+} \mu ^{-}\) channel is explored using proton-proton collisions. In ATLAS, an observed significance on \(\mu = \sigma _{\mathrm {obs}} / \sigma _{\mathrm {SM}}\) of \(2.0 \sigma \) with respect to the hypothesis of no \(H \rightarrow \mu ^{+} \mu ^{-}\) signal was obtained using the full Run 2 dataset of 139 fb\(^{-1}\) [16]. The CMS observed an excess of events in data with a significance of \(3.0 \sigma \) [17]. The prospects of measuring this channel at the High-Luminosity LHC (HL-LHC) have also been studied. The ATLAS experiment estimates \(\sim 13{\%}\) precision on the signal strength with 3 \(\hbox {ab}^{-1}\) data assuming the phase-II detector upgrade [18], based on generator-level samples for the main signal and background processes. The CMS experiment projects a precision of \(\sim 10{\%}\), using a full simulation of an upgraded tracker [19]. These numbers apply for the signal strength. On the other hand at the ILC, the measurement of \(\sigma \times {\mathrm {BF}}\) (cross-section times branching fraction) can be turned into a measurement on the BF itself thanks to the total cross-section \(\sigma \) being accessible via the so-called recoil technique in a highly model-independent way [20, 21]. Moreover, by combining other measurements at the ILC, absolute couplings of the Higgs boson can be extracted, e.g. based on SM Effective Field Theory [14].

Fig. 1
figure 1

The Higgs production cross-section as a function of \(\sqrt{s}\) with the beam polarisation configuration of \({\mathcal {P}} (e^{-} , e^{+}) = (-\,80\% , 30\%)\). Taken from Ref. [25]. For the opposite poslarisation, \({\mathcal {P}} (e^{+} , e^{-}) = (-\,80\% , 30\%)\), the cross-section for Zh production is lower by about \(40\%\), while WW-fusion is reduced by a factor of about 16

In this study, the precision expected for the measurement of the branching fraction \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) at the ILC has been estimated based on a full detector simulation of the International Large Detector (ILD) concept [12, 22] and taking into account all relevant physics and machine-related process. The Higgs production cross-section as a function of \(\sqrt{s}\) at the ILC is shown in Fig. 1, together with corresponding Feynman diagrams. The standard running scenario for the ILC has been assumed, which would accumulate 2 \(\hbox {ab}^{-1}\) at \(\sqrt{s} =250\) GeV and 4 \(\hbox {ab}^{-1}\) at \(\sqrt{s} =500\) GeV with beam polarisation sharing as described in Refs. [23, 24]. Eight different configurations, referred to as analysis channels in the following, have been considered: the two production processes \(e^{+}e^{-} \rightarrow q\overline{q}H \rightarrow q\overline{q}\mu ^{+} \mu ^{-}\) and \(e^{+} e^{-} \rightarrow \nu \overline{\nu }H \rightarrow \nu \overline{\nu } \mu ^{+} \mu ^{-}\), with two beam polarisation configurations, and two \(\sqrt{s}\) cases. The case of the electron-positron polarisation combination \({\mathcal {P}}(e^{-},e^{+}) = (-\,80{\%}, +\,30{\%})\) will be referred to as the left-handed case (denoted by L) and \({\mathcal {P}}(e^{-},e^{+}) = (+\,80{\%}, -\,30{\%})\) as the right-handed case (denoted by R). The cross-section of the \(q\overline{q}H\) process changes by about \(40{\%}\) between the two beam polarisation configurations, while the \(\nu \overline{\nu } H\) process is more significantly affected due to the WW-fusion contribution to the \(\nu \overline{\nu }\) final state. At \(\sqrt{s} =250\) GeV, the \(e^{+} e^{-} \rightarrow ZH\) process is the dominant production process. Thus, the \(ZH \rightarrow q\overline{q} \mu ^{+} \mu ^{-}\) channel is the most important signal process at this energy due to the large branching fraction of \(Z \rightarrow q\overline{q}\). Since WW-fusion is the major production process at \(\sqrt{s} =500\) GeV, \(\nu \overline{\nu } H \rightarrow \nu \overline{\nu } \mu ^{+} \mu ^{-}\) with left-handed polarisation (including both the WW-fusion as well as ZH with \(Z\rightarrow \nu \bar{\nu }\)) is the most relevant channel at this energy. As the cross sections of the ZH and WW-fusion processes will be known to very high precision from other ILC measurements, the separation of the two production modes is not targeted in this paper and it is assumed that the uncertainties on these cross sections will have only a negligible impact on the extraction of the branching fraction. The expected numbers of \(H \rightarrow \mu ^{+} \mu ^{-}\) signal events for each channel are summarised in Table 1, together with the integrated luminosities based on the assumed running scenario. The fourth column introduces the abbreviations to specify the combination of \(\sqrt{s}\), beam polarisation, and signal production process used throughout this paper. The processes \(e^{+} e^{-} \rightarrow \ell ^{+} \ell ^{-} H \rightarrow \ell ^{+} \ell ^{-} \mu ^{+} \mu ^{-}\), where \(\ell \) denotes a lepton (e, \(\mu \), or \(\tau \)), are not considered in this paper.

Table 1 The expected number of signal events \(N_{\mathrm {signal}}\) and abbreviations for each channel, where \(\int L dt\) is the integrated luminosity based on the running scenario [23, 24]

The prospects for measuring the \(H \rightarrow \mu ^{+} \mu ^{-}\) decay at linear \(e^{+} e^{-}\) colliders have been studied previously under various conditions [12, 26,27,28,29,30], but all studies except Ref. [30] have been performed at a centre-of-mass energy of 1 TeV or higher. The studies in Refs. [28, 30] are based on a mass of the Higgs boson of 120 GeV. In Ref. [30] for example, the signal significance is estimated to be \(1.1 \sigma \), which corresponds to a precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) of 91%, based on 250 \(\hbox {fb}^{-1}\) of data at \(\sqrt{s} = 250\) GeV with beam polarisation of \({\mathcal {P}} = (-\,80\% , +\,30\%)\) from the analysis of the process \(e^{+} e^{-} \rightarrow ZH \rightarrow q\overline{q}\mu ^{+} \mu ^{-}\), assuming a Higgs mass of 120 GeV and the Silicon Detector (SiD) concept [12, 30] for the ILC. Our study comprehensively evaluates the measurement precision of \(H \rightarrow \mu ^{+} \mu ^{-}\) channel assuming the mass of the Higgs boson of 125 GeV and the running scenario of the ILC for \(\sqrt{s} =250\) GeV and 500 GeV.

This paper is structured as follows: in Sect. 2 the ILD concept and the conditions used for producing the Monte-Carlo (MC) data samples are briefly introduced. The details of the analysis at \(\sqrt{s} =250\) GeV and 500 GeV are explained in Sect. 3. The impact of the transverse momentum resolution \(\sigma _{1/P_t}\) of the central ILD tracking system specifically for this analysis is discussed in Sect. 4 before summarising in Sect. 5.

2 The ILD concept and MC samples

ILD [12, 22] is one of the proposed detector concepts for the ILC. It is a multi-purpose detector designed for particle flow analysis based on the reconstruction of hadronic jets. ILD consists of a high precision vertex detector, a time projection chamber, silicon tracking detectors, a highly granular calorimeter system and a forward detector system, all placed inside of a solenoid providing a magnetic field of 3.5 T, surrounded by an iron yoke instrumented for muon detection. The jet energy resolution as a function of \(|\cos \theta _{\mathrm {thrust}}|\) is shown in Fig. 2. Details of the ILD design, as well as about the particle flow concept can be found in Refs. [12, 13, 31].

Fig. 2
figure 2

Taken from Ref. [32]

The jet energy resolution as a function of \(|\cos \theta _{\mathrm {thrust}}|\).

Specifically for this analysis, the key performance aspect of the detector is the transverse momentum resolution \(\sigma _{1/P_t}\), since the invariant mass of the muon pair will be the final observable for distinguishing the signal from the background. The goal of the ILD design for the transverse momentum resolution is \(\sigma _{1/P_t} \sim 2 \times 10^{-5} \ {\mathrm {GeV}} ^{-1}\) at high momenta in the central region of the detector [33]. This level of performance ensures that the model-independent selection of \(e^{+} e^{-} \rightarrow ZH\) events from the recoil against leptonic \(Z \rightarrow \mu ^{+} \mu ^{-}\) decays is dominated by beam energy spread rather than by the detector effects [12]. This goal is compared to the transverse momentum resolution obtained from the ILD full detector simulation in Fig. 3. The impact of transverse momentum resolution will be discussed in Sect. 4. The performance of electromagnetic calorimeter will be important for the recovery of final state radiation photons, which, however, is not yet considered in this study.

Fig. 3
figure 3

Taken from Ref. [12]

The transverse momentum resolution for single muon events as a function of the momentum of particles, for tracks with different polar angles. The points show the resolution as obtained from full simulation of the ILD detector, the lines correspond to the design goal of \(\sigma _{1/P_t} = 2 \times 10^{-5} \text {GeV}^{-1} \oplus 1 \times 10^{-3} / ( P_t \sin \theta )\) for \(\theta = 30^{\circ }\) (green) and \(\theta = 85^{\circ }\) (blue).

The MC samples have been generated in the context of the ILC Technical Design Report [12] with the matrix element generator Whizard [34] (version 1.95). Initial state radiation and the effect of beamstrahlung, as simulated with GuineaPig [35] based on the beam parameters [36], are included in the event generation. Pythia [37] (version 6.422) is used for parton shower development, hadronisation, and to decay short-lived particles, other than leptons. The decays of tau leptons are simulated by Tauola [38,39,40]. The full detector simulation based on Geant4 [41] has been performed in the Mokka framework [42] with the so-called ILD_o1_v05 detector model [12]. The pile-up from \(\gamma \gamma \rightarrow \) low \(P_t\) hadron events has been generated based on the cross-section model described in Ref. [43]. These events have been passed through the same Geant4-based detector simulation and the resulting hits were overlaid to all MC samples before the reconstruction. Events have been reconstructed using PandoraPFA [31] in the Marlin framework [44].

Table 2 List of MC samples used in this analysis

To make the analysis as realistic as possible, all relevant SM processes with up to six fermions in the final state have been included. For the ILC-TDR [12], the SM background samples are grouped with the number of fermions in the final state. For example, the \(e^{+} e^{-} \rightarrow 2f\) process comprises the SM processes with two fermionsFootnote 1 in the final state, i.e., two quarks or two leptons. Table 2 shows the list of MC samples used in this analysis. The total number of simulated events are of the order of \(10^7\) for each centre-of-mass energy. For the production of these MC samples, a significant amount of CPU time was necessary. A new MC production of similar size in the context of the recently completed ILD Interim Design Report [32] required about 320 CPU-years. We did not consider the \(e^{+} e^{-} \rightarrow 6f\), \(e^{\pm }\gamma \rightarrow 5f\), and \(\gamma \gamma \rightarrow 4f\) processes at \(\sqrt{s} =\) 250 GeV because their cross-sections are tiny. The cross-sections of the \(\gamma \gamma \rightarrow 2f\) and \(e^{\pm }\gamma \rightarrow 3f\) processes at \(\sqrt{s} =\) 500 GeV are so large that it was not feasible to produce meaningful statistics in full detector simulation. However, it was checked in full simulation at 250 GeV – and in fast detector simulation at 500 GeV in the context of other analyses – that these processes do not play a role for signatures with high momentum isolated muons, that are rather central.

In the \(q\overline{q}H\) analysis, the process \(e^{+} e^{-} \rightarrow q\overline{q}H\) with \(H \rightarrow \mu ^{+} \mu ^{-}\) is considered as the signal, and all other processes, including other Higgs channels, are considered as background. Similarly, in the \(\nu \overline{\nu } H\) analysis, \(e^{+} e^{-} \rightarrow \nu \overline{\nu } H \rightarrow \nu \overline{\nu } \mu ^{+} \mu ^{-}\) is considered as signal, while all other processes are regarded as background.

3 Analysis

The analysis is structured in the same way for all channels: first, a pair of well-measured, prompt, oppositely charged muons consistent with \(H \rightarrow \mu ^{+} \mu ^{-}\) is selected by a series of sequential cuts described in Sec 3.1. These cuts are “common cuts” for all analysis channels since they only pertain to the properties of the \(H \rightarrow \mu ^{+} \mu ^{-}\) signal-candidates. Then, the rest of the event is subjected to channel-specific event reconstruction and event selection as detailed in Sects. 3.23.4. To perform the final event selection, a multivariate analysis technique is used, as described in Sect. 3.5. Finally, \({\mathrm {BF}} (H \rightarrow \mu ^{+} \mu ^{-})\) is extracted from a template fit to the invariant di-muon mass distributions in each channel. A toy MC technique is applied to estimate the final precision. This technique and its results will be discussed in Sects. 3.6 and 3.7.

Table 3 Requirements for selecting the muon candidates in the IsolatedLeptonTagging algorithm. These variables are defined in the text
Table 4 The “common cuts” for \(H \rightarrow \mu ^{+} \mu ^{-}\) candidates. These variables are defined in the text

3.1 Identification of \(H \rightarrow \mu ^{+} \mu ^{-}\) candidates

First, the IsolatedLeptonTagging algorithm [45] is applied to select the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidates. The criteria required for isolated muon candidates are listed in Table 3. Here, \(E_{\mathrm {CAL}}\) is the total energy deposit in the calorimeter system (apart from the BeamCal), \(p_{\mathrm {track}}\) is the momentum of the track, \(E_{\mathrm {yoke}}\) is the energy deposit in the yoke, \(d_0\) and \(z_0\) are the impact parameters in transverse and longitudinal directions [46], respectively, with their uncertainties \(\sigma (d_0)\) and \(\sigma (z_0)\) as obtained for each individual track fit. A multivariate double-cone method is used to check the isolation of each particle, and a cut on the MVA output is applied. This double-cone method exploits various observables based on the energy deposits located within two cones of different sizes around the muon candidate. In most cases, the default values of the IsolatedLeptonTagging algorithm are used for isolated muon identification. In the case of the nnh250-L/R channels, the signal rate is rather low to start with, while the events hardly contain any other particles than the muons. Therefore, some of the criteria have been relaxed for these channels. The particles passing all the requirements listed in Table 3 are considered as isolated muon candidates. Only events that have exactly one \(\mu ^{+}\) and one \(\mu ^{-}\) are considered for further analysis.

The “common cuts” applied to the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidates are summarised in Table 4. They have been chosen by maximising efficiency times purity. Here, \(\chi ^{2}/{\mathrm {Ndf}}(\mu ^{\pm })\) is the reduced \(\chi ^2\) of the muon track fit, \(\sigma (M_{\mu ^{+} \mu ^{-}})\) is the event-by-event mass uncertainty as obtained from error propagation from the track parameter uncertainties, \(M_{\mu ^{+} \mu ^{-}}\) is the invariant mass of the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidate, and \(\theta _{\mu ^{+} \mu ^{-}}\) is the angle between the \(\mu ^{+}\) and the \(\mu ^{-}\). Cut #1 (\(\chi ^{2} / {\mathrm {Ndf}}\)) serves to select well-measured tracks, followed by two cuts (\(d_0\) and \(z_0\)) which ensure prompt tracks and reject muons likely to originate from \(\tau \) decays. The cut on \(\sigma (M_{\mu ^{+} \mu ^{-}})\), cut #4, rejects events with too imprecise mass measurement, while cut #5 on \(M_{\mu ^{+} \mu ^{-}}\) ensures that the invariant mass of the candidate is well above \(M_Z\), while not removing any di-muons with a mass close to \(M_H\). Figure 4 shows the distribution of \(M_{\mu ^{+} \mu ^{-}}\) before applying cut #5 for the example of the qqh250-L channel. The last cut (#6, \(\cos \theta _{\mu ^{+} \mu ^{-}}\)) requires the di-muons to have a minimum opening angle which is defined by the boost of the produced Higgs boson and thus depends significantly on the centre-of-mass energy.

Fig. 4
figure 4

The distribution of \(M_{\mu ^{+} \mu ^{-}}\) before applying cut #5 (see Table 4) in qqH250-L. The solid blue histogram shows the signal process

Table 5 The channel-specific cuts for the \(q\overline{q}H\) processes. These variables are defined in the text

3.2 ISR identification

Some events include energetic initial state radiation (ISR) photons which, if within the detector acceptance, will affect the further analysis. Therefore a simple ISR identification procedure is applied after the muon identification. First, a candidate photon is selected if its energy \(E_{\mathrm {photon}}\) is greater than 10 GeV. All charged particle energies in a cone with half-opening angle \(\cos \theta _{\mathrm {cone}} = 0.95\) around the photon are summed up. If this energy sum is less than 5% of the photon energy, the photon is regarded as an ISR photon. These ISR photons are not subject to jet reconstruction.

Table 6 The channel-specific cuts for \(\nu \overline{\nu }H\) processes. These variables are defined in the text
Table 7 The number of signal and background events in each channel after the channel-specific cuts, weighted to the beam polarisation and luminosity settings given in Table 1. The definition of “irreducible” is in the text. Numbers in brackets show the signal selection efficiency

3.3 Jet reconstruction

For the \(q\overline{q}H\) channels, a jet clustering algorithm is applied to reconstruct \(Z \rightarrow q\overline{q}\) candidates. The \(H \rightarrow \mu ^{+} \mu ^{-}\) candidates and ISR photons are not included in jet clustering. After the selection of \(H \rightarrow \mu ^{+} \mu ^{-}\) and possible ISR photons, one can expect that the remaining particles consist of \(Z \rightarrow q\overline{q}\) and some contribution from the overlay of \(\gamma \gamma \rightarrow \) low \(P_t\) hadron events. At \(\sqrt{s} = 250\) GeV, only 0.4 \(\gamma \gamma \rightarrow \) low \(P_t\) hadron events are expected per bunch crossing on average [43]. Thus, no dedicated attempt is made to remove the overlay and the Durham clustering algorithm [47] is used to force the remaining particles into two jets. However at \(\sqrt{s} = 500\) GeV, the average number of \(\gamma \gamma \rightarrow \) low \(P_t\) hadron events per bunch crossing increases to 1.7 [43]. To remove this background, an exclusive \(k_T\) clustering algorithm [48, 49] is applied to the remaining particles, requesting four jets with a generalised jet radius of 1.0. The jet radius has been tuned to optimise the reconstruction of the invariant mass spectrum of the \(Z \rightarrow q\overline{q}\) system. The clustering into four jets has been proven to render the overlay-removal step more robust in the presence of hard gluon emission [50]. After this process, the Durham algorithm [47] is used to force the particles contained in the four \(k_T\)-jets into two final jets.

3.4 Preselection

After the general preselection described in Sect. 3.1, channel-specific cuts are applied. Table 5 summarises the cuts applied in the \(q\overline{q}H\) channels. Cut #1 vetoes isolated electrons since no further isolated leptons (electrons and muons) beyond the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidate are expected in the event. For this cut, the IsolatedLeptonTagging algorithm [45] is applied to the remaining particles and it is required that no isolated electrons are found. The cut on the number of charged particles in each jet (#3) is applied to remove reconstructed jets of low charged multiplicity originating mostly from 1-prong hadronic tau decays. The last cut #4 selects events with the invariant mass \(M_{jj}\) of two jets consistent with \(Z \rightarrow q\overline{q}\). Since the di-jet mass resolution is somewhat worse at \(\sqrt{s}=500\) GeV, the allowed mass window is wider than in the \(\sqrt{s}=250\) GeV case.

The channel-specific cuts for the \(\nu \overline{\nu } H\) channel are summarised in Table 6. Apart from the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidate, there should be no visible particles in the event except for some contribution from \(\gamma \gamma \rightarrow \) low \(P_t\) hadron backgrounds which typically have low transverse momentum. Therefore, the number of charged particles in an event, excluding the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidate, which have a transverse momentum larger than 5 GeV, \(N_{P_t}\), has to be zero (cut #1). The four-momenta of all particle flow objects in an event are summed up to the visible four-momentum, of which \(E_{\mathrm {vis}}\) and \(P_t\) are the energy and the transverse momentum, respectively. The visible four-momentum can be subtracted from the nominal four-momentum of the initial state, \((\sqrt{s}, \sqrt{s} \cdot \tan \theta _{\mathrm {cross}} / 2, 0, 0)\) where \(\theta _{\mathrm {cross}} = 14~{\mathrm {mrad}}\) is the crossing angle of beam collision, to obtain the missing four-momentum, and in particular its polar angle, \(\theta _{\mathrm {miss}}\). The cuts #2–#4 (\(E_{\mathrm {vis}}\), missing \(P_t\), and \(\cos \theta _{\mathrm {miss}}\)) use these quantities to select events with neutrinos. The \(E_{\mathrm {vis}}\) requirement thereby depends on the centre-of-mass energy.

Table 7 shows the number of signal and background events in each channel after the preselection. Overall, the signal selection efficiency is \(\sim 85{\%}\) in all channels. The “other Higgs” category includes all events with Higgs bosons with other decay modes than the signal. The category of “irreducible” backgrounds is defined as follows: for the \(q\overline{q}H\) process, \(e^{+} e^{-} \rightarrow 4f \rightarrow qq\mu ^{+} \mu ^{-}\) and \(qq \tau ^{+} \tau ^{-}\) with both tau leptons decaying into \(\mu \) are defined as irreducible. For the \(\nu \overline{\nu } H\) process, \(e^{+} e^{-} \rightarrow 4f \rightarrow \nu \nu \mu ^{+} \mu ^{-}\), \(\nu \nu \mu \tau \) with \(\tau \) decaying into \(\mu \), and \(\nu \nu \tau ^{+} \tau ^{-}\) with both \(\tau \) decaying into \(\mu \) are defined as irreducible. These events have the same or very similar final states to the signal, thus they are difficult to remove. After the preselection, the irreducible category dominates the background in nearly all analysis channels. In case of the qqH500-L/R channels, though, the \(e^{+} e^{-} \rightarrow 6f\) processes (dominated by \(e^{+} e^{-} \rightarrow t\overline{t}\)) remain at the same level as the irreducible background.

3.5 Multivariate analysis

For the further rejection of background, in particular the “irreducible” one, a multivariate analysis is performed based on the gradient boosted decision tree method (BDTG) implemented in the TMVA package in ROOT [51, 52]. Typically, \(\sim 10^{4}\) MC events remain after the preselection for signal and background, each. In all channels, half of the remaining events after the channel-specific preselection are used for training and the other half for testing. The variable \(M_{\mu ^{+} \mu ^{-}}\) is not used in the BDTG since it will be used later in further analysis. The input variables for the BDTG are summarised in Table 8 for all the channels. Here, \(\theta _{jj}\) is the angle between the two jets, \(E_{\mu ^{+} \mu ^{-}}\) is the energy sum of the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidate, \(\theta _{\mu ^{+} (\mu ^{-})}\) is the polar angle of the \(\mu ^{+} (\mu ^{-})\), \(E_{\mathrm {lead}}(E_{\mathrm {sub}})\) is the energy of the higher energy (lower energy) muon of the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidate, \(\theta _{\mathrm {lead}}(\theta _{\mathrm {sub}})\) is the polar angle of the higher energy (lower energy) muon of the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidate, \(M_{\mathrm {recoil}}\) is the recoil mass against the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidate (corrected for reconstructed ISR photons), \(P_{t, \ \mu ^{+} \mu ^{-}}\) is the transverse momentum of the \(H \rightarrow \mu ^{+} \mu ^{-}\) candidate system, and \(\theta _ {\mathrm {thrustaxis}}\) is the polar angle of the thrust axis of the visible part of the event. The variable \(E_{\mu ^{+} \mu ^{-}}\) is used in nnH500-L but not in nnH500-R. For each channel, the minimum number of relevant inputs has been chosen, considering the minimisation of correlations and avoiding overtraining, in particular given the finite amount of available MC events. Figure 5 shows the seven input variables for qqH250-L, while the corresponding distribution of the BDTG score for the signal and background events is displayed in Fig. 6.

Table 8 Input variables to the BDTG for each channel. These variables are defined in the text
Fig. 5
figure 5

Example input variables for the BDTG in qqH250-L. All histograms are normalised to an integral of 1. The signal process is shown in blue, the irreducible backgrounds are contained in the green histograms, while other backgrounds are shown in magenta

For each channel, the final cut value on the BDTG score is chosen such that it optimises the expected precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+}\mu ^{-})\) as described in the following section.

3.6 Extraction of the signal strength

After applying a cut on the BDTG score, the signal strength is extracted from a template fit to the invariant di-muon mass \(M_{\mu ^{+} \mu ^{-}}\) distribution for signal and background with the signal normalisation as a free parameter. The final precision on the branching fraction is estimated via a toy MC technique.

First, the modeling functions for the \(M_{\mu ^{+} \mu ^{-}}\) distributions of signal and background have to be defined. These functions are fitted to the \(M_{\mu ^{+} \mu ^{-}}\) distributions for signal and background as obtained from the full simulation analysis described in the previous subsections.

Due to the excellent mass reconstruction, the whole template fit is restricted to the range \(120\ \mathrm {GeV}< M_{\mu ^{+} \mu ^{-}}\) \(< 130\ \mathrm {GeV}\). For the signal, a linear sum of a Crystal Ball function (CB) [53] and a Gaussian,

$$\begin{aligned} f_S \equiv k \times {\mathrm {CB}} + (1-k) \times {\mathrm {Gaussian}} \quad (\mathrm {with}\ 0< k < 1), \end{aligned}$$
(1)

is used as modeling function \(f_S\). This empirical function models sufficiently well the combined effect of final state radiation photons, which create a tail in the \(M_{\mu ^{+} \mu ^{-}}\) distribution of the signal process, as well as effects of the finite detector resolution. It should be noted that no attempt has been made to recover final state radiation photons because the resolution of the electromagnetic calorimeter is not good enough to improve the mass resolution of events with recovered photons. As we will see in Sect. 4, an excellent invariant mass resolution is a core ingredient to the final performance of the analysis, and thus recovery of final state radiation is not considered worthwhile in this case. For the signal modeling, an unbinned fit is performed to avoid effects of the bin width which, due to finite MC statistics, cannot always be small compared to the width of the mass peak, especially when considering different \(P_t\) resolutions in Sect. 4. The Higgs mass itself is assumed to be known very precisely, to about 14 MeV, from the recoil analysis [20]. Therefore, the mean value of the CB is fixed to the nominal Higgs mass of 125 GeV in this study. Figure 7a illustrates the modeling of the signal \(M_{\mu ^{+} \mu ^{-}}\) distribution using the nnH500-L channel as example. In this example, the parameter k in Eq. (1) is 0.92. The width of the peak at half its maximum height (FWHM) is 0.23 GeV.

The background is modeled by a straight line \(f_B\) in all channels. An example is given in Fig. 7c, again based on the nnH500-L channel.

The fitted \(f_S\) and \(f_B\) are then used as probability density functions for the generation of \(2\times 10^4\) pseudo-data sets via a toy MC technique based on RooFit [52, 54]. In each pseudo-experiment, the number of pseudo-signal(-background) events is drawn from a Poisson distribution with the estimated average number of signal(background) events after all cuts as expectation value. Then, an unbinned fit of the function \(f \equiv Y_Sf_S + Y_Bf_B\) to the sum of pseudo-data is performed, where \(Y_S(Y_B)\) is the yield of signal(background) events. Here, \(Y_B\) is fixed to the expected average number of backgrounds after all cuts, assuming that by the time the ILC has collected its full data set, the SM background at a lepton collider can be predicted much more precisely than the statistical uncertainty for rare signal events. Thus, \(Y_S\) is the only free parameter in the template fit.Footnote 2 Figure 8 shows an example of one pseudo-experiment in the nnH500-L channel. The final \(Y_S\) distribution from \(2\times 10^4\) pseudo-experiments is fitted by a Gaussian to extract its mean and width as shown in Fig. 8b. The expected relative precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) is calculated as the width of the fitted Gaussian divided by the mean of the fitted Gaussian, which in all channels agrees with the mean number of signal events expected from the full simulation listed in Table 9.

Fig. 6
figure 6

The distribution of BDTG score (qqH250-L). Both histograms are normalised to an integral of 1

Fig. 7
figure 7

\(M_{\mu ^{+} \mu ^{-}}\) distributions for signal and background after all cuts in the nnH500-L channel. The result of the \(f_S\) and \(f_B\) fits, respectively, is shown as red curves

Fig. 8
figure 8

Signal strength extraction in the nnH500-L channel. a Example outcome of one pseudo-experiment, with the pseudo-data shown as black points with error bars, while the solid red curve shows the result of an unbinned fit of \(f \equiv Y_{S}f_{S} + Y_{B}f_{B}\) to the pseudo-data. The dashed red line shows its background component \(Y_{B}f_{B}\). b The distribution of the yield of signal events \(Y_S\) obtained from \(2 \times 10^{4}\) pseudo-experiments, fitted with a Gaussian function. The mean value of the distribution is \(Y_S=30.9\) with a width of 11.4 events, both obtained from the fitted Gaussian

The cut values for the BDTG score have been optimised for each analysis channel by applying the toy MC procedure described above for different values of the cut and selecting the one which gives the best measurement precision. Figures 7 and 8 correspond to the optimal BDTG score cut case in nnH500-L.

3.7 Results and discussion

Table 9 shows the number of signal and background events in each channel after the optimisation of the BDTG score cut. The signal efficiency ranges between \(45\%\) and \(72\%\), with an overall average of \(53\%\). A notable exception is the nnH250-L channel, which gives an optimal result for a very hard cut on the BDTG score and as a result has a rather low efficiency of only \(28\%\), while the background is still higher than in its sister channel nnH250-R. This is an effect of the \(W^{+}W^{-}\) contribution to the irreducible background, which has a much higher cross-section in the left-handed polarisation configuration, while the corresponding increase for the signal is much smaller since it is – at this energy – dominated by ZH production. At 500 GeV, the effect on the background is even more drastic, but since the now WW-fusion dominated signal profits in the same way from the polarisation, there is no need to optimise for an extremely hard cut on the BDTG score. In all cases, the total background count is strongly dominated by the irreducible component. This implies that the misidentification of the final-state particles is not a limiting factor in the analysis. In the future it could be investigated, however, whether the event kinematics could be exploited in a more efficient way to suppress the irreducible component, as will be discussed in more detail below.

Table 9 The number of signal and background events in each channel after all cuts. The optimal cut on the BDTG score is listed in the second column. The numbers in brackets show the signal selection efficiency

The expected precisions on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) obtained in the eight channels are summarised in Table 10. With the \(\sqrt{s}=250\) GeV data alone, a precision of \(23\%\) can be reached, dominated by the \(q\bar{q}H\) channels. The \(\sqrt{s}=500\) GeV data alone reach \(24\%\), but now the \(\nu \overline{\nu }H\) channel in the left-handed data set is the most sensitive. A combination of all data sets improves the expected precision to \(17\%\).

These numbers demonstrate a significant improvement with respect to earlier analyses. An extrapolation of the result reported by SiD [30] to a luminosity of 0.9 \(\hbox {ab}^{-1}\), which corresponds to the size of the left-handed data set at \(\sqrt{s} =250\) GeV in the present study, yields a precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) of \(\sim 48{\%}\). This can be compared to the qqH250-L result of 34% in the present analysis. This difference originates partially from a more sophisticated, multivariate analysis instead of the cut-based selection in the SiD study. But more importantly, for the intermediate momentum range of 40 to 100 GeV which is of relevance here, the ILD momentum resolution is considerably better than that of SiD: in the case of SiD [12], \(\sigma _{1/P_t}\) ranges between \(3 \times 10^{-5}\) \(\hbox {GeV}^{-1}\) for \(P=100\) GeV at \(\theta =90^{\circ }\) to \(2 \times 10^{-4}\) \(\hbox {GeV}^{-1}\) for \(P=40\) GeV at \(\theta =30^{\circ }\), while in the ILD case the relevant numbers range from \(2 \times 10^{-5}\) \(\hbox {GeV}^{-1}\) for \(P=100\) GeV at \(\theta =85^{\circ }\) to \(5 \times 10^{-5}\) \(\hbox {GeV}^{-1}\) for \(P=40\) GeV at \(\theta =30^{\circ }\), as shown in Fig. 3.

Table 10 Expected precisions on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) for \(\sqrt{s}=250\) GeV (ILC250), \(\sqrt{s}=500\) GeV (ILC500) and their combination (ILC250 + 500). The luminosities and polarisation sharing correspond to the standard ILC running scenario as detailed in Sect. 1

The combined result of the present study is about \(50\%\) less precise than the most recent projections for the signal strength from the HL-LHC based on the tracker upgrades for ATLAS and CMS introduced in Sect. 1. Taking into account that HL-LHC will provide \(O(10^4)\) \(H \rightarrow \mu ^{+} \mu ^{-}\) events with the full expected integrated luminosity of 3 \(\hbox {ab}^{-1}\), and thus about 100 times more signal events than ILC with 2 \(\hbox {ab}^{-1}\) at \(\sqrt{s} =250\) GeV and 4 \(\hbox {ab}^{-1}\) at \(\sqrt{s} =500\) GeV together, a difference of only \(50\%\) shows the highly efficient use of data possible at an \(e^{+}e^{-}\) collider. In addition, the results of the analysis at ILC1000 presented in Ref. [27] can be extrapolated to the full luminosity of 8 \(\hbox {ab}^{-1}\) expected at ILC1000 [23] to yield a precision of 14% on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\). Combining this result with our analysis at 250 GeV and 500 GeV yields a precision of 11%. Thus, from a combination of HL-LHC with the full ILC program including 1 TeV a precision of 7% could be expected, without taking into consideration possible improvements of the analysis.

For a better understanding of analysis limitations, one can compare these results with the “theoretical limit” case, i.e. assuming 100% signal selection efficiency and no backgrounds. In this hypothetical case, the precision would reach 10.4% for ILC250, and 7.1% for the full ILC250+500 data set. The results presently achieved in full detector simulation are about a factor of 2.4 more than these theoretical limits for three reasons: the signal efficiency of \(53\%\) on average, the remaining irreducible backgrounds, and the invariant mass resolution for the di-muon system.

  • If only the signal efficiencies as given in Table 9 are considered, the combined precision at ILC250 would be 13.4% and would improve to 9.4% when combined with the 500 GeV data, which is a factor of \(\sim 1.7\) better than the full simulation results. Table 11 shows the detailed cut flow of the single most sensitive channel qqH250-L as an example. At the “common cuts” stage, \(\sim 10\%\) of the signal events are lost. In about \(4\%\) of the events, the muons are not found, and about \(2.5\%\) of events are lost due to the \(d_0\) and invariant mass requirements, each; cf. Table 11. The invariant mass requirement mostly fails due to the presence of FSR, which is not recovered, cf. discussion in Sect. 3.6. During the rest of the preselection, only a few additional percents are lost, while a \(\sim 15\%\) reduction occurs via the BDTG score cut. In total, it seems hard to increase the overall signal efficiency drastically, but some improvement could be achieved by exploiting the variables discussed in the next item.

  • The irreducible background almost entirely consists of processes with the same final state as the signal process: \(e^{+} e^{-} \rightarrow qq\mu ^{+} \mu ^{-}\) for the \(q\overline{q}H\) process and \(e^{+} e^{-} \rightarrow \nu \nu \mu ^{+} \mu ^{-}\) for the \(\nu \overline{\nu } H\) process, originating from ZZ as well as, in the case of \(\nu \nu \mu ^{+} \mu ^{-}\), from \(W^{+}W^{-}\) production and single-\(Z/\gamma \) radiation off a t-channel W. Future upgrades of the analyses could attack these kinds of backgrounds by even better exploitation of all kinematic information, e.g. by testing various intermediate boson hypotheses in a kinematic fit [55] and/or by evaluating the matrix element probabilities for the signal and various background hypotheses on an event-by-event basis [56]. There are also some background events with tau leptons such as \(\nu \nu \tau \mu \) with \(\tau \) decaying to \(\mu \), but this contribution is negligible compared to the ones with exactly the signal final state.

  • Last but not least, the di-muon invariant mass resolution is an important ingredient – as soon as the number of background events at the end of the selection is larger than zero: the sharper the reconstructed Higgs mass peak, the lower the “effective” amount of background under the peak. The invariant mass resolution is dominated by the precision achieved on the transverse momentum of the two muons, while the angular resolutions play a negligible role. Therefore, we will study the impact of the (inverse) transverse momentum resolution \(\sigma _{1/P_t}\) in Sect. 4.

Table 11 Detailed cut flow of the qqH250-L channel. The numbers in brackets show the signal selection efficiency

Finally, it should be noted that in the \(\nu \overline{\nu } H\) process, especially at \(\sqrt{s} =500\) GeV, two signal processes (ZH process with \(Z \rightarrow \nu \overline{\nu }\) and WW-fusion process) are contributing. The relative contributions of these production modes will be fixed to the percent-level or better from other Higgs decay modes like \(H \rightarrow b \overline{b}\) and can be used to convert the cross section times branching fraction measurement into a measurement of \({\mathrm {BF}}(H \rightarrow \mu ^{+}\mu ^{-})\). With the help of the total ZH cross section determined with the recoil method, the absolute \(H \mu \mu \) Yukawa coupling can be extracted. Therefore, the quoted ILC precisions can directly be taken as precision on the branching fraction for \(H \rightarrow \mu ^{+}\mu ^{-}\), or, divided by a factor of two, as precision on the muon Yukawa coupling. This is qualitatively different from the signal strength determinations at the (HL-)LHC.

4 Impact of the transverse momentum resolution

The di-muon mass \(M_{\mu ^{+} \mu ^{-}}\) is the most important observable for this analysis. The uncertainty on \(M_{\mu ^{+} \mu ^{-}}\) is directly related to the precision of the measurement of the muon momentum, and in particular the resolution on its transverse component, \(\sigma _{1/P_t}\), plays a crucial role in this analysis. The transverse momentum resolution of the ILD detector has been shown already in Fig. 3 as a function of the momentum for different polar angles.

Instead of implementing the full p and \(\theta \) dependency of the resolution, a simplified approach of smearing all true muon transverse momenta with the same resolution has been taken here. This is a fully justified approach in case of \(\sqrt{s} =500\) GeV, since the vast majority of muons have high momenta in the asymptotic regime, and, due to the isotropic decays of the Higgs boson, are mostly at large polar angles in the centre of the detector. For the case of \(\sqrt{s} =250\) GeV, the muons have lower momenta between 40 and 100 GeV and the approximation is less precise, but still useful.

The dependence of the result on the asymptotic value of the transverse momentum resolution has been studied by adding a Gaussian-distributed error to the transverse momentum taken from the MC-truth information for all events passing the preselection described in Sect. 3.4. All other quantities in the event are taken from the full simulation as before. Transverse momentum resolutions between \(1 \times 10^{-3}\) to \(1 \times 10^{-6}\) GeV\(^{-1}\) have been considered. The background is kept unchanged from the full simulation study since its invariant mass distribution after the BDTG score cut does not exhibit any sharp peaks, as can be seen, e.g.: in Fig. 7, and thus a change in momentum resolution will not affect the distribution significantly.

Figure 9 shows the obtained precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) as a function of the transverse momentum resolution \(\sigma _{1/P_t}\) at \(\sqrt{s} = 250\) GeV, together with the theoretical limit as defined in Sect. 3.7 shown by dashed lines. The red line indicates the typical value of transverse momentum resolution at \(\sqrt{s} = 250\) GeV. The “effective” resolution for which the smearing approach gives the same precision on the branching fraction as the full simulation is \(\sim 4 \times 10^{-5}\) GeV\(^{-1}\). This result is consistent with Fig. 3, because at this energy muons typically have momenta in the regime of 40–100 GeV which corresponds to a resolution of around \(\sim 4 \times 10^{-5}\) GeV\(^{-1}\).

The following conclusion can be drawn. First of all, with \(\sigma _{1/P_t} = 2 \times 10^{-4}\) GeV\(^{-1}\) for example, typical for LHC experiments, precision would be 36% instead of 23%, i.e. bigger by a factor of 1.6. Therefore, it is very important for this analysis to reach the ILD goal for the transverse momentum resolution. In the other direction, though technologically not realistic, an improvement of \(\sigma _{1/P_t}\) to a few times \(10^{-6}\) GeV\(^{-1}\) would allow to nearly reach the “zero-background” scenario, in the sense that although the same amount of (irreducible) background events pass the selection, the Higgs signal peak becomes so narrow that the background contribution underneath the peak doesn’t have a significant effect anymore.

Fig. 9
figure 9

Expected precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) as a function of transverse momentum resolution \(\sigma _{1/P_t}\) (triangles), together with full simulation results discussed in Sect. 3.7 (red line) and the theoretical limits defined in Sect. 3.7 (dashed lines) for 2 ab\(^{-1}\) collected at \(\sqrt{s} = 250\) GeV. The red line indicates the typical transverse momentum resolution range at \(\sqrt{s} = 250\) GeV

Figure 10 shows the equivalent result at \(\sqrt{s} =500\) GeV, while the combined result of both centre-of-mass energies is displayed in Fig. 11. Since at \(\sqrt{s} =500\) GeV the momenta of the muons are higher than in the \(\sqrt{s} = 250\) GeV case, the transverse momentum resolution is closer to the asymptotic performance of \(2 \times 10^{-5}\) GeV\(^{-1}\), and thus the “effective” resolution gets closer to the case of \(2 \times 10^{-5}\) GeV\(^{-1}\), Otherwise, the conclusions remain similar to the \(\sqrt{s} = 250\) GeV case, underlining again the importance to achieve the ILD design goal on the transverse momentum resolution.

Fig. 10
figure 10

Expected precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) as a function of transverse momentum resolution \(\sigma _{1/P_t}\) (triangles), together with full simulation results discussed in Sect. 3.7 (star) and the theoretical limits defined in Sect. 3.7 (dashed lines) for 4 \(\hbox {ab}^{-1}\) collected at \(\sqrt{s} = 500\) GeV

Fig. 11
figure 11

Expected precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) as a function of transverse momentum resolution \(\sigma _{1/P_t}\) (triangles), together with full simulation results discussed in Sect. 3.7 (star) and the theoretical limits defined in Sect. 3.7 (dashed lines) for the combination of the 2 \(\hbox {ab}^{-1}\) collected at \(\sqrt{s} =250\) GeV and 4 \(\hbox {ab}^{-1}\) collected at \(\sqrt{s} = 500\) GeV data sets

A similar study has also been performed by the Compact LInear Collider (CLIC) [29], based on \(e^{+} e^{-} \rightarrow \nu \overline{\nu } H\) at \(\sqrt{s} = 1.4\) TeV (the \(q\overline{q}h\) contribution is negligible at 1.4 TeV). Figure 11 in Ref. [29] shows a saturation around \(\sigma _{1/P_t} = 1 \times 10^{-5}\) GeV\(^{-1}\), where the precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) reaches \(\sim 25{\%}\). While the saturation is reached already for worse \(\sigma _{1/P_t}\) resolutions compared to the ILD case, the CLIC study leads to the same conclusion as our analysis, namely that even a large improvement of the muon momentum resolution would result in only a moderate improvement of the statistical uncertainty of the measured product of the Higgs production cross-section and the branching fraction for the \(H \rightarrow \mu ^{+} \mu ^{-}\) decay. On the other hand, not reaching the design goal for the momentum resolution would lead to a significant loss of sensitivity.

5 Summary

In this study, the prospects for measuring the branching fraction of \(H \rightarrow \mu ^{+} \mu ^{-}\) at the ILC have been evaluated based on full simulation of the ILD detector for the \(\sqrt{s} =250\) GeV and 500 GeV data sets as defined by the standard ILC running scenario. Eight channels have been analysed in total, \(\sqrt{s}\) of 250 GeV and 500 GeV, two beam polarisation cases, and the two signal processes \(q\overline{q}H\) and \(\nu \overline{\nu } H\). The combined precision on \({\mathrm {BF}}(H \rightarrow \mu ^{+} \mu ^{-})\) using all results is estimated to be 17%; the 250 GeV data alone yield a precision of 23%. These results are about a factor of 2.4 bigger than the “theoretical” limit of zero background and \(100\%\) efficiency. Compared to most recent HL-LHC prospects based on the full detector upgrades, the precision is only about \(50\%\) larger, despite the fact that 100 times more \(H \rightarrow \mu ^{+} \mu ^{-}\) events are expected to be produced at HL-LHC. In combination with other ILC measurements, the observed signal strength can be translated into a direct measurement of the branching fraction and thus the muon Yukawa coupling. The combination of HL-LHC and ILC full program up to 1 TeV would provide an ultimate precision of \(\sim 7{\%}\) on BF (\(H \rightarrow \mu ^{+} \mu ^{-}\)). In addition to the full simulation analysis, the impact of the transverse momentum resolution was studied. This study shows the importance to achieve the ILD design goal of the transverse momentum resolution, otherwise the precision will be significantly degraded. The first evaluation of the prospects to measure \({\mathrm {BF}}\,(H \rightarrow \mu ^{+} \mu ^{-})\) at the lower-energy stages of the ILC presented in this paper could be improved in future analyses. Interesting points to study comprise the application of beam-spot constraint in the track fit of the two muons, a full treatment of events with significant FSR and the inclusion of the Z boson decays to charged leptons.