1 Introduction

The discovery of a new particle in the search for the Standard Model (SM) [1–3] Higgs boson [4–7] at the LHC was reported by the ATLAS [8] and CMS [9] collaborations in July 2012. There is by now clear evidence of this particle in the \(H\rightarrow \gamma \gamma \), \(H \rightarrow ZZ^{(*)} \rightarrow 4 \ell \), \(H \rightarrow WW^{(*)} \rightarrow \ell \nu \ell \nu \) and \(H\rightarrow \tau \tau \) decay channels, at a mass of around 125 GeV, which has strengthened the hypothesis [10–15] that the observed particle is the SM Higgs boson. To determine all properties of the new boson experimentally, it is important to study it in as many production and decay modes as possible. In particular, its coupling to heavy quarks is a strong focus of current experimental searches. SM Higgs boson production in association with a top-quark pair (\({t\bar{t}H}\)) [16–19] with subsequent Higgs decay into bottom quarks (\({H\rightarrow b\bar{b}}\)) probes heavy-quark couplings in both production and decay. Due to the large measured mass of the top quark, the Yukawa coupling of the top quark (\(y_t\)) is much stronger than that of the other quarks. The observation of the \({t\bar{t}H}\) production mode would allow a direct measurement of this coupling, to which other Higgs production modes are only sensitive through loop effects. Since \(y_t\) is expected to be close to unity, it is also argued to be the quantity that might give insight into the scale of new physics [20].

The \({H\rightarrow b\bar{b}}\) final state is the dominant decay mode in the SM for a Higgs boson with a mass of 125 GeV, but it has not yet been observed. While a search for this decay in the gluon-fusion production process is precluded by the overwhelming multijet background, Higgs boson production in association with a vector boson (VH) [21–23] or a top-quark pair (\(t\bar{t}\)) significantly improves the signal-to-background ratio for this decay.

This paper describes a search for the SM Higgs boson in the \({t\bar{t}H}\) production mode, designed to be primarily sensitive to the \({H\rightarrow b\bar{b}}\) decay, although other Higgs boson decay modes are also treated as signal. Figure 1a, b show two examples of tree-level diagrams for \({t\bar{t}H}\) production with a subsequent \({H\rightarrow b\bar{b}}\) decay. A search for the associated production of the Higgs boson with a top-quark pair using several Higgs decay modes (including \({H\rightarrow b\bar{b}}\)) has recently been published by the CMS Collaboration [24], quoting a ratio of the measured \({t\bar{t}H}\) signal cross section to the SM expectation, for a Higgs boson mass of 125.6\({{\,\mathrm GeV\,}}\), of \(\mu = 2.8 \pm 1.0\).

Fig. 1

Representative tree-level Feynman diagrams for the production of the Higgs boson in association with a top-quark pair (\({t\bar{t}H}\)) and the subsequent decay of the Higgs to \(b\bar{b}\) (a, b), and for the main background \(t\bar{t}{+}b\bar{b}\) (c)

The main source of background to this search comes from top-quark pairs produced in association with additional jets. The dominant source is \({t\bar{t}{+}b\bar{b}}\) production, resulting in the same final-state signature as the signal. An example is shown in Fig. 1c. A second contribution arises from \(t\bar{t}\) production in association with light-quark (u, d, s) or gluon jets, referred to as \(t\bar{t}\)+light background, and from \(t\bar{t}\) production in association with c-quarks, referred to as \(t\bar{t}\)+\(c\bar{c}\). The size of the second contribution depends on the misidentification rate of the algorithm used to identify b-quark jets.

The search presented in this paper uses 20.3 fb\(^{-1}\) of data collected with the ATLAS detector in pp collisions at \(\sqrt{s}=8{{\,\mathrm TeV}}\) during 2012. The analysis focuses on final states containing one or two electrons or muons from the decay of the \(t\bar{t}\) system, referred to as the single-lepton and dilepton channels, respectively. Selected events are classified into exclusive categories, referred to as “regions”, according to the number of reconstructed jets and jets identified as b-quark jets by the b-tagging algorithm (b-tagged jets or b-jets for short). Neural networks (NN) are employed in the regions with a significant expected contribution from the \(t\bar{t}H\) signal to separate it from the background. Simpler kinematic variables are used in regions that are depleted of the \(t\bar{t}H\) signal, and primarily serve to constrain uncertainties on the background prediction. A combined fit to signal-rich and signal-depleted regions is performed to search for the signal while simultaneously obtaining a background prediction.

2 ATLAS detector

The ATLAS detector [25] consists of four main subsystems: an inner tracking system, electromagnetic and hadronic calorimeters, and a muon spectrometer. The inner detector provides tracking information from pixel and silicon microstrip detectors in the pseudorapidity range \(|\eta |<2.5\) and from a straw-tube transition radiation tracker covering \(|\eta |<2.0\), all immersed in a 2 T magnetic field provided by a superconducting solenoid. The electromagnetic sampling calorimeter uses lead and liquid-argon (LAr) and is divided into barrel (\(|\eta |<1.475\)) and end-cap regions (\(1.375<|\eta |<3.2\)). Hadron calorimetry employs the sampling technique, with either scintillator tiles or liquid argon as active media, and with steel, copper, or tungsten as absorber material. The calorimeters cover \(|\eta |<4.9\). The muon spectrometer measures muon tracks within \(|\eta |<2.7\) using multiple layers of high-precision tracking chambers located in a toroidal field of approximately 0.5 T and 1 T in the central and end-cap regions of ATLAS, respectively. The muon spectrometer is also instrumented with separate trigger chambers covering \(|\eta |<2.4\).

3 Object reconstruction

The main physics objects considered in this search are electrons, muons, jets and b-jets. Whenever possible, the same object reconstruction is used in both the single-lepton and dilepton channels, though some small differences exist and are noted below.

Electron candidates [26] are reconstructed from energy deposits (clusters) in the electromagnetic calorimeter that are matched to a reconstructed track in the inner detector. To reduce the background from non-prompt electrons, i.e. from decays of hadrons (in particular heavy flavour) produced in jets, electron candidates are required to be isolated. In the single-lepton channel, where such background is significant, an \( \eta \)-dependent isolation cut is made, based on the sum of transverse energies of cells around the direction of each candidate, in a cone of size \(\Delta R = \sqrt{(\Delta \phi )^2 + (\Delta \eta )^2} = 0.2\). This energy sum excludes cells associated with the electron and is corrected for leakage from the electron cluster itself. A further isolation cut is made on the scalar sum of the track \({p_{\mathrm {T}}}\) around the electron in a cone of size \(\Delta R = 0.3 \) (referred to as \({p_\mathrm {T}^{\mathrm {cone30}}}\)). The longitudinal impact parameter of the electron track with respect to the selected event primary vertex defined in Sect. 4, \(z_{0}\), is required to be less than 2 mm. To increase efficiency in the dilepton channel, the electron selection is optimised by using an improved electron identification method based on a likelihood variable [27] and the electron isolation. The ratio of \({p_\mathrm {T}^{\mathrm {cone30}}}\) to the \({p_{\mathrm {T}}}\) of the electron is required to be less than 0.12, i.e. \({p_\mathrm {T}^{\mathrm {cone30}} /p_{\mathrm {T}}^e}\) \(<\) 0.12. The optimised selection improves the efficiency by roughly 7 % per electron.

Muon candidates are reconstructed from track segments in the muon spectrometer, and matched with tracks found in the inner detector [28]. The final muon candidates are refitted using the complete track information from both detector systems, and are required to satisfy \(|\eta |<2.5\). Additionally, muons are required to be separated by \(\Delta R > 0.4 \) from any selected jet (see below for details on jet reconstruction and selection). Furthermore, muons must satisfy a \({p_{\mathrm {T}}}\)-dependent track-based isolation requirement that has good performance under conditions with a high number of jets from other pp interactions within the same bunch crossing, known as “pileup”, or in boosted configurations where the muon is close to a jet: the track \({p_{\mathrm {T}}}\) scalar sum in a cone of variable size \(\Delta R < 10{{\,\mathrm GeV\,}}/{p_{\mathrm {T}}}^\mu \) around the muon must be less than 5 % of the muon \({p_{\mathrm {T}}}\). The longitudinal impact parameter of the muon track with respect to the primary vertex, \(z_{0}\), is required to be less than 2 mm.
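
As an illustration of the variable-cone requirement described above, the following sketch (not part of the analysis software) evaluates the muon track isolation for a simplified event record; the dictionary-based object format and the treatment of the muon's own track are assumptions made purely for this example.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance between two objects."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def muon_is_isolated(muon, tracks):
    """Variable-cone track isolation: the cone shrinks as the muon pT grows,
    and the scalar track-pT sum inside it must be below 5% of the muon pT.
    `muon` and `tracks` are dicts with 'pt' (GeV), 'eta' and 'phi' (hypothetical format)."""
    cone_size = 10.0 / muon["pt"]                      # Delta R < 10 GeV / pT(mu)
    pt_sum = sum(t["pt"] for t in tracks
                 # exclude the muon's own track by requiring a non-zero separation
                 if 0.0 < delta_r(muon["eta"], muon["phi"], t["eta"], t["phi"]) < cone_size)
    return pt_sum < 0.05 * muon["pt"]
```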

Jets are reconstructed from calibrated clusters [25, 29] built from energy deposits in the calorimeters, using the anti-\(k_t\) algorithm [3032] with a radius parameter \(R=0.4\). Prior to jet finding, a local cluster calibration scheme [33, 34] is applied to correct the cluster energies for the effects of dead material, non-compensation and out-of-cluster leakage. The jets are calibrated using energy- and \(\eta \)-dependent calibration factors, derived from simulations, to the mean energy of stable particles inside the jets. Additional corrections to account for the difference between simulation and data are applied [35]. After energy calibration, jets are required to have \({p_{\mathrm {T}}}> 25{{\,\mathrm GeV\,}}\) and \(|\eta | < 2.5\). To reduce the contamination from low-\({p_{\mathrm {T}}}\) jets due to pileup, the scalar sum of the \({p_{\mathrm {T}}}\) of tracks matched to the jet and originating from the primary vertex must be at least 50 % of the scalar sum of the \({p_{\mathrm {T}}}\) of all tracks matched to the jet. This is referred to as the jet vertex fraction. This criterion is only applied to jets with \({p_{\mathrm {T}}}< 50{{\,\mathrm GeV\,}}\) and \(|\eta |<2.4\).
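
A minimal sketch of the jet vertex fraction criterion is given below, under the assumption that lists of matched-track transverse momenta are available; the handling of jets without matched tracks is a simplification for illustration only.

```python
def passes_jet_vertex_fraction(jet_pt, jet_eta, track_pts_from_pv, track_pts_all):
    """Pileup-suppression cut described above: for jets with pT < 50 GeV and
    |eta| < 2.4, tracks from the primary vertex must carry at least 50% of the
    scalar pT sum of all tracks matched to the jet (pT values in GeV)."""
    if jet_pt >= 50.0 or abs(jet_eta) >= 2.4:
        return True                     # criterion not applied to these jets
    total = sum(track_pts_all)
    if total == 0.0:
        return True                     # assumption: keep jets without matched tracks
    return sum(track_pts_from_pv) / total >= 0.5
```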

During jet reconstruction, no distinction is made between identified electrons and jet candidates. Therefore, if any jet lies within \(\Delta R <\) 0.2 of a selected electron, the single closest jet is discarded in order to avoid double-counting of electrons as jets. After this, electrons within \(\Delta R <\) 0.4 of a jet are removed to further suppress background from non-isolated electrons.
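
The two-step overlap removal can be summarised by the following sketch, assuming simple dictionaries with 'eta' and 'phi' fields; it is illustrative only and not the collaboration's implementation.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def remove_overlaps(electrons, jets):
    """Step 1: for each selected electron, discard the single closest jet within
    Delta R < 0.2.  Step 2: discard electrons within Delta R < 0.4 of a remaining jet."""
    kept_jets = list(jets)
    for el in electrons:
        close = [j for j in kept_jets
                 if delta_r(el["eta"], el["phi"], j["eta"], j["phi"]) < 0.2]
        if close:
            closest = min(close, key=lambda j: delta_r(el["eta"], el["phi"],
                                                       j["eta"], j["phi"]))
            kept_jets.remove(closest)
    kept_electrons = [el for el in electrons
                      if all(delta_r(el["eta"], el["phi"], j["eta"], j["phi"]) >= 0.4
                             for j in kept_jets)]
    return kept_electrons, kept_jets
```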

Jets are identified as originating from the hadronisation of a b-quark via an algorithm [36] that uses multivariate techniques to combine information from the impact parameters of displaced tracks with topological properties of secondary and tertiary decay vertices reconstructed within the jet. The working point used for this search corresponds to a 70 % efficiency to tag a b-quark jet, with a light-jet mistag rate of 1 %, and a charm-jet mistag rate of 20 %, as determined for b-tagged jets with \({p_{\mathrm {T}}}>20{{\,\mathrm GeV\,}}\) and \(|\eta |<2.5\) in simulated \(t\bar{t}\) events. Tagging efficiencies in simulation are corrected to match the results of the calibrations performed in data [37]. Studies in simulation show that these efficiencies do not depend on the number of jets.

4 Event selection and classification

For this search, only events collected using a single-electron or single-muon trigger under stable beam conditions and for which all detector subsystems were operational are considered. The corresponding integrated luminosity is 20.3 fb\(^{-1}\). Triggers with different \({p_{\mathrm {T}}}\) thresholds are combined in a logical OR in order to maximise the overall efficiency. The \({p_{\mathrm {T}}}\) thresholds are 24 or 60  GeV for electrons and 24 or 36  GeV for muons. The triggers with the lower \({p_{\mathrm {T}}}\) threshold include isolation requirements on the lepton candidate, resulting in inefficiency at high \({p_{\mathrm {T}}}\) that is recovered by the triggers with higher \({p_{\mathrm {T}}}\) threshold. The triggers use selection criteria looser than the final reconstruction requirements.

Events accepted by the trigger are required to have at least one reconstructed vertex with at least five associated tracks, consistent with the beam collision region in the xy plane. If more than one such vertex is found, the vertex candidate with the largest sum of squared transverse momenta of its associated tracks is taken as the hard-scatter primary vertex.

In the single-lepton channel, events are required to have exactly one identified electron or muon with \({p_{\mathrm {T}}}>25\)  GeV and at least four jets, at least two of which are b-tagged. The selected lepton is required to match, with \(\Delta R < 0.15\), the lepton reconstructed by the trigger.

In the dilepton channel, events are required to have exactly two leptons of opposite charge and at least two b-jets. The leading and subleading lepton must have \({p_{\mathrm {T}}}>25\) GeV and \({p_{\mathrm {T}}}>15\) GeV, respectively. Events in the single-lepton sample with additional leptons passing this selection are removed to avoid statistical overlap between the channels. In the dilepton channel, events are categorised into ee, \(\mu \mu \) and \(e\mu \) samples. In the \(e\mu \) category, the scalar sum of the transverse energy of leptons and jets, \({{H_{\mathrm {T}}}}\), is required to be above 130 GeV. In the ee and \(\mu \mu \) categories, the invariant mass of the two leptons, \({m_{\ell \ell }}\), is required to be larger than 15 GeV in events with more than two b-jets, to suppress contributions from the decay of hadronic resonances such as the \(J/\psi \) and \(\Upsilon \) into a same-flavour lepton pair. In events with exactly two b-jets, \({m_{\ell \ell }}\) is required to be larger than 60 GeV due to poor agreement between data and prediction at lower \({m_{\ell \ell }}\). A further cut on \({m_{\ell \ell }}\) is applied in the ee and \(\mu \mu \) categories to reject events close to the Z boson mass: \(|{m_{\ell \ell }}- m_Z| > 8\) GeV.
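
The dilepton mass requirements can be summarised schematically as follows; the Z-boson mass value and the function interface are assumptions of this sketch, and the \(e\mu \) \({{H_{\mathrm {T}}}}\) requirement is applied separately.

```python
def passes_dilepton_mll_cuts(channel, m_ll, n_bjets, m_z=91.2):
    """Schematic version of the m_ll requirements described above.
    channel: 'ee', 'mumu' or 'emu'; m_ll in GeV; n_bjets: number of b-tagged jets."""
    if channel == "emu":
        return True                      # only the HT > 130 GeV cut applies (not shown)
    if abs(m_ll - m_z) <= 8.0:           # Z-boson veto in ee and mumu
        return False
    if n_bjets == 2:
        return m_ll > 60.0               # exactly two b-jets
    return m_ll > 15.0                   # more than two b-jets
```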

After all selection requirements, the samples are dominated by \(t\bar{t}\)+jets background. In both channels, selected events are categorised into different regions. In the following, a given region with m jets, of which n are b-jets, is referred to as “(\(m \mathrm{j}, n \mathrm{b}\))”. Regions with a signal-to-background ratio \(S/B>\) 1 % and \(S/\sqrt{B}>0.3\), where S and B denote the expected signal for a SM Higgs boson with \(m_H=125{{\,\mathrm GeV\,}}\) and the expected background, respectively, are referred to as “signal-rich regions”, as they provide most of the sensitivity to the signal. The remaining regions are referred to as “signal-depleted regions”. They are almost pure background regions and are used to constrain systematic uncertainties, thus improving the background prediction in the signal-rich regions. The regions are analysed separately and combined statistically to maximise the overall sensitivity. In the most sensitive regions, \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) in the single-lepton channel and \({(\ge \!{4} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) in the dilepton channel, \(H \rightarrow b\bar{b}\) decays are expected to constitute about 90 % of the signal contribution, as shown in Fig. 20 of Appendix A.
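
The classification of regions into signal-rich and signal-depleted follows directly from the two quoted criteria; a minimal sketch, with S and B taken from simulation, is given below.

```python
import math

def classify_region(s_expected, b_expected):
    """Label an (m j, n b) region as signal-rich if S/B > 1% and S/sqrt(B) > 0.3,
    and as signal-depleted otherwise (illustrative sketch)."""
    if b_expected <= 0.0:
        raise ValueError("background expectation must be positive")
    signal_rich = (s_expected / b_expected > 0.01
                   and s_expected / math.sqrt(b_expected) > 0.3)
    return "signal-rich" if signal_rich else "signal-depleted"
```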

In the single-lepton channel, a total of nine independent regions are considered: six signal-depleted regions, \(( 4 {\mathrm{j}},\,2 {\mathrm{b}})\), \(( 4 {\mathrm{j}},\,3 {\mathrm{b}})\), \(( 4 {\mathrm{j}},\,4 {\mathrm{b}})\), \(( 5 {\mathrm{j}},\,2 {\mathrm{b}})\), \(( 5 {\mathrm{j}},\,3 {\mathrm{b}})\) and \({(\ge \!{6} {\mathrm{j}},\,2 {\mathrm{b}})}\), and three signal-rich regions, \(( 5 {\mathrm{j}},\ge \!{4} {\mathrm{b}})\), \({(\ge \!{6} {\mathrm{j}},\,3 {\mathrm{b}})}\) and \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\). In the dilepton channel, a total of six independent regions are considered. The signal-rich regions are \({(\ge \!{4} {\mathrm{j}},\,3 {\mathrm{b}})}\) and \({(\ge \!{4} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\), while the signal-depleted regions are \((2 {\mathrm{j}},\,2 {\mathrm{b}})\), \((3 {\mathrm{j}},\,2 {\mathrm{b}})\), \((3 {\mathrm{j}},\,3 {\mathrm{b}})\) and \({(\ge \!{4} {\mathrm{j}},\,2 {\mathrm{b}})}\). Figure 2a shows the \(S/\sqrt{B}\) and S / B ratios for the different regions under consideration in the single-lepton channel, based on the simulations described in Sect. 5. The expected proportions of the different backgrounds in each region are shown in Fig. 2b. The corresponding distributions for the dilepton channel are shown in Fig. 3a, b.

Fig. 2

Single-lepton channel: a \(S/\sqrt{B}\) ratio for each of the regions assuming SM cross sections and branching fractions, and \({m_H}=125{{\,\mathrm GeV\,}}\). Each row shows the plots for a specific jet multiplicity (4, 5, \(\ge \)6), and the columns show the b-jet multiplicity (2, 3, \(\ge \)4). Signal-rich regions are shaded in dark red, while the rest are shown in light blue. The S / B ratio for each region is also noted. b The fractional contributions of the various backgrounds to the total background prediction in each considered region. The ordering of the rows and columns is the same as in a

Fig. 3

Dilepton channel: a The \(S/\sqrt{B}\) ratio for each of the regions assuming SM cross sections and branching fractions and \({m_H}=125{{\,\mathrm GeV\,}}\). Each row shows the plots for a specific jet multiplicity (2, 3, \(\ge \)4), and the columns show the b-jet multiplicity (2, 3, \(\ge \)4). Signal-rich regions are shaded in dark red, while the rest are shown in light blue. The S / B ratio for each region is also noted. b The fractional contributions of the various backgrounds to the total background prediction in each considered region. The ordering of the rows and columns is the same as in a

5 Background and signal modelling

After the event selection described above, the main background in both the single-lepton and dilepton channels is \(t\bar{t}\)+jets production. In the single-lepton channel, additional background contributions come from single top quark production, followed by the production of a W or Z boson in association with jets (W / Z+jets), diboson (WW, WZ, ZZ) production, as well as the associated production of a vector boson and a \(t\bar{t}\) pair, \(t\bar{t}{+}V\) (\(V=W,Z\)). Multijet events also contribute to the selected sample via the misidentification of a jet or a photon as an electron or the presence of a non-prompt electron or muon, referred to as “Lepton misID” background. The corresponding yield is estimated via a data-driven method known as the “matrix method” [38]. In the dilepton channel, backgrounds containing at least two prompt leptons other than \(t\bar{t}\)+jets production arise from Z+jets, diboson, and Wt-channel single top quark production, as well as from the \(t\bar{t}V\) processes. There are also several processes which may contain either non-prompt leptons that pass the lepton isolation requirements or jets misidentified as leptons. These processes include W+jets, \(t\bar{t}\) production with a single prompt lepton in the final state, and single top quark production in t- and s-channels. Their yield is estimated using simulation and cross-checked with a data-driven technique based on the selection of a same-sign lepton pair. In both channels, the contribution of the misidentified lepton background is negligible after requiring two b-tagged jets.

In the following, the simulation of each background and of the signal is described in detail. For all MC samples, the top quark mass is taken to be \({m_{t}}= 172.5\)  GeV and the Higgs boson mass is taken to be \({m_H}= 125\) GeV.

5.1 \(t\bar{t}\)+jets background

The \(t\bar{t}\)+jets sample is generated using the Powheg-Box 2.0 NLO generator [39–41] with the CT10 parton distribution function (PDF) set [42]. It is interfaced to Pythia 6.425 [43] with the CTEQ6L1 PDF set [44] and the Perugia2011C [45] underlying-event tune. The sample is normalised to the Top++2.0 [46] theoretical calculation performed at next-to-next-to-leading order (NNLO) in QCD that includes resummation of next-to-next-to-leading logarithmic (NNLL) soft gluon terms [47–51].

The \(t\bar{t}\)+jets sample is generated inclusively, but events are categorised according to the flavour of the partons matched to particle jets that do not originate from the decay of the \(t\bar{t}\) system. The matching requires \(\Delta R < 0.4\) between the parton and the particle jet. Particle jets are reconstructed by clustering stable particles, excluding muons and neutrinos, using the anti-\(k_t\) algorithm with a radius parameter \(R=0.4\), and are required to have \({p_{\mathrm {T}}}>15{{\,\mathrm GeV\,}}\) and \(|\eta |<2.5\).

Events where at least one such particle jet is matched to a bottom-flavoured hadron are labelled as \(t\bar{t}{+}b\bar{b}\) events. Similarly, events which are not already categorised as \(t\bar{t}{+}b\bar{b}\), and where at least one particle jet is matched to a charm-flavoured hadron, are labelled as \(t\bar{t}{+}c\bar{c}\) events. Only hadrons not associated with b and c quarks from top quark and W boson decays are considered. Events labelled as either \(t\bar{t}{+}b\bar{b}\) or \(t\bar{t}{+}c\bar{c}\) are generically referred to as \(t\bar{t}\)+HF events (HF for “heavy flavour”). The remaining events are labelled as \(t\bar{t}\)+light-jet events, including those with no additional jets.
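
The flavour labelling of \(t\bar{t}\)+jets events can be sketched as follows, assuming that the lists of additional heavy-flavour hadrons (those not from top-quark or W-boson decays) and of selected particle jets are provided; the data format is hypothetical.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def tt_flavour_label(particle_jets, extra_b_hadrons, extra_c_hadrons):
    """Return 'ttbb', 'ttcc' or 'ttlight' following the rules above: particle jets
    (pT > 15 GeV, |eta| < 2.5) are matched to hadrons within Delta R < 0.4, and
    the tt+bb label takes precedence over tt+cc."""
    def matched(jet, hadrons):
        return any(delta_r(jet["eta"], jet["phi"], h["eta"], h["phi"]) < 0.4
                   for h in hadrons)
    if any(matched(j, extra_b_hadrons) for j in particle_jets):
        return "ttbb"
    if any(matched(j, extra_c_hadrons) for j in particle_jets):
        return "ttcc"
    return "ttlight"
```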

Since Powheg+Pythia only models \(t\bar{t}{+}b\bar{b}\) via the parton shower, an alternative \(t\bar{t}\)+jets sample is generated with the Madgraph5 1.5.11 LO generator [52] using the CT10 PDF set and interfaced to Pythia 6.425 for showering and hadronisation. It includes tree-level diagrams with up to three extra partons (including b- and c-quarks) and uses settings similar to those in Ref. [24]. To avoid double-counting of partonic configurations generated by both the matrix element calculation and the parton-shower evolution, a parton–jet matching scheme (“MLM matching”) [53] is employed.

Fully matched NLO predictions with massive b-quarks have become available recently [54] within the Sherpa with OpenLoops framework [55, 56], referred to in the following as SherpaOL. The SherpaOL NLO sample is generated following the four-flavour scheme using the Sherpa 2.0 pre-release and the CT10 PDF set. The renormalisation scale (\(\mu _\mathrm{R}\)) is set to \(\mu _\mathrm{R}=\prod _{i=t,\bar{t},b,\bar{b}}E_{\mathrm {T},i}^{1/4} \), where \(E_{\mathrm {T},i}\) is the transverse energy of parton i, and the factorisation and resummation scales are both set to \((E_{\mathrm {T}, t }+E_{\mathrm {T}, \bar{t} })/2\).

For the purpose of comparisons between \(t\bar{t}\)+jets event generators and the propagation of systematic uncertainties related to the modelling of \(t\bar{t}\)+HF, as described in Sect. 8.3.1, a finer categorisation of the different \(t\bar{t}\)+HF topologies is made. The following categories are considered: if two particle jets are each matched to an extra b-quark or to an extra c-quark, the event is referred to as \({t\bar{t}{+}b\bar{b}}\) or \({t\bar{t}{+}c\bar{c}}\); if a single particle jet is matched to a single b-quark (c-quark), the event is referred to as \(t\bar{t}\)+b (\(t\bar{t}\)+c); if a single particle jet is matched to a \(b\bar{b}\) or a \(c\bar{c}\) pair, the event is referred to as \(t\bar{t}\)+B or \(t\bar{t}\)+C, respectively.

Figure 4 shows the relative contributions of the different \(t\bar{t}{+}b\bar{b}\) event categories to the total \(t\bar{t}{+}b\bar{b}\) cross section at generator level for the Powheg+Pythia, Madgraph+Pythia and SherpaOL samples. It demonstrates that Powheg+Pythia is able to reproduce reasonably well the \(t\bar{t}\)+HF content of the Madgraph \(t\bar{t}\)+jets sample, which includes a LO \({t\bar{t}{+}b\bar{b}}\) matrix element calculation, as well as the NLO SherpaOL prediction.

Fig. 4

Relative contributions of different categories of \(t\bar{t}{+}b\bar{b}\) events in Powheg+Pythia, Madgraph+Pythia and SherpaOL samples. Labels “\(t\bar{t}\)+MPI” and “\(t\bar{t}\)+FSR” refer to events where heavy flavour is produced via multiparton interaction (MPI) or final state radiation (FSR), respectively. These contributions are not included in the SherpaOL calculation. An arrow indicates that the point is off-scale. Uncertainties are from the limited MC sample sizes

The relative distribution across categories is such that SherpaOL predicts a higher contribution from the \(t\bar{t}+B\) category, as well as from every category in which a second \(b\bar{b}\) pair is produced. The modelling of the relevant kinematic variables in each category is in reasonable agreement between Powheg+Pythia and SherpaOL. Some differences are observed in the very low regions of the mass and \({p_{\mathrm {T}}}\) of the \(b\bar{b}\) pair, and in the \({p_{\mathrm {T}}}\) of the top quark and of the \(t\bar{t}\) system.

The prediction from SherpaOL is expected to model the \({t\bar{t}{+}b\bar{b}}\) contribution more accurately than both Powheg+Pythia and Madgraph+Pythia. Thus, in the analysis, \(t\bar{t}{+}b\bar{b}\) events from Powheg+Pythia are reweighted to reproduce the NLO SherpaOL prediction in both the relative contributions of the different categories and their kinematics. The reweighting is done at generator level using several kinematic variables such as the top quark \({p_{\mathrm {T}}}\), the \(t\bar{t}\) system \({p_{\mathrm {T}}}\), and the \(\Delta R\) and \({p_{\mathrm {T}}}\) of the dijet system not coming from the top quark decay. In the absence of an NLO calculation of \(t\bar{t}{+}c\bar{c}\) production, the Madgraph+Pythia sample is used to evaluate systematic uncertainties on the \(t\bar{t}{+}c\bar{c}\) background.

Since achieving the best possible modelling of the \(t\bar{t}\)+jets background is a key aspect of this analysis, a separate reweighting is applied to \(t\bar{t}\)+light and \(t\bar{t}{+}c\bar{c}\) events in Powheg+Pythia based on the ratio of measured differential cross sections at \(\sqrt{s}=7{{\,\mathrm TeV}}\) in data and simulation as a function of top quark \({p_{\mathrm {T}}}\) and \(t\bar{t}\) system \({p_{\mathrm {T}}}\) [57]. It was verified using the simulation that the ratio derived at \(\sqrt{s}=7{{\,\mathrm TeV}}\) is applicable to \(\sqrt{s}=8{{\,\mathrm TeV}}\) simulation. It is not applied to the \(t\bar{t}{+}b\bar{b}\) component since that component was corrected to match the best available theory calculation. Moreover, the measured differential cross section is not sensitive to this component. The reweighting significantly improves the agreement between simulation and data in the total number of jets (primarily due to the \(t\bar{t}\) system \({p_{\mathrm {T}}}\) reweighting) and jet \({p_{\mathrm {T}}}\) (primarily due to the top quark \({p_{\mathrm {T}}}\) reweighting). This can be seen in Fig. 5, where the number of jets and the scalar sum of the jet \({p_{\mathrm {T}}}\) (\({{H_{\mathrm {T}}}^{\mathrm{had}}}\)) distributions in the exclusive 2-b-tag region are plotted in the single-lepton channel before and after the reweighting is applied.
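
Schematically, the reweighting can be viewed as assigning each \(t\bar{t}\)+light or \(t\bar{t}{+}c\bar{c}\) event a weight built from binned data/simulation ratios in the top quark \({p_{\mathrm {T}}}\) and the \(t\bar{t}\) system \({p_{\mathrm {T}}}\); the sketch below uses placeholder binnings and ratio values, not those of Ref. [57].

```python
import bisect

def make_ratio_weight(bin_edges, ratios):
    """Step-function weight built from binned data/simulation cross-section ratios."""
    def weight(x):
        i = bisect.bisect_right(bin_edges, x) - 1
        return ratios[max(0, min(i, len(ratios) - 1))]
    return weight

# Hypothetical binnings (GeV) and ratio values, for illustration only.
w_top_pt   = make_ratio_weight([0, 50, 100, 200, 800], [1.05, 1.02, 0.95, 0.85])
w_ttbar_pt = make_ratio_weight([0, 40, 170, 1000],     [1.07, 0.98, 0.90])

def tt_event_weight(top_pt, ttbar_system_pt, category):
    """Apply the reweighting only to tt+light and tt+cc events, as described above."""
    if category not in ("ttlight", "ttcc"):
        return 1.0
    return w_top_pt(top_pt) * w_ttbar_pt(ttbar_system_pt)
```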

Fig. 5

The exclusive 2-b-tag region of the single-lepton channel before and after the reweighting of the \({p_{\mathrm {T}}}\) of the \(t\bar{t}\) system and the \({p_{\mathrm {T}}}\) of the top quark in the Powheg+Pythia \(t\bar{t}\) sample. The jet multiplicity distribution a before and b after the reweighting; the \({{H_{\mathrm {T}}}^{\mathrm{had}}}\) distribution c before and d after the reweighting

5.2 Other backgrounds

The W/Z+jets background is estimated from simulation reweighted to account for the difference in the W/Z \({p_{\mathrm {T}}}\) spectrum between data and simulation [58]. The heavy-flavour fraction of these simulated backgrounds, i.e. the sum of \(W/Z+b\bar{b}\) and \(W/Z+c\bar{c}\) processes, is adjusted to reproduce the relative rates of Z events with no b-tags and those with one b-tag observed in data. Samples of W/Z+jets events, and diboson production in association with jets, are generated using the Alpgen 2.14 [59] leading-order (LO) generator and the CTEQ6L1 PDF set. Parton showers and fragmentation are modelled with Pythia 6.425 for W/Z+jets production and with Herwig 6.520 [60] for diboson production. The W+jets samples are generated with up to five additional partons, separately for W+light-jets, \(Wb\bar{b}\)+jets, \(Wc\bar{c}\)+jets, and Wc+jets. Similarly, the Z+jets background is generated with up to five additional partons separated in different parton flavours. Both are normalised to the respective inclusive NNLO theoretical cross section [61]. The overlap between \(WQ\bar{Q}\) (\(ZQ\bar{Q}\)) (\(Q=b,c\)) events generated from the matrix element calculation and those from parton-shower evolution in the W+light-jet (Z+light-jet) samples is removed by an algorithm based on the angular separation between the extra heavy quarks: if \(\Delta R(Q,\bar{Q})>0.4\), the matrix element prediction is used, otherwise the parton shower prediction is used.

The diboson+jets samples are generated with up to three additional partons and are normalised to their respective NLO theoretical cross sections [62].

Samples of single top quark backgrounds are generated with Powheg-Box 2.0 using the CT10 PDF set. The samples are interfaced to Pythia 6.425 with the CTEQ6L1 set of parton distribution functions and the Perugia2011C underlying-event tune. Overlaps between the \(t\bar{t}\) and Wt final states are removed [63]. The single top quark samples are normalised to the approximate NNLO theoretical cross sections [64–66] using the MSTW2008 NNLO PDF set [67, 68].

Samples of \(t\bar{t}{+}V\) are generated with Madgraph 5 and the CTEQ6L1 PDF set. Pythia 6.425 with the AUET2B tune [69] is used for showering. The \(t\bar{t}V\) samples are normalised to the NLO cross-section predictions [70, 71].

5.3 Signal model

The \(t\bar{t}H\) signal process is modelled using NLO matrix elements obtained from the HELAC-Oneloop package [72]. Powheg-Box serves as an interface to shower Monte Carlo programs. The samples created using this approach are referred to as PowHel samples [73]. They are inclusive in Higgs boson decays and are produced using the CT10nlo PDF set and factorisation (\(\mu _\mathrm{F}\)) and renormalisation scales set to \(\mu _\mathrm{F} = \mu _\mathrm{R} = {m_{t}}+{m_H}/2\). The PowHel \({t\bar{t}H}\) sample is showered with Pythia 8.1 [74] with the CTEQ6L1 PDF and the AU2 underlying-event tune [75]. The \(t\bar{t}H\) cross section and Higgs boson decay branching fractions are taken from (N)NLO theoretical calculations [19, 76–82], collected in Ref. [83]. In Appendix A, the relative contributions of the Higgs boson decay modes are shown for all regions considered in the analysis.

5.4 Common treatment of MC samples

All samples using Herwig are also interfaced to Jimmy 4.31 [84] to simulate the underlying event. All simulated samples utilise Photos 2.15 [85] to simulate photon radiation and Tauola 1.20 [86] to simulate \(\tau \) decays. Events from minimum-bias interactions are simulated with the Pythia 8.1 generator with the MSTW2008 LO PDF set and the AUET2 [87] tune. They are superimposed on the simulated MC events, matching the luminosity profile of the recorded data. The contributions from these pileup interactions are simulated both within the same bunch crossing as the hard-scattering process and in neighbouring bunch crossings.

Finally, all simulated MC samples are processed through a simulation [88] of the detector geometry and response either using Geant4 [89], or through a fast simulation of the calorimeter response [90]. All simulated MC samples are processed through the same reconstruction software as the data. Simulated MC events are corrected so that the object identification efficiencies, energy scales and energy resolutions match those determined from data control samples.

Figure 6a, b show a comparison of predicted yields to data prior to the fit described in Sect. 9 in all analysis regions in the single-lepton and dilepton channel, respectively. The data agree with the SM expectation within the uncertainties of 10–30 %. Detailed tables of the event yields prior to the fit and the corresponding S / B and \(S/\sqrt{B}\) ratios for the single-lepton and dilepton channels can be found in Appendix B.

Fig. 6

Comparison of prediction to data in all analysis regions before the fit to data in a the single-lepton channel and b the dilepton channel. The signal, normalised to the SM prediction, is shown both as a filled red area stacked on the backgrounds and separately as a dashed red line. The hashed area corresponds to the total uncertainty on the yields

When requiring high jet and b-tag multiplicity in the analysis, the number of available MC events is significantly reduced, leading to large fluctuations in the resulting distributions for certain samples. This can negatively affect the sensitivity of the analysis through the large statistical uncertainties on the templates and unreliable systematic uncertainties due to shape fluctuations. In order to mitigate this problem, instead of tagging the jets by applying the b-tagging algorithm, their probabilities to be b-tagged are parameterised as functions of jet flavour, \({p_{\mathrm {T}}}\), and \(\eta \). This allows all events in the sample before b-tagging is applied to be used in predicting the normalisation and shape after b-tagging [91]. The tagging probabilities are derived using an inclusive \(t\bar{t}\)+jets simulated sample. Since the b-tagging probability for a b-jet coming from top quark decay is slightly higher than that of a b-jet with the same \({p_{\mathrm {T}}}\) and \(\eta \) but arising from other sources, they are derived separately. The predictions agree well with the normalisation and shape obtained by applying the b-tagging algorithm directly. The method is applied to all signal and background samples.
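
The parameterised tagging approach replaces the hard b-tagging decision by per-jet tagging probabilities; the probability for an event to populate a given b-tag multiplicity bin is then a convolution of those probabilities. A minimal sketch of this convolution is shown below; the actual parameterisation in flavour, \({p_{\mathrm {T}}}\) and \(\eta \), and the dedicated treatment of b-jets from top quark decays, are not reproduced here.

```python
def prob_exactly_k_tags(tag_probs, k):
    """Probability that exactly k jets are b-tagged, given per-jet tagging
    probabilities, computed with a standard dynamic-programming convolution."""
    probs = [1.0] + [0.0] * len(tag_probs)     # probs[m] = P(m tags so far)
    for eps in tag_probs:
        updated = [0.0] * len(probs)
        for m, p in enumerate(probs):
            updated[m] += p * (1.0 - eps)      # this jet is not tagged
            if m + 1 < len(updated):
                updated[m + 1] += p * eps      # this jet is tagged
        probs = updated
    return probs[k] if 0 <= k < len(probs) else 0.0

def prob_at_least_k_tags(tag_probs, k):
    """Event weight for an inclusive (>= k b-tags) category."""
    return sum(prob_exactly_k_tags(tag_probs, m) for m in range(k, len(tag_probs) + 1))
```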

6 Analysis method

In both the single-lepton and dilepton channels, the analysis uses a neural network (NN) to discriminate signal from background in each of the regions with a significant expected \({t\bar{t}H}\) signal contribution, since the \(S/\sqrt{B}\) ratio is small and the uncertainty on the background exceeds the expected signal. These regions are \(( 5 {\mathrm{j}},\ge \!{4} {\mathrm{b}})\), \({(\ge \!{6} {\mathrm{j}},\,3 {\mathrm{b}})}\) and \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) in the single-lepton channel, and \({(\ge \!{4} {\mathrm{j}},\,3 {\mathrm{b}})}\) and \({(\ge \!{4} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) in the dilepton channel. In the dilepton channel, an additional NN is used to separate signal from background in the \((3 {\mathrm{j}},\,3 {\mathrm{b}})\) region; despite a small expected \(S/\sqrt{B}\), this region adds sensitivity to the signal because of its relatively high expected S / B. In the single-lepton channel, a dedicated NN is used in the \(( 5 {\mathrm{j}},\,3 {\mathrm{b}})\) region to separate the \(t\bar{t}\)+light background from the \(t\bar{t}\)+HF backgrounds. The other regions considered in the analysis have lower sensitivity and use \({{H_{\mathrm {T}}}^{\mathrm{had}}}\) in the single-lepton channel, and the scalar sum of the jet and lepton \({p_{\mathrm {T}}}\) (\({{H_{\mathrm {T}}}}\)) in the dilepton channel, as the discriminant.

The NNs used in the analysis are built using the NeuroBayes [92] package. The choice of the variables that enter the NN discriminant is made through the ranking procedure implemented in this package based on the statistical separation power and the correlation of variables. Several classes of variables were considered: object kinematics, global event variables, event shape variables and object pair properties. In the regions with \(\ge \)6 (\(\ge \)4) jets, a maximum of seven (five) jets are considered to construct the kinematic variables in the single-lepton (dilepton) channel, first using all the b-jets, and then incorporating the untagged jets with the highest \({p_{\mathrm {T}}}\). All variables used for the NN training and their pairwise correlations are required to be described well in simulation in multiple control regions.

In the \(( 5 {\mathrm{j}},\,3 {\mathrm{b}})\) region in the single-lepton channel, the separation between the \(t\bar{t}\)+light and \(t\bar{t}\)+HF events is achieved by exploiting the different origin of the third b-jet in the case of \(t\bar{t}\)+light compared to \(t\bar{t}\)+HF events. In both cases, two of the b-jets originate from the \(t\bar{t}\) decay. However, in the case of \(t\bar{t}\)+HF events, the third b-jet is likely to originate from one of the additional heavy-flavour quarks, whereas in the case of \(t\bar{t}\)+light events, the third b-jet is often matched to a c-quark from the hadronically decaying W boson. Thus, kinematic variables such as the invariant mass of the two untagged jets with minimum \(\Delta R\) provide discrimination between \(t\bar{t}\)+light and \(t\bar{t}\)+HF events, since the latter present a distinct peak at the W boson mass which is not present in the former. This and other kinematic variables enter the dedicated NN for this region.

In addition to the kinematic variables, two variables calculated using the matrix element method (MEM), detailed in Sect. 7, are included in the NN training in \({(\ge \!{6} {\mathrm{j}},\,3 {\mathrm{b}})}\) and \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) regions of the single-lepton channel. These two variables are the Neyman–Pearson likelihood ratio (D1) (Eq. (4)) and the logarithm of the summed signal likelihoods (SSLL) (Eq. (2)). The D1 variable provides the best separation between \({t\bar{t}H}\) signal and the dominant \(t\bar{t}{+}b\bar{b}\) background in the \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) region. The SSLL variable further improves the NN performance.

The variables used in the single-lepton and dilepton channels, as well as their ranking in each analysis region, are listed in Tables 1 and 2, respectively. For the construction of variables in the \({(\ge \!{4} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) region of the dilepton channel, the two b-jets that are closest in \(\Delta R\) to the leptons are considered to originate from the top quarks, and the other two b-jets are assigned to the Higgs candidate.

Table 1 Single-lepton channel: the definitions and rankings of the variables considered in each of the regions where an NN is used
Table 2 Dilepton channel: the definitions and rankings of the variables considered in each of the regions where an NN is used

Figures 7 and 8 show the distribution of the NN discriminant for the \({t\bar{t}H}\) signal and background in the single-lepton and dilepton channels, respectively, in the signal-rich regions. In particular, Fig. 7a shows the separation between the \(t\bar{t}\)+HF and \(t\bar{t}\)+light-jet production achieved by a dedicated NN in the \(( 5 {\mathrm{j}},\,3 {\mathrm{b}})\) region in the single-lepton channel. The distributions in the highest-ranked input variables from each of the NN regions are shown in Appendix C.

Fig. 7

Single-lepton channel: NN output for the different regions. In the \(( 5 {\mathrm{j}},\,3 {\mathrm{b}})\) region (a), the \(t\bar{t}\)+HF production is considered as signal and \(t\bar{t}\)+light as background, whereas in the \(( 5 {\mathrm{j}},\ge \!{4} {\mathrm{b}})\) (b), \({(\ge \!{6} {\mathrm{j}},\,3 {\mathrm{b}})}\) (c), and \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) (d) regions the NN output is shown for the \({t\bar{t}H}\) signal and the total background. The distributions are normalised to unit area

Fig. 8

Dilepton channel: NN output for the \({t\bar{t}H}\) signal and total background in the a \((3 {\mathrm{j}},\,3 {\mathrm{b}})\), b \({(\ge \!{4} {\mathrm{j}},\,3 {\mathrm{b}})}\), and c \({(\ge \!{4} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) regions. The distributions are normalised to unit area

For all analysis regions considered in the fit, the \({t\bar{t}H}\) signal includes all Higgs decay modes. They are also included in the NN training.

The analysis regions have different contributions from various systematic uncertainties, allowing the combined fit to constrain them. The highly populated \(( 4 {\mathrm{j}},\,2 {\mathrm{b}})\) and \((2 {\mathrm{j}},\,2 {\mathrm{b}})\) regions in the single-lepton and dilepton channels, respectively, provide a powerful constraint on the overall normalisation of the \(t\bar{t}\) background. The \(( 4 {\mathrm{j}},\,2 {\mathrm{b}})\), \(( 5 {\mathrm{j}},\,2 {\mathrm{b}})\) and \(({\ge \!{6} {\mathrm{j}},\,2 {\mathrm{b}})}\) regions in the single-lepton channel and the \((2 {\mathrm{j}},\,2 {\mathrm{b}})\), \((3 {\mathrm{j}},\,2 {\mathrm{b}})\) and \({(\ge \!{4} {\mathrm{j}},\,2 {\mathrm{b}})}\) regions in the dilepton channel are almost pure in \(t\bar{t}\)+light-jets background and provide an important constraint on \(t\bar{t}\) modelling uncertainties both in terms of normalisation and shape. Uncertainties on c-tagging are reduced by exploiting the large contribution of \(W\rightarrow c s\) decays in the \(t\bar{t}\)+light-jets background populating the \(( 4 {\mathrm{j}},\,3 {\mathrm{b}})\) region in the single-lepton channel. Finally, the consideration of regions with exactly 3 and \(\ge \) 4 b-jets in both channels, having different fractions of \(t\bar{t}{+}b\bar{b}\) and \(t\bar{t}{+}c\bar{c}\) backgrounds, provides the ability to constrain uncertainties on the \(t\bar{t}{+}b\bar{b}\) and \(t\bar{t}{+}c\bar{c}\) normalisations.

7 The matrix element method

The matrix element method [94] has been used by the D0 and CDF collaborations for precision measurements of the top quark mass [95, 96] and for the observations of single top quark production [97, 98]. Recently this technique has been used for the \({t\bar{t}H}\) search by the CMS experiment [99]. By directly linking theoretical calculations and observed quantities, it makes the most complete use of the kinematic information of a given event.

The method calculates the probability density for an observed event to be consistent with a physics process i described by a set of parameters \({\varvec{\alpha }}\). This probability density function \(P_i \left( {\varvec{x}} | {\varvec{\alpha }} \right) \) is defined as

$$\begin{aligned}&P_i\left( {\varvec{x}}|{\varvec{\alpha }}\right) = \frac{(2\pi )^4}{\sigma _i^{\mathrm {exp}}\left( {\varvec{\alpha }}\right) } \int \mathrm {d}p_\mathrm {A} \mathrm {d}p_\mathrm {B} \; {\varvec{f}}\left( p_\mathrm {A}\right) {\varvec{f}}\left( p_\mathrm {B}\right) \nonumber \\&\frac{\left| \mathcal {M}_{i}\left( {\varvec{y}}|{\varvec{\alpha }}\right) \right| ^2}{\mathcal {F}} \; W\left( {\varvec{y}}|{\varvec{x}}\right) \; \mathrm {d}\Phi _N\left( {\varvec{y}}\right) \end{aligned}$$
(1)

and is obtained by numerical integration over the entire phase space of the initial- and final-state particles. In this equation, \({\varvec{x}}\) and \({\varvec{y}}\) represent the four-momentum vectors of all final-state particles at reconstruction and parton level, respectively. The flux factor \(\mathcal {F}\) and the Lorentz-invariant phase space element \(\mathrm {d}\Phi _N\) describe the kinematics of the process. The transition matrix element \(\mathcal {M}_{i}\) is defined by the Feynman diagrams of the hard process. The transfer functions \(W\left( {\varvec{y}}|{\varvec{x}}\right) \) map the detector quantities \({\varvec{x}}\) to the parton level quantities \({\varvec{y}}\). Finally, the cross section \(\sigma _i^{\mathrm {exp}}\) normalises \(P_i\) to unity taking acceptance and efficiency into account.

The assignment of reconstructed objects to the final-state partons of the hard process contains multiple ambiguities. The process probability density is calculated for each allowed assignment permutation of the jets to the final-state quarks of the hard process. A process likelihood function can then be built by summing the process probabilities over the \(N_\mathrm{p}\) allowed assignment permutations,

$$\begin{aligned} \mathcal {L}_{i} \left( {\varvec{x}}|{\varvec{\alpha }}\right) = \sum _{p=1}^{N_\mathrm{p}} P_{i}^\mathrm{p} \left( {\varvec{x}}|{\varvec{\alpha }} \right) . \end{aligned}$$
(2)

The process probability densities are used to distinguish signal from background events by calculating the likelihood ratio of the signal and background processes contributing with fractions \(f_{\text {bkg}}\),

$$\begin{aligned} r_{\text {sig}} \left( {\varvec{x}} | {\varvec{\alpha }} \right) = \frac{\mathcal {L}_{\text {sig}} \left( {\varvec{x}}|{\varvec{\alpha }} \right) }{\sum \limits _{\text {bkg}} f_{\text {bkg}} \mathcal {L}_{\text {bkg}} \left( {\varvec{x}}|{\varvec{\alpha }} \right) } . \end{aligned}$$
(3)

This ratio, according to the Neyman–Pearson lemma [100], is the most powerful discriminant between signal and background processes. In the analysis, this variable is used as input to the NN along with other kinematic variables.

The matrix elements are calculated at LO with Madgraph 5. The transfer functions are obtained from simulation following a procedure similar to that described in Ref. [101]. The parton distribution functions are modelled with the CTEQ6L1 set from the LHAPDF package [102].

The integration is performed using VEGAS [103]. Due to the complexity and high dimensionality, adaptive MC techniques [104], simplifications and approximations are needed to obtain results within a reasonable computing time. In particular, only the numerically most significant contributing helicity states of a process hypothesis for a given event, identified at the start of each integration, are evaluated. This does not perceptibly decrease the separation power but reduces the calculation time by more than an order of magnitude. Furthermore, several approximations are made to improve the VEGAS convergence rate. Firstly, the dimensionality of integration is reduced by assuming that the final-state object directions in \({\eta }\) and \({\phi }\) as well as charged lepton momenta are well measured, and therefore the corresponding transfer functions are represented by \(\delta \) functions. The total momentum conservation and a negligible transverse momentum of the initial-state partons allow for further reduction. Secondly, kinematic transformations are utilised to optimise the integration over the remaining phase space by aligning the peaks of the integrand with the integration dimensions. The narrow-width approximation is applied to the leptonically decaying W boson. This leaves three b-quark energies, one light-quark energy, the hadronically decaying W boson mass and the invariant mass of the two b-quarks originating from either the Higgs boson for the signal or a gluon for the background as the remaining parameters which define the integration phase space. The total integration volume is restricted based upon the observed values and the width of the transfer functions and of the propagator peaks in the matrix elements. Finally, the likelihood contributions of all allowed assignment permutations are coarsely integrated, and only for the leading twelve assignment permutations is the full integration performed, with a required precision decreasing according to their relative contributions.

The signal hypothesis is defined as a SM Higgs boson produced in association with a top-quark pair as shown in Fig. 1a, b. Hence no coupling of the Higgs boson to the W boson is accounted for in \(| \mathcal {M}_{i}|^{2}\) to allow for a consistent treatment when performing the kinematic transformation. The Higgs boson is required to decay into a pair of b-quarks, while the top-quark pair decays into the single-lepton channel. For the background hypothesis, only the diagrams of the irreducible \({t\bar{t}{+}b\bar{b}}\) background are considered. Since it dominates the most signal-rich analysis regions, inclusion of other processes does not improve the separation between signal and background. No gluon radiation from the final-state quarks is allowed, since these are kinematically suppressed and difficult to treat in any kinematic transformation aiming for phase-space alignment during the integration process. In the definition of the signal and background hypothesis the LO diagrams are required to have a top-quark pair as an intermediate state resulting in exactly four b-quarks, two light quarks, one charged lepton (electron or muon) and one neutrino in the final state. Assuming lepton universality and invariance under charge conjugation, diagrams of only one lepton flavour and of only negative charge (electron) are considered. The probability density function calculation of the signal and background is only performed in the \({(\ge \!{6} {\mathrm{j}},\,3 {\mathrm{b}})}\) and \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) regions of the single-lepton channel. Only six reconstructed jets are considered in the calculation: the four jets with the highest value of the probability to be a b-jet returned by the b-tagging algorithm (i.e. the highest b-tagging weight) and two of the remaining jets with an invariant mass closest to the W boson mass of 80.4 GeV. If a jet is b-tagged it cannot be assigned to a light quark in the matrix element description. In the case of more than four b-tagged jets, only the four with the highest b-tagging weight are treated as b-tagged. Assignment permutations between the two light quarks of the hadronically decaying W boson and between the two b-quarks originating from the Higgs boson or gluon result in the same likelihood value and are thus not considered. As a result there are in total 12 and 36 assignment permutations in the \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) and \({(\ge \!{6} {\mathrm{j}},\,3 {\mathrm{b}})}\) region, respectively, which need to be evaluated in the coarse integration phase.

Using the \({t\bar{t}H}\) process as the signal hypothesis and the \({t\bar{t}{+}b\bar{b}}\) process as the background hypothesis, a slightly modified version of Eq. (3) is used to define the likelihood ratio D1:

$$\begin{aligned} D1=\frac{\mathcal {L}_{t\bar{t}H}}{{\mathcal {L}_{t\bar{t}H}} + \alpha \cdot \mathcal {L}_{t\bar{t}{+}b\bar{b}}} , \end{aligned}$$
(4)

where \(\alpha =0.23\) is a relative normalisation factor chosen to optimise the performance of the discriminant given the finite bin sizes of the D1 distribution. In this definition, signal-like and background-like events have D1 values close to one and zero, respectively. The logarithm of the summed signal likelihoods defined by Eq. (2) and the ratio D1 are included in the NN training in both the \({(\ge \!{6} {\mathrm{j}},\,3 {\mathrm{b}})}\) and \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) regions.
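
For completeness, the two MEM-based NN inputs follow directly from Eqs. (2) and (4); a minimal sketch is given below, where the per-permutation probability densities are assumed to have been computed beforehand.

```python
import math

def process_likelihood(perm_probabilities):
    """Eq. (2): sum of the process probability densities over the allowed
    jet-parton assignment permutations."""
    return sum(perm_probabilities)

def d1_discriminant(sig_perm_probs, bkg_perm_probs, alpha=0.23):
    """Eq. (4): D1 = L_ttH / (L_ttH + alpha * L_ttbb); signal-like events
    yield values close to one, background-like events close to zero."""
    l_sig = process_likelihood(sig_perm_probs)
    l_bkg = process_likelihood(bkg_perm_probs)
    denominator = l_sig + alpha * l_bkg
    return l_sig / denominator if denominator > 0.0 else 0.0

def summed_signal_log_likelihood(sig_perm_probs):
    """SSLL: logarithm of the summed signal likelihoods of Eq. (2)."""
    return math.log(process_likelihood(sig_perm_probs))
```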

8 Systematic uncertainties

Several sources of systematic uncertainty are considered that can affect the normalisation of signal and background and/or the shape of their final discriminant distributions. Individual sources of systematic uncertainty are considered uncorrelated. Correlations of a given systematic effect are maintained across processes and channels. Table 3 presents a summary of the sources of systematic uncertainty considered in the analysis, indicating whether they are taken to be normalisation-only, shape-only, or to affect both shape and normalisation. In Appendix D, the normalisation impact of the systematic uncertainties is shown for the \(t\bar{t}\) background as well as for the \({t\bar{t}H}\) signal.

In order to reduce the degradation of the sensitivity of the search due to systematic uncertainties, they are fitted to data in the statistical analysis, exploiting the constraining power of the background-dominated regions described in Sect. 4. Each systematic uncertainty is represented by an independent parameter, referred to as a “nuisance parameter”, and is assigned a Gaussian prior for shape differences and a log-normal prior for normalisations. The priors are centred around zero with a width that corresponds to the given uncertainty.
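
Schematically, each normalisation nuisance parameter scales the nominal prediction log-normally while contributing a Gaussian constraint term to the fit; the sketch below illustrates this for a single parameter and is not the statistical model used in the analysis.

```python
import math

def constrained_yield(nominal_yield, theta, relative_uncertainty):
    """Log-normal response of a predicted yield to a normalisation nuisance
    parameter theta (theta = 0 is nominal, theta = +/-1 is a one-sigma shift)."""
    return nominal_yield * math.exp(theta * math.log(1.0 + relative_uncertainty))

def constraint_term(theta):
    """Gaussian constraint (unit width, centred at zero) added to the
    negative log-likelihood for this nuisance parameter."""
    return 0.5 * theta ** 2
```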

Table 3 List of systematic uncertainties considered. An “N” means that the uncertainty is taken as normalisation-only for all processes and channels affected, whereas an “S” denotes systematic uncertainties that are considered shape-only in all processes and channels. An “SN” means that the uncertainty is taken on both shape and normalisation. Some of the systematic uncertainties are split into several components for a more accurate treatment. This is the number indicated in the column labelled as “Comp.”

8.1 Luminosity

The uncertainty on the integrated luminosity for the data set used in this analysis is 2.8 %. It is derived following the same methodology as that detailed in Ref. [105]. This systematic uncertainty is applied to all contributions determined from the MC simulation.

8.2 Uncertainties on physics objects

8.2.1 Leptons

Uncertainties associated with the lepton selection arise from the trigger, reconstruction, identification, isolation and lepton momentum scale and resolution. In total, uncertainties associated with electrons (muons) include five (six) components.

8.2.2 Jets

Uncertainties associated with the jet selection arise from the jet energy scale (JES), jet vertex fraction requirement, jet energy resolution and jet reconstruction efficiency. Among these, the JES uncertainty has the largest impact on the analysis. The JES and its uncertainty are derived combining information from test-beam data, LHC collision data and simulation [35]. The jet energy scale uncertainty is split into 22 uncorrelated sources which can have different jet \({p_{\mathrm {T}}}\) and \(\eta \) dependencies. In this analysis, the largest jet energy scale uncertainty arises from the \(\eta \) dependence of the JES calibration in the end-cap regions of the calorimeter. It is the second leading uncertainty.

8.2.3 Heavy- and light-flavour tagging

A total of six (four) independent sources of uncertainty affecting the b(c)-tagging efficiency are considered [37]. Each of these uncertainties corresponds to an eigenvector resulting from diagonalising the matrix containing the information about the total uncertainty per jet \({p_{\mathrm {T}}}\) bin and the bin-to-bin correlations. An additional uncertainty is assigned due to the extrapolation of the b-tagging efficiency measurement to the high-\({p_{\mathrm {T}}}\) region. Twelve uncertainties are considered for the light-jet tagging and they depend on jet \({p_{\mathrm {T}}}\) and \(\eta \). These systematic uncertainties are taken as uncorrelated between b-jets, c-jets, and light-flavour jets.
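
The eigenvector decomposition reduces a set of correlated per-bin calibration uncertainties to independent variations; a short sketch using a hypothetical covariance matrix between three jet \({p_{\mathrm {T}}}\) bins is given below.

```python
import numpy as np

def eigenvariations(covariance):
    """Decompose a covariance matrix of per-bin calibration uncertainties into
    independent variations: the i-th variation shifts all bins coherently by
    sqrt(lambda_i) times the i-th eigenvector."""
    values, vectors = np.linalg.eigh(covariance)       # symmetric, real matrix
    return [np.sqrt(max(v, 0.0)) * vectors[:, i] for i, v in enumerate(values)]

# Hypothetical 3x3 covariance between three jet-pT bins, for illustration only.
cov = np.array([[0.04, 0.02, 0.01],
                [0.02, 0.05, 0.02],
                [0.01, 0.02, 0.06]])
variations = eigenvariations(cov)
# Sum of outer products of the variations reproduces the input covariance.
assert np.allclose(sum(np.outer(v, v) for v in variations), cov)
```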

No additional systematic uncertainty is assigned due to the use of parameterisations of the b-tagging probabilities instead of applying the b-tagging algorithm directly since the difference between these two approaches is negligible compared to the other sources.

8.3 Uncertainties on background modelling

8.3.1 \(t\bar{t}{+}jets\) modelling

An uncertainty of +6.5 %/–6 % is assumed for the inclusive \(t\bar{t}\) production cross section. It includes the scale uncertainty as well as uncertainties from the top quark mass and the choices of PDF and \(\alpha _{\mathrm {S}}\). The PDF and \(\alpha _{\mathrm {S}}\) uncertainties are calculated using the PDF4LHC prescription [106] with the MSTW2008 68 % CL NNLO, CT10 NNLO [107] and NNPDF2.3 5f FFN [108] PDF sets, and are added in quadrature to the scale uncertainty. Other systematic uncertainties affecting the modelling of \(t\bar{t}\)+jets include uncertainties due to the choice of parton shower and hadronisation model, as well as several uncertainties related to the reweighting procedure applied to improve the \({t\bar{t}}\) MC model. Additional uncertainties are assigned to account for the limited knowledge of \(t\bar{t}\)+HF production. They are described later in this section.

As discussed in Sect. 5, to improve the agreement between data and the \(t\bar{t}\) simulation a reweighting procedure is applied to \(t\bar{t}\) MC events based on the difference in the top quark \({p_{\mathrm {T}}}\) and \(t\bar{t}\) system \({p_{\mathrm {T}}}\) distributions between data and simulation at \(\sqrt{s}=7{{\,\mathrm TeV}}\) [57]. The nine largest uncertainties associated with the experimental measurement of top quark and \(t\bar{t}\) system \({p_{\mathrm {T}}}\), representing approximately 95 % of the total experimental uncertainty on the measurement, are considered as separate uncertainty sources in the reweighting applied to the MC prediction. The largest uncertainties on the measurement of the differential distributions include radiation modelling in \(t\bar{t}\) events, the choice of generator to simulate \(t\bar{t}\) production, uncertainties on the components of jet energy scale and resolution, and flavour tagging.

Because the measurement is performed for the inclusive \(t\bar{t}\) sample and the size of the uncertainties applicable to the \(t\bar{t}{+}c\bar{c}\) component is not known, two additional uncorrelated uncertainties are assigned to \(t\bar{t}{+}c\bar{c}\) events, consisting of the full difference between applying and not applying the reweightings of the \(t\bar{t}\) system \({p_{\mathrm {T}}}\) and top quark \({p_{\mathrm {T}}}\), respectively.

An uncertainty due to the choice of parton shower and hadronisation model is derived by comparing events produced by Powheg interfaced with Pythia or Herwig. Effects on the shapes are compared, symmetrised and applied to the shapes predicted by the default model. Given that the change of the parton shower model leads to two separate effects – a change in the number of jets and a change of the heavy-flavour content – the parton shower uncertainty is represented by three parameters, one acting on the \(t\bar{t}\)+light contribution and two others on the \(t\bar{t}{+}c\bar{c}\) and \(t\bar{t}{+}b\bar{b}\) contributions. These three parameters are treated as uncorrelated in the fit.
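
A minimal sketch of the symmetrisation of such a shape difference is shown below, assuming two binned templates; the bin contents are placeholders, not analysis templates.

```python
# Symmetrise the shape difference between two parton-shower model templates.
import numpy as np

nominal = np.array([120., 300., 220., 90.])      # Powheg+Pythia template (assumed)
alternative = np.array([110., 310., 235., 80.])  # Powheg+Herwig template (assumed)

delta = alternative - nominal
shape_up = nominal + delta     # +1 sigma shape variation
shape_down = nominal - delta   # -1 sigma variation, mirrored about the nominal
print(shape_up, shape_down)
```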

Detailed comparisons between \(t\bar{t}{+}b\bar{b}\) production in Powheg+Pythia and an NLO prediction based on SherpaOL have shown that the two cross sections agree within 50 %. Therefore, a systematic uncertainty of 50 % is applied to the \({t\bar{t}{+}b\bar{b}}\) component of the \(t\bar{t}\)+jets background obtained from the Powheg+Pythia MC simulation. In the absence of an NLO prediction for the \(t\bar{t}{+}c\bar{c}\) background, the same 50 % systematic uncertainty is applied to the \(t\bar{t}{+}c\bar{c}\) component, and the uncertainties on \({t\bar{t}{+}b\bar{b}}\) and \(t\bar{t}{+}c\bar{c}\) are treated as uncorrelated. The large available data sample allows the \(t\bar{t}{+}b\bar{b}\) and \(t\bar{t}{+}c\bar{c}\) normalisations to be determined with much better precision, approximately 15 and 30 % respectively (see Appendix D). Thus, the final result does not depend significantly on the exact value of the assumed prior uncertainty, as long as it is larger than the precision with which the data can constrain it. Even after this reduction, however, the uncertainties on the \(t\bar{t}{+}b\bar{b}\) and \(t\bar{t}{+}c\bar{c}\) background normalisations remain the leading and the third leading uncertainties in the analysis, respectively.

Four additional systematic uncertainties on the \(t\bar{t}{+}c\bar{c}\) background estimate are derived from the Madgraph+Pythia \(t\bar{t}\) simulation: the simultaneous variation of the factorisation and renormalisation scales, the variation of the matching threshold, the variation of the c-quark mass, and the difference between the \(t\bar{t}{+}c\bar{c}\) predictions of Madgraph+Pythia and Powheg+Pythia. The last uncertainty accounts for the fact that Madgraph+Pythia includes the \(t\bar{t}{+}c\bar{c}\) process in the matrix-element calculation, whereas it is absent in Powheg+Pythia.

For the \(t\bar{t}{+}b\bar{b}\) background, three scale uncertainties are evaluated: changing the functional form of the renormalisation scale to \(\mu _\mathrm{R} = (m_t m_{b\bar{b}})^{1/2}\); changing the functional form of the factorisation (\(\mu _\mathrm{F}\)) and resummation (\(\mu _\mathrm{Q}\)) scales to \(\mu _\mathrm{F} =\mu _\mathrm{Q} =\prod _{i=t,\bar{t},b,\bar{b}}E_{\mathrm{T},i}^{1/4}\); and varying the renormalisation scale \(\mu _\mathrm{R}\) by a factor of two up and down. In addition, the shower recoil model uncertainty and two uncertainties due to the PDF choice in the SherpaOL NLO calculation are considered. The effect of these variations on the contribution of the different \(t\bar{t}{+}b\bar{b}\) event categories is shown in Fig. 9. The renormalisation scale choice and the shower recoil scheme have a large effect on the modelling of \({t\bar{t}{+}b\bar{b}}\): they produce large shape variations of the NN discriminants and constitute the fourth and sixth leading uncertainties in this analysis.
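
As a concrete illustration of the alternative scale choices just quoted, the short sketch below evaluates \(\mu _\mathrm{R} = (m_t m_{b\bar{b}})^{1/2}\) and \(\mu _\mathrm{F} = \mu _\mathrm{Q} = \prod _{i}E_{\mathrm{T},i}^{1/4}\) for an example event; the kinematic values are placeholders chosen for illustration, not taken from the analysis.

```python
# Illustrative evaluation of the alternative ttbar+bb scale choices
# (placeholder kinematics, not analysis values).
import math

m_t = 172.5   # top-quark mass in GeV (assumed)
m_bb = 95.0   # invariant mass of the bb pair in GeV (assumed)
e_t = {"t": 180.0, "tbar": 165.0, "b": 60.0, "bbar": 45.0}  # E_T per parton in GeV (assumed)

mu_r = math.sqrt(m_t * m_bb)                        # mu_R = sqrt(m_t * m_bb)
mu_f = math.prod(e ** 0.25 for e in e_t.values())   # mu_F = mu_Q = prod_i E_T,i^(1/4)

print(f"mu_R = {mu_r:.1f} GeV, mu_F = mu_Q = {mu_f:.1f} GeV")
```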

Fig. 9 Systematic uncertainties on the \(t\bar{t}{+}b\bar{b}\) contribution based on a scale variations and b PDF choice and shower recoil model of the SherpaOL simulation. The effect of a given systematic uncertainty is shown across the different \(t\bar{t}{+}b\bar{b}\) categories. The effect of migration between categories is covered by variations of these systematic uncertainties

Finally, two uncertainties due to \(t\bar{t}{+}b\bar{b}\) production via multiparton interactions and final-state radiation, which are not present in the SherpaOL NLO calculation, are applied. Overall, the uncertainties on the \({t\bar{t}{+}b\bar{b}}\) normalisation and modelling result in a total uncertainty of about 55 % on the \({t\bar{t}{+}b\bar{b}}\) background contribution in the most sensitive \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) and \({(\ge \!{4} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) regions.

8.3.2 W/Z+jets modelling

As discussed in Sect. 5, the W/Z+jets contributions are obtained from the simulation and normalised to the inclusive theoretical cross sections, and a reweighting is applied to improve the modelling of the W/Z boson \({p_{\mathrm {T}}}\) spectrum. The full difference between applying and not applying the W/Z boson \({p_{\mathrm {T}}}\) reweighting is taken as a systematic uncertainty, which is then assumed to be symmetric with respect to the central value. Additional uncertainties are assigned due to the extrapolation of the W/Z+jets estimate to high jet multiplicity.

8.3.3 Misidentified lepton background modelling

Systematic uncertainties on the misidentified lepton background estimated via the matrix method [38] in the single-lepton channel receive contributions from the limited number of data events, particularly at high jet and b-tag multiplicities, from the subtraction of the prompt-lepton contribution, and from the uncertainty on the lepton misidentification rates, which are estimated in different control regions. The statistical uncertainty is uncorrelated among the different jet and b-tag multiplicity bins. An uncertainty of 50 % associated with the lepton misidentification rate measurements is assumed; it is taken as correlated across jet and b-tag multiplicity bins, but uncorrelated between the electron and muon channels. Uncertainties on the shape of the misidentified lepton background arise from the prompt-lepton background subtraction and from the misidentification rate measurement.
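
The matrix method referred to above can be sketched as follows, assuming loose and tight lepton selections with real- and fake-lepton efficiencies; the efficiencies and event counts are placeholders, and the two-category form shown here is a simplified version of the technique.

```python
# Simplified two-category matrix method for the misidentified-lepton estimate.
def fake_lepton_yield(n_loose, n_tight, eff_real, eff_fake):
    """Estimated number of tight-selection events with a misidentified lepton.

    n_loose, n_tight: event counts passing the loose and tight selections.
    eff_real, eff_fake: probabilities for real and fake loose leptons to pass tight.
    """
    n_fake_loose = (eff_real * n_loose - n_tight) / (eff_real - eff_fake)
    return eff_fake * n_fake_loose

# Placeholder numbers for illustration only.
print(fake_lepton_yield(n_loose=1000, n_tight=820, eff_real=0.85, eff_fake=0.20))
```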

In the dilepton channel, since the misidentified lepton background is estimated using both the simulation and same-sign dilepton events in data, a 50 % normalisation uncertainty is assigned to cover the maximum difference between the two methods. It is taken as correlated among the different jet and b-tag multiplicity bins. An additional uncertainty is applied to cover the difference in shape between the predictions derived from the simulation and from same-sign dilepton events in data.

8.3.4 Electroweak background modelling

Uncertainties of +5 %/–4 % and \(\pm 6.8\) % are used for the theoretical cross sections of single top production in the single-lepton and dilepton channels [64, 65], respectively. The former corresponds to the weighted average of the theoretical uncertainties on s-, t- and Wt-channel production, while the latter corresponds to the theoretical uncertainty on Wt-channel production, the only single top process contributing to the dilepton final state.

The uncertainty on the diboson background rates includes an uncertainty on the inclusive diboson NLO cross section of \(\pm 5\,\%\) [62] and uncertainties to account for the extrapolation to high jet multiplicity.

Finally, an uncertainty of \(\pm 30\,\%\) is assumed for the theoretical cross sections of the \(t\bar{t}{+}V\) [70, 71] background. An additional uncertainty on \(t\bar{t}{+}V\) modelling arises from variations in the amount of initial-state radiation. The \(t\bar{t}+Z\) background with Z boson decaying into a \(b\bar{b}\) pair is an irreducible background to the \({t\bar{t}H}\), \({H\rightarrow b\bar{b}}\) signal, and as such, has kinematics and an NN discriminant shape similar to those of the signal. The uncertainty on the \(t\bar{t}{+}V\) background normalisation is the fifth leading uncertainty in the analysis.

8.4 Uncertainties on signal modelling

Dedicated NLO PowHel samples are used to evaluate the impact of the choice of factorisation and renormalisation scales on the \(t\bar{t}H\) signal kinematics. In these samples the default scale is varied by a factor of two up and down. The effect of these variations on the \(t\bar{t}H\) distributions is studied at particle level, and the nominal PowHel \(t\bar{t}H\) sample is reweighted to reproduce them. In a similar way, the nominal sample is reweighted to reproduce the effect of changing the functional form of the scale. Additional uncertainties on the \({t\bar{t}H}\) signal due to the choice of PDF, parton shower and fragmentation model, and NLO generator are also considered. The effect of the PDF uncertainty on the \({t\bar{t}H}\) signal is evaluated following the PDF4LHC recommendation. The uncertainty on the parton shower and fragmentation model is evaluated by comparing PowHel+Pythia8 and PowHel+Herwig samples, while the uncertainty due to the generator choice is evaluated by comparing PowHel+Pythia8 with Madgraph5_aMC@NLO [109] interfaced with Herwig++ [110, 111].

9 Statistical methods

The distributions of the discriminants from each of the channels and regions considered are combined to test for the presence of a signal, assuming a Higgs boson mass of \({m_H}= 125 {{\,\mathrm GeV\,}}\). The statistical analysis is based on a binned likelihood function \(\mathcal{L}(\mu ,\theta )\) constructed as a product of Poisson probability terms over all bins considered in the analysis. The likelihood function depends on the signal-strength parameter \(\mu \), defined as the ratio of the measured \(t\bar{t}H\) cross section to its SM expectation, and on \(\theta \), the set of nuisance parameters encoding the effects of systematic uncertainties on the signal and background expectations. The nuisance parameters are implemented in the likelihood function with Gaussian or log-normal constraint terms, so the total number of expected events in a given bin depends on both \(\mu \) and \(\theta \). The nuisance parameters adjust the expectations for signal and background according to the corresponding systematic uncertainties, and their fitted values correspond to the deviations that best fit the data. This procedure reduces the impact of systematic uncertainties on the search sensitivity by taking advantage of the highly populated, background-dominated control regions included in the likelihood fit; it requires a good understanding of the systematic effects on the shapes of the discriminant distributions. The test statistic \(q_\mu \) is defined as the profile likelihood ratio: \(q_\mu = -2\ln (\mathcal{L}(\mu ,\hat{\hat{\theta }}_\mu )/\mathcal{L}(\hat{\mu },\hat{\theta }))\), where \(\hat{\mu }\) and \(\hat{\theta }\) are the values of the parameters that maximise the likelihood function (subject to the constraint \(0\le \hat{\mu } \le \mu \)), and \(\hat{\hat{\theta }}_\mu \) are the values of the nuisance parameters that maximise the likelihood function for a given value of \(\mu \). This test statistic is used to measure the compatibility of the observed data with the background-only hypothesis (i.e. \(\mu =0\)) and to make statistical inferences about \(\mu \), such as upper limits using the CL\(_\mathrm{{s}}\) method [112–114] as implemented in the RooFit package [115, 116].
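
A minimal numerical sketch of this construction is given below, assuming a toy setup with three bins, one signal-strength parameter and a single Gaussian-constrained nuisance parameter scaling the background; the templates, the data counts and the 10 % background uncertainty are invented for illustration and are not the analysis inputs.

```python
# Toy binned profile likelihood: Poisson terms per bin, a signal strength mu,
# and one Gaussian-constrained nuisance parameter theta scaling the background.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, poisson

signal = np.array([2.0, 5.0, 8.0])           # expected signal per bin (assumed)
background = np.array([100.0, 60.0, 20.0])   # expected background per bin (assumed)
data = np.array([104, 68, 26])               # observed counts (assumed)

def nll(mu, theta):
    """Negative log-likelihood with a 10% background normalisation uncertainty."""
    expected = np.clip(mu * signal + background * (1.0 + 0.10 * theta), 1e-9, None)
    return -(poisson.logpmf(data, expected).sum() + norm.logpdf(theta))

# Unconditional fit: both mu and theta free.
free = minimize(lambda x: nll(x[0], x[1]), x0=[1.0, 0.0], method="Nelder-Mead")

def q(mu_test):
    """Profile likelihood ratio -2 ln[ L(mu, theta_hat_hat) / L(mu_hat, theta_hat) ]."""
    cond = minimize(lambda x: nll(mu_test, x[0]), x0=[0.0], method="Nelder-Mead")
    return 2.0 * (cond.fun - free.fun)

print(f"fitted mu = {free.x[0]:.2f}, q(0) = {q(0.0):.2f}")
```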

To obtain the final result, a simultaneous fit to the data is performed on the distributions of the discriminants in 15 regions: nine analysis regions in the single-lepton channel and six regions in the dilepton channel. Fits are performed under the signal-plus-background hypothesis, where the signal-strength parameter \(\mu \) is the parameter of interest and is allowed to float freely, but is required to be the same in all 15 fit regions. The normalisation of each background is determined from the fit simultaneously with \(\mu \). Contributions from \(t\bar{t}\), W/Z+jets, single top, diboson and \(t\bar{t}{+}V\) backgrounds are constrained by the uncertainties of the respective theoretical calculations, the uncertainty on the luminosity and the data themselves. Statistical uncertainties in each bin of the discriminant distributions are taken into account by dedicated parameters in the fit. The performance of the fit is tested using simulated events by injecting a \({t\bar{t}H}\) signal with a variable signal strength and comparing the injected and fitted values. Good agreement between the injected and measured signal strengths is observed.

10 Results

The results of the binned likelihood fit to data described in Sect. 9 are presented in this section. Figure 10 shows the yields after the fit in all analysis regions in the single-lepton and dilepton channels. The post-fit event yields and the corresponding S/B and \(S/\sqrt{B}\) ratios are summarised in Appendix E.

Fig. 10 Event yields in all analysis regions in a the single-lepton channel and b the dilepton channel after the combined fit to data under the signal-plus-background hypothesis. The signal, normalised to the fitted \(\mu \), is shown both as a filled area stacked on the other backgrounds and separately as a dashed line. The hashed area represents the total uncertainty on the yields

Figures 11, 12 and 13 (14 and 15) show comparisons of data and prediction for the discriminating variables (either \({{H_{\mathrm {T}}}^{\mathrm{had}}}\), \({{H_{\mathrm {T}}}}\) or the NN discriminants) in each of the regions considered in the single-lepton (dilepton) channel, both before and after the fit to data. The uncertainties decrease significantly in all regions due to the constraints provided by the data and the correlations between different sources of uncertainty introduced by the fit. In Appendix F, the most discriminating input variables of the NN are compared to data after the fit.

Fig. 11 Single-lepton channel: comparison between data and prediction for the discriminant variable used in the \(( 4 {\mathrm{j}},\,2 {\mathrm{b}})\) region a before the fit and b after the fit, in the \(( 4 {\mathrm{j}},\,3 {\mathrm{b}})\) region c before the fit and d after the fit, and in the \(( 4 {\mathrm{j}},\,4 {\mathrm{b}})\) region e before the fit and f after the fit. The fit is performed on data under the signal-plus-background hypothesis. The last bin in all figures contains the overflow. The bottom panel displays the ratio of data to the total prediction. An arrow indicates that the point is off-scale. The hashed area represents the uncertainty on the background. The \({t\bar{t}H}\) signal yield (solid) is normalised to the SM cross section before the fit and to the fitted \(\mu \) after the fit. In several regions, predominantly the control regions, the \({t\bar{t}H}\) signal yield is not visible on top of the large background

Fig. 12 Single-lepton channel: comparison of data and prediction for the discriminant variable used in the \(( 5 {\mathrm{j}},\,2 {\mathrm{b}})\) region a before the fit and b after the fit, in the \(( 5 {\mathrm{j}},\,3 {\mathrm{b}})\) region c before the fit and d after the fit, and in the \(( 5 {\mathrm{j}},\ge \!{4} {\mathrm{b}})\) region e before the fit and f after the fit. The fit is performed on data under the signal-plus-background hypothesis. The last bin in all figures contains the overflow. The bottom panel displays the ratio of data to the total prediction. An arrow indicates that the point is off-scale. The hashed area represents the uncertainty on the background. The dashed line shows the \({t\bar{t}H}\) signal distribution normalised to the background yield. The \({t\bar{t}H}\) signal yield (solid) is normalised to the SM cross section before the fit and to the fitted \(\mu \) after the fit. In several regions, predominantly the control regions, the \({t\bar{t}H}\) signal yield is not visible on top of the large background

Fig. 13 Single-lepton channel: comparison of data and prediction for the discriminant variable used in the \({(\ge \!{6} {\mathrm{j}},\,2 {\mathrm{b}})}\) region a before the fit and b after the fit, in the \({(\ge \!{6} {\mathrm{j}},\,3 {\mathrm{b}})}\) region c before the fit and d after the fit, and in the \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) region e before the fit and f after the fit. The fit is performed on data under the signal-plus-background hypothesis. The last bin in all figures contains the overflow. The bottom panel displays the ratio of data to the total prediction. An arrow indicates that the point is off-scale. The hashed area represents the uncertainty on the background. The dashed line shows the \({t\bar{t}H}\) signal distribution normalised to the background yield. The \({t\bar{t}H}\) signal yield (solid) is normalised to the SM cross section before the fit and to the fitted \(\mu \) after the fit. In several regions, predominantly the control regions, the \({t\bar{t}H}\) signal yield is not visible on top of the large background

Fig. 14 Dilepton channel: comparison of data and prediction for the discriminant variable used in the \((2 {\mathrm{j}},\,2 {\mathrm{b}})\) region a before the fit and b after the fit, in the \((3 {\mathrm{j}},\,2 {\mathrm{b}})\) region c before the fit and d after the fit, and in the \((3 {\mathrm{j}},\,3 {\mathrm{b}})\) region e before the fit and f after the fit. The fit is performed on data under the signal-plus-background hypothesis. The last bin in all figures contains the overflow. The bottom panel displays the ratio of data to the total prediction. An arrow indicates that the point is off-scale. The hashed area represents the uncertainty on the background. The dashed line shows the \({t\bar{t}H}\) signal distribution normalised to the background yield. The \({t\bar{t}H}\) signal yield (solid) is normalised to the SM cross section before the fit and to the fitted \(\mu \) after the fit. In several regions, predominantly the control regions, the \({t\bar{t}H}\) signal yield is not visible on top of the large background

Fig. 15 Dilepton channel: comparison of data and prediction for the discriminant variable used in the \({(\ge \!{4} {\mathrm{j}},\,2 {\mathrm{b}})}\) region a before the fit and b after the fit, in the \({(\ge \!{4} {\mathrm{j}},\,3 {\mathrm{b}})}\) region c before the fit and d after the fit, and in the \({(\ge \!{4} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) region e before the fit and f after the fit. The fit is performed on data under the signal-plus-background hypothesis. The last bin in all figures contains the overflow. The bottom panel displays the ratio of data to the total prediction. An arrow indicates that the point is off-scale. The hashed area represents the uncertainty on the background. The dashed line shows the \({t\bar{t}H}\) signal distribution normalised to the background yield. The \({t\bar{t}H}\) signal yield (solid) is normalised to the SM cross section before the fit and to the fitted \(\mu \) after the fit. In several regions, predominantly the control regions, the \({t\bar{t}H}\) signal yield is not visible on top of the large background

Table 4 shows the observed \(\mu \) values obtained from the individual fits in the single-lepton and dilepton channels, and their combination. The signal strength from the combined fit for \(m_H=125{{\,\mathrm GeV\,}}\) is:

$$\begin{aligned} \mu (m_H=125{{\,\mathrm GeV\,}}) = 1.5 \pm 1.1. \end{aligned}$$ (5)

The expected uncertainty on the signal strength, assuming the SM signal (\(\mu =1\)), is \(\pm 1.1\). The observed (expected) significance of the signal is 1.4 (1.1) standard deviations, corresponding to an observed (expected) p-value of 8 % (15 %). The p-value, the probability of obtaining a result at least as signal-like as observed if no signal is present, is calculated using \(q_0 = -2\ln (\mathcal {L}(0,\hat{\hat{\theta }}_{0})/\mathcal {L}(\hat{\mu }, \hat{\theta }))\) as the test statistic.
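
As a quick cross-check of the quoted numbers, the standard one-sided Gaussian relation between significance and p-value, \(p = 1 - \Phi (Z)\), can be evaluated as in the short sketch below; small differences with respect to the quoted values can arise from rounding of the significances.

```python
# Convert a significance Z (in standard deviations) to a one-sided p-value.
from scipy.stats import norm

for z in (1.4, 1.1):   # observed and expected significances quoted above
    print(f"Z = {z:.1f} sigma  ->  p = {norm.sf(z):.1%}")
```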

Table 4 The fitted values of signal strength and their uncertainties for the individual channels as well as their combination, assuming \(m_H=125{{\,\mathrm GeV\,}}\). Total uncertainties are shown

The fitted values of the signal strength and their uncertainties for the individual channels and their combination are shown in Fig. 16.

The observed limits, and those expected with and without the assumption of a SM Higgs boson with \(m_H=125{{\,\mathrm GeV\,}}\), are shown in Fig. 17 for each channel and for their combination. A signal 3.4 times larger than predicted by the SM is excluded at 95 % CL using the CL\(_\mathrm{{s}}\) method. A signal 2.2 times larger than the SM prediction is expected to be excluded in the absence of a SM Higgs boson, and 3.1 times larger in the presence of a SM Higgs boson. These results are also summarised in Table 5.

Table 5 Observed and expected (median, for the background-only hypothesis) 95 % CL upper limits on \(\sigma (t\bar{t}H)\) relative to the SM prediction, for the individual channels as well as their combination, assuming \(m_H=125{{\,\mathrm GeV\,}}\). The 68 and 95 % confidence intervals around the expected limits under the background-only hypothesis are also provided, denoted by \(\pm 1 \sigma \) and \(\pm 2 \sigma \), respectively. The expected (median) 95 % CL upper limits assuming the SM prediction for \(\sigma (t\bar{t}H)\) are shown in the last column

Figure 18 summarises the post-fit event yields as a function of \(\log _{10}(S/B)\) for all bins of the distributions used in the combined fit of the single-lepton and dilepton channels. The value of \(\log _{10}(S/B)\) is calculated from the post-fit yields in each bin of the fitted distributions, either \({{H_{\mathrm {T}}}^{\mathrm{had}}}\), \({{H_{\mathrm {T}}}}\), or the NN discriminant. The total numbers of background and signal events are displayed in bins of \(\log _{10}(S/B)\). In particular, the last bin of Fig. 18 contains the last two bins of the NN distribution in the most signal-rich region \({(\ge \!{6} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\) and the last two bins of the NN distribution in the most signal-rich region \({(\ge \!{4} {\mathrm{j}},\ge \!{4} {\mathrm{b}})}\). The signal is normalised to the fitted value of the signal strength (\(\mu = 1.5\)) and the background is obtained from the global fit. A signal strength 3.4 times larger than predicted by the SM, which is excluded at 95 % CL by this analysis, is also shown.
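
The ordering used in Fig. 18 can be sketched as below, assuming per-bin post-fit signal and background yields; the yields and the coarse \(\log _{10}(S/B)\) binning are placeholders for illustration only.

```python
# Group per-bin post-fit yields into coarse bins of log10(S/B); numbers are placeholders.
import numpy as np

signal = np.array([0.2, 1.5, 3.0, 0.8])           # post-fit signal yields per analysis bin (assumed)
background = np.array([400.0, 150.0, 30.0, 5.0])  # post-fit background yields (assumed)

log_sb = np.log10(signal / background)
edges = np.array([-4.0, -3.0, -2.0, -1.0, 0.0])   # coarse log10(S/B) binning (assumed)
index = np.digitize(log_sb, edges) - 1

for i in range(len(edges) - 1):
    sel = index == i
    print(f"log10(S/B) in [{edges[i]:.0f}, {edges[i+1]:.0f}): "
          f"S = {signal[sel].sum():.1f}, B = {background[sel].sum():.1f}")
```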

Figure 19 demonstrates the effect of the various systematic uncertainties on the fitted value of \(\mu \) and the constraints provided by the data. The post-fit effect on \(\mu \) is calculated by fixing the corresponding nuisance parameter at \(\hat{{\theta }} \pm \sigma _{{\theta }}\), where \(\hat{{\theta }}\) is the fitted value of the nuisance parameter and \(\sigma _{{\theta }}\) is its post-fit uncertainty, and performing the fit again. The difference between the default and the modified values of \(\mu \), \(\Delta \mu \), represents the effect on \(\mu \) of that particular systematic uncertainty. The largest effect arises from the uncertainty on the normalisation of the irreducible \({t\bar{t}{+}b\bar{b}}\) background, which is reduced by more than half with respect to the initial 50 %. The \({t\bar{t}{+}b\bar{b}}\) background normalisation is pulled up by about 40 % in the fit, resulting in an increase of the fitted \({t\bar{t}{+}b\bar{b}}\) yield with respect to the Powheg+Pythia prediction. Most of the reduction in the uncertainty on the \({t\bar{t}{+}b\bar{b}}\) normalisation results from the significant number of data events in the signal-rich regions dominated by the \({t\bar{t}{+}b\bar{b}}\) background. If no Gaussian prior is placed on the \({t\bar{t}{+}b\bar{b}}\) normalisation, as described in Sect. 8, the fit still prefers an increase in the amount of \({t\bar{t}{+}b\bar{b}}\) background by about 40 %.
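
The impact calculation just described can be illustrated with the toy below, in which a simple quadratic objective in \((\mu , \theta _1, \theta _2)\) stands in for the full likelihood; each nuisance parameter is fixed at its best-fit value plus or minus its (here assumed unit) post-fit uncertainty and the minimisation is repeated. Everything in the sketch is invented for illustration.

```python
# Toy post-fit impact ranking: fix each nuisance parameter at theta_hat +/- sigma,
# refit, and record the shift Delta mu. The objective is a stand-in, not the analysis fit.
from scipy.optimize import minimize

def objective(mu, t1, t2):
    # Toy correlations between mu and the nuisance parameters (assumed).
    return (mu - 1.5 - 0.4 * t1 - 0.1 * t2) ** 2 + t1 ** 2 + t2 ** 2

def fit(fixed=None):
    """Minimise the objective, optionally holding some parameters fixed."""
    fixed = fixed or {}
    names = ["mu", "t1", "t2"]
    def wrapped(x):
        params = dict(zip(names, x))
        params.update(fixed)
        return objective(params["mu"], params["t1"], params["t2"])
    result = minimize(wrapped, x0=[1.0, 0.0, 0.0], method="Nelder-Mead")
    best = dict(zip(names, result.x))
    best.update(fixed)
    return best

nominal = fit()
for name, sigma in [("t1", 1.0), ("t2", 1.0)]:   # post-fit uncertainties (assumed to be 1)
    mu_up = fit(fixed={name: nominal[name] + sigma})["mu"]
    mu_down = fit(fixed={name: nominal[name] - sigma})["mu"]
    print(f"{name}: Delta mu = {mu_up - nominal['mu']:+.2f} / {mu_down - nominal['mu']:+.2f}")
```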

Fig. 16 The fitted values of the signal strength and their uncertainties for the individual channels and their combination. The green line shows the statistical uncertainty on the signal strength

Fig. 17 95 % CL upper limits on \(\sigma (t\bar{t}H)\) relative to the SM prediction, \(\sigma /\sigma _{\mathrm {SM}}\), for the individual channels as well as their combination. The observed limits (solid lines) are compared to the expected (median) limits under the background-only hypothesis and under the signal-plus-background hypothesis assuming the SM prediction for \(\sigma (t\bar{t}H)\) and pre-fit prediction for the background. The surrounding shaded bands correspond to the 68 and 95 % confidence intervals around the expected limits under the background-only hypothesis, denoted by \(\pm 1 \sigma \) and \(\pm 2 \sigma \), respectively

Fig. 18 Event yields as a function of \(\log _{10}(S/B)\), where S (signal yield) and B (background yield) are taken from the \({{H_{\mathrm {T}}}^{\mathrm{had}}}\), \({{H_{\mathrm {T}}}}\), and NN output bin of each event. Events in all fitted regions are included. The predicted background is obtained from the global signal-plus-background fit. The \({t\bar{t}H}\) signal is shown both for the best fit value (\(\mu = 1.5\)) and for the upper limit at 95 % CL (\(\mu =3.4\))

The \({t\bar{t}{+}b\bar{b}}\) modelling uncertainties affecting the shape of this background also have a significant effect on \(\mu \). These systematic uncertainties affect only the \({t\bar{t}{+}b\bar{b}}\) modelling and are not correlated with the other \(t\bar{t}\)+jets backgrounds. The largest of these uncertainties is the renormalisation scale choice, which drastically changes the shape of the NN discriminant for the \({t\bar{t}{+}b\bar{b}}\) background, making it appear more signal-like.

Fig. 19 The fitted values of the nuisance parameters with the largest impact on the measured signal strength. The points, which are drawn according to the scale of the bottom axis, show the deviation of each of the fitted nuisance parameters, \(\hat{{\theta }}\), from \(\mathrm {\theta _{0}}\), which is the nominal value of that nuisance parameter, in units of the pre-fit standard deviation \(\Delta \theta \). The error bars show the post-fit uncertainties, \(\sigma _{\theta }\), which are close to 1 if the data do not provide any further constraint on that uncertainty. Conversely, a value of \(\sigma _{\theta }\) much smaller than 1 indicates a significant reduction with respect to the original uncertainty. The nuisance parameters are sorted according to their post-fit effect on \(\mu \) (hashed blue area), shown according to the scale of the top axis, with those with the largest impact at the top

The \({t\bar{t}{+}c\bar{c}}\) normalisation uncertainty is ranked third (Fig. 19) and its pull is slightly negative, while the post-fit yields for \({t\bar{t}{+}c\bar{c}}\) increase significantly in the four- and five-jet regions in the single-lepton channel and in the two- and three-jet regions of the dilepton channel (see Tables 10, 11 of Appendix 1). It was verified that this effect is caused by the interplay between the \({t\bar{t}{+}c\bar{c}}\) normalisation uncertainty and several other systematic uncertainties affecting the \({t\bar{t}{+}c\bar{c}}\) background yield.

The noticeable effect of the light-jet tagging (mistag) systematic uncertainty is explained by the relatively large fraction of \(t\bar{t}\)+light background in the signal region with four b-tagged jets in the single-lepton channel. The \(t\bar{t}\)+light events enter the 4-b-tag region through a mistag, in contrast to the 3-b-tag region, where tagging a c-jet from a W boson decay is more likely. Since the amount of data in the 4-b-tag regions is small, this uncertainty cannot be constrained significantly.

The \(t\bar{t}+Z\) background with \(Z \rightarrow b\bar{b}\) is an irreducible background to the \({t\bar{t}H}\) signal as it has the same number of b-jets in the final state and similar event kinematics. Its normalisation has a notable effect on \(\mu \) (\(\mathrm{d}\mu /\mathrm{d}\sigma ({t\bar{t}V}) = 0.3\)) and the uncertainty arising from the \(t\bar{t}{+}V\) normalisation cannot be significantly constrained by the fit. Other leading uncertainties include b-tagging and some components of the JES uncertainty.

Uncertainties arising from the jet energy resolution, the jet vertex fraction, the jet reconstruction efficiency and the JES components that primarily affect low-\({p_{\mathrm {T}}}\) jets, as well as the \(t\bar{t}\)+light background modelling uncertainties, are constrained mainly in the signal-depleted regions. These uncertainties do not have a significant effect on the fitted value of \(\mu \).

11 Summary

A search has been performed for the Standard Model Higgs boson produced in association with a top-quark pair (\(t\bar{t}H\)) using 20.3 fb\(^{-1}\) of pp collision data at \(\sqrt{s}=8{{\,\mathrm TeV}}\) collected with the ATLAS detector during the first run of the Large Hadron Collider. The search focuses on \({H\rightarrow b\bar{b}}\) decays, and is performed in events with either one or two charged leptons.

To improve the sensitivity, the search employs a likelihood fit to data in several jet and b-tagged-jet multiplicity regions. Systematic uncertainties included in the fit are significantly constrained by the data. Discrimination between signal and background is obtained in both final states by employing neural networks in the signal-rich regions. In the single-lepton channel, discriminating variables calculated with the matrix element technique are used, in addition to kinematic variables, as inputs to the neural network. No significant excess of events above the background expectation is found for a Standard Model Higgs boson with a mass of 125 GeV. An observed (expected) 95 % confidence-level upper limit of 3.4 (2.2) times the Standard Model cross section is obtained. From a fit under the signal-plus-background hypothesis, the ratio of the measured \(t\bar{t}H\) signal cross section to the Standard Model expectation is found to be \(\mu = 1.5 \pm 1.1\).