1 Introduction

The top quark may play a special role in the standard model (SM) of particle physics owing to its large mass and its possible importance in electroweak symmetry breaking [1, 2]. Measurements of \({\mathrm{t}}\overline{\mathrm{t}}\) production provide crucial information about the accuracy of the SM near the electroweak scale [3, 4], and are important in assessing the predictions of quantum chromodynamics (QCD) at large mass scales. In turn, they can be used to determine the fundamental parameters of the theory, such as the strong coupling constant or the top quark mass [5, 6].

Previous differential measurements of the \({\mathrm{t}}\overline{\mathrm{t}}\) production cross section [7,8,9,10,11,12,13,14,15] at the Fermilab Tevatron and CERN LHC show excellent agreement with SM predictions. However, investigations of top quarks with very large transverse momenta \(p_{\mathrm{T}} \) have proven to be difficult, since in this kinematic range the decays of the top quark to fully hadronic final states become highly collimated and merge into single jets. In this highly boosted regime, the \({\mathrm{t}}\overline{\mathrm{t}}\) reconstruction efficiency deteriorates for previous, more-traditional measurements. Special reconstruction techniques based on jet substructure are often used to improve the measurements [16, 17] or to implement searches for new physics [18,19,20,21,22,23,24,25,26,27,28]. A detailed understanding of jet substructure observables, and especially the jet mass \(m_{\text {jet}}\), is crucial for LHC analyses of highly boosted topologies. While measurements of \(m_{\text {jet}}\) corrected to the particle level have been carried out for light-quark and gluon jets [29, 30], the \(m_{\text {jet}}\) distribution for highly boosted top quarks has not yet been measured.

Apart from testing the simulation of \(m_{\text {jet}}\) in fully hadronic top quark decays, the location of the peak of the \(m_{\text {jet}}\) distribution is sensitive to the top quark mass \(m_{{\mathrm{t}}}\)  [31]. This measurement therefore provides an alternative method of determining \(m_{{\mathrm{t}}}\) in the boosted regime, independent of previous mass measurements [32,33,34,35,36,37]. Calculations from first principles have been performed in soft collinear effective theory [38,39,40,41] for the dijet invariant mass distribution from highly boosted top quark production in \(\mathrm {e}^+\mathrm {e}^-\) collisions [42, 43], and work is ongoing to extend this to the LHC environment [44, 45]. Such calculations account for perturbative and nonperturbative effects, and provide particle-level predictions. Once predictions for the LHC become available, the measurement of the \(m_{\text {jet}}\) distribution can lead to an extraction of \(m_{{\mathrm{t}}}\) without the ambiguities that arise from the unknown relation between \(m_{{\mathrm{t}}}\) in a well-defined renormalisation scheme and the top quark mass parameter used in Monte Carlo (MC) simulations [45,46,47,48].

We present the first measurement of the differential \({\mathrm{t}}\overline{\mathrm{t}}\) production cross section as a function of the leading-jet mass, where leading refers to the jet with the highest \(p_{\mathrm{T}} \). The measurement is based on data from \(\mathrm {p}\mathrm {p}\) collisions at \(\sqrt{s} = 8\,\text {TeV} \), recorded by the CMS experiment at the LHC in 2012 and corresponding to an integrated luminosity of 19.7\(\,\text {fb}^{-1}\). It is performed on \({\mathrm{t}}\overline{\mathrm{t}}\) events in which the leading jet includes all \({\mathrm{t}} \rightarrow \mathrm{b} \mathrm {W^+}\rightarrow \mathrm{b} \mathrm{q} \overline{\mathrm{q}} '\) decay products. The other top quark is required to decay through the semileptonic mode \(\overline{{\mathrm{t}}} \rightarrow \overline{{\mathrm{b}}} \mathrm {W^{-}}\rightarrow \overline{{\mathrm{b}}} \ell \overline{\nu } _\ell \), where \(\ell \) can be either an electron or muon. The use of charge-conjugate modes is implied throughout this article. The semileptonic top quark decay serves as a means for selecting \({\mathrm{t}}\overline{\mathrm{t}}\) events without biasing the \(m_{\text {jet}}\) distribution from the fully hadronic top quark decay. The highly boosted top quark jets used in the measurement are defined through the Cambridge–Aachen (CA) jet-clustering algorithm [49, 50] with a distance parameter \(R=1.2\) and \(p_{\mathrm{T}} >400\,\text {GeV} \). The \(m_{\text {jet}}\) distribution is unfolded to the particle level and compared to predictions from MC simulations. The measurement is also normalised to a fiducial-region total cross section defined below, and shows the expected sensitivity to the value of \(m_{{\mathrm{t}}}\). An extraction of the value of \(m_{{\mathrm{t}}}\) is performed to assess the overall sensitivity of the measurement.

2 The CMS detector

The central feature of the CMS detector is a superconducting solenoid of 6\(\text {\,m}\) internal diameter, providing a magnetic field of 3.8\(\text {\,T}\). A silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections, reside within the magnetic volume. In addition to the barrel and endcap detectors, CMS has extensive forward calorimetry. Muons are detected using four layers of gas-ionization detectors embedded in the steel flux-return yoke of the magnet. The inner tracker measures charged particle trajectories within the pseudorapidity range \(|\eta | < 2.5\). A two-stage trigger system [51] is used to select \(\mathrm {p}\mathrm {p}\) collisions of scientific interest for analysis. A more detailed description of the CMS detector, together with a definition of the coordinate system and relevant kinematic variables, can be found in Ref. [52].

3 Event reconstruction

The CMS experiment uses a particle-flow (PF) event reconstruction [53, 54], which aggregates input from all subdetectors. This information includes charged particle tracks from the tracking system and energies deposited in the ECAL and HCAL, taking advantage of the granularity of the subsystems. Particles are classified as electrons, muons, photons, and charged and neutral hadrons. Primary vertices are reconstructed using a deterministic annealing filter algorithm [55]. The vertex with the largest sum of the associated track \(p_{\mathrm{T}} ^2\) values is taken to be the primary event vertex.
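The choice of the primary event vertex can be illustrated with a minimal Python sketch. It assumes that each reconstructed vertex is represented simply by the list of \(p_{\mathrm{T}}\) values of its associated tracks; the function name and the input layout are hypothetical and do not correspond to the CMS reconstruction software.

```python
from typing import Sequence


def select_primary_vertex(track_pts_per_vertex: Sequence[Sequence[float]]) -> int:
    """Return the index of the vertex with the largest sum of track pT^2.

    Hypothetical input: one sequence of track pT values (in GeV) per vertex.
    """
    if not track_pts_per_vertex:
        raise ValueError("no reconstructed vertices")
    sums = [sum(pt * pt for pt in tracks) for tracks in track_pts_per_vertex]
    return max(range(len(sums)), key=lambda i: sums[i])


# Toy example: the second vertex contains one hard track and is selected.
toy_vertices = [[1.0, 2.0, 3.0], [10.0, 0.5], [4.0, 4.0, 4.0]]
primary_index = select_primary_vertex(toy_vertices)
```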

Muons are detected and measured in the pseudorapidity range \(|\eta | < 2.1\) using the information collected in the muon and tracking detectors [56]. Tracks from muon candidates must be consistent with a muon originating from the primary event vertex, and satisfy track-fit quality requirements [57].

Electrons are reconstructed in the range \(|\eta | < 2.1\), by combining tracking information with energy deposits in the ECAL [58, 59]. Electron candidates are required to originate from the primary event vertex. Electrons are identified through the information on the energy distribution in their shower, the track quality, the spatial match between the track and electromagnetic cluster, and the fraction of total cluster energy in the HCAL. Electron candidates that are consistent with originating from photon conversions in the detector material are rejected.

Since the top quark decay products can be collimated at high values of top quark \(p_{\mathrm{T}}\), no isolation requirements on the leptons are imposed in either the trigger or in the offline selections (see Sect. 4). The imbalance in event \(\mathbf {p_{\mathrm{T}}}\) is quantified as the missing transverse momentum vector \({\mathbf {p}}_{\mathrm {T}}^{\text {miss}}\), defined as the projection on the plane perpendicular to the beams of the negative vector sum of the momenta of all PF candidates in the event. Its magnitude is referred to as \(p_{\mathrm{T}} ^\text {miss}\).

The PF candidates are clustered into jets by using the FastJet 3.0 software package [60]. Charged hadrons associated with event vertices other than the primary event vertex are removed prior to jet clustering. Isolated leptons (either electron or muon) are not part of the input list for jet finding [53, 54]. Small-radius jets are clustered with the anti-\(k_{\mathrm {T}}\) jet-clustering algorithm [61] with a distance parameter \(R=0.5\) (AK5 jets). These small-radius jets are used at the trigger level, in the first steps of the event selection, and for the identification of jets coming from the hadronisation of \(\mathrm{b} \) quarks. If a nonisolated lepton candidate is found within the angular distance \({\Delta R < 0.5}\) of an AK5 jet, its four-momentum is subtracted from that of the jet to avoid double counting of energy and ensure proper jet energy corrections. The angular distance is given by \(\Delta R = \sqrt{\smash [b]{(\Delta \phi )^2 + (\Delta \eta )^2}}\), where \(\Delta \phi \) and \(\Delta \eta \) are the differences in azimuthal angle (in radians) and pseudorapidity, respectively, between the directions of the lepton and jet. Large-radius jets are obtained by using the CA jet-clustering algorithm [49, 50] with \(R=1.2\) (CA12 jets). When a lepton candidate is found among the PF candidates clustered into a CA12 jet, its four-momentum is subtracted from that of the CA12 jet. In this paper, the unmodified term “jet” will refer to the broad CA12 jets.
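The angular distance \(\Delta R\) and the lepton subtraction described above can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, four-momenta are assumed to be stored as \((E, p_x, p_y, p_z)\) tuples, and the azimuthal difference is wrapped into \([-\pi , \pi )\) so that directions on either side of \(\phi = \pm \pi \) are treated as close.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference in radians, wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular distance sqrt(dphi^2 + deta^2) between two directions."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def subtract_lepton(jet_p4, lepton_p4):
    """Subtract a lepton four-momentum (E, px, py, pz) from that of a jet."""
    return tuple(j - l for j, l in zip(jet_p4, lepton_p4))


# A lepton and a jet on opposite sides of the phi = +-pi boundary are close.
dr_across_boundary = delta_r(0.3, 3.1, 0.0, -3.1)
```

In this toy case the naive difference in \(\phi \) would be 6.2 rad, whereas the wrapped value is about 0.08 rad, so the lepton lies well within \(\Delta R < 0.5\) of the jet.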

All jets could contain neutral particles from additional \(\mathrm {p}\mathrm {p}\) collisions in the same or nearby beam crossings (pileup). This extra contribution is subtracted based on the average expectation of the pileup in the jet catchment area [62]. This is done by calculating a correction for the average offset energy density in each event as a function of the number of primary vertices [63, 64]. The AK5 jets are identified as originating from the fragmentation of a \(\mathrm{b} \) quark with the combined secondary vertex algorithm (CSV) [65]. A tight operating point is used, which has a misidentification probability of 0.1% for tagging light-parton jets with an average \(p_{\mathrm{T}}\) of about 80\(\,\text {GeV}\), and an efficiency of about 50% for a heavy-flavour jet with \(p_{\mathrm{T}} \) in the range 50–160\(\,\text {GeV}\). Above 160\(\,\text {GeV}\), the efficiency decreases gradually to about 30% for a \(p_{\mathrm{T}} \) value of 400\(\,\text {GeV}\)  [65]. All jets are required to satisfy quality selections to minimize the impact of calorimeter noise and other sources of misidentified jets [66]. Events are also required to satisfy selection criteria to remove events with large values of \(p_{\mathrm{T}} ^\text {miss} \) from calorimeter noise, as described in Ref. [67].

The jet mass \(m_{\text {jet}}\) is calculated from the four-vectors \(p_i\) of all PF particles \(i\) clustered into a jet:

$$\begin{aligned} m_{\text {jet}} ^2 = \Bigl ( \sum _{i~\text {in jet}} p_i \Bigr )^2 , \end{aligned}$$
(1)

where the pion mass is assigned to all charged hadrons. The reconstruction of \(m_{\text {jet}}\) for CA12 jets is studied by using a sample of highly boosted \(\mathrm {W}\rightarrow \mathrm{q} \overline{\mathrm{q}} '\) decays merged into a single jet, as described in Sect. 5.5.
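Equation (1) can be written out as a minimal Python sketch. The function names and the \((E, p_x, p_y, p_z)\) tuple layout are assumptions made for illustration; the numerical value of the charged-pion mass is rounded.

```python
import math

PION_MASS_GEV = 0.13957  # charged-pion mass in GeV (rounded)


def charged_hadron_p4(px: float, py: float, pz: float):
    """Four-vector (E, px, py, pz) of a charged hadron under the pion-mass hypothesis."""
    energy = math.sqrt(px * px + py * py + pz * pz + PION_MASS_GEV ** 2)
    return (energy, px, py, pz)


def jet_mass(constituents) -> float:
    """Invariant mass of the summed constituent four-vectors, as in Eq. (1)."""
    e_sum = px_sum = py_sum = pz_sum = 0.0
    for energy, px, py, pz in constituents:
        e_sum += energy
        px_sum += px
        py_sum += py
        pz_sum += pz
    m2 = e_sum ** 2 - (px_sum ** 2 + py_sum ** 2 + pz_sum ** 2)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


# Two massless, back-to-back constituents of 50 GeV each give a mass of 100 GeV.
toy_mass = jet_mass([(50.0, 50.0, 0.0, 0.0), (50.0, -50.0, 0.0, 0.0)])
```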

4 Trigger and data

The data were recorded by using single-lepton triggers with no isolation requirement applied to the leptons. Events in the muon+jets channel use a trigger that requires at least one muon with \(p_{\mathrm{T}} > 40\) \(\,\text {GeV}\) and \(|\eta |<2.1\). The efficiency for this trigger, measured in a \({\mathrm{Z}} \rightarrow \mu ^+\mu ^-\) sample, is 95% for muons measured within \(|\eta |<0.9\), 85% for muons within \(0.9<|\eta |<1.2\), and 83% for \(1.2<|\eta |<2.1\).

The trigger for the electron+jets channel requires at least one electron with \(p_{\mathrm{T}} > 30\) \(\,\text {GeV}\) in conjunction with two AK5 jets that have \(p_{\mathrm{T}} >100\) and \(> 25\) \(\,\text {GeV}\), for the leading and next-to-leading AK5 jet, respectively. Events are also included if triggered by a single AK5 jet with \(p_{\mathrm{T}} >320\,\text {GeV} \). The additional events obtained through this single-jet trigger often contain an electron merged into a jet that cannot be resolved at the trigger stage. The resulting combined trigger efficiency is 90% for events with a leading AK5 jet with \(p_{\mathrm{T}} <320\,\text {GeV} \). Above this value, the trigger has a turn-on behaviour and is fully efficient above a value of \(350\,\text {GeV} \). The trigger efficiencies are measured in data and simulation using a tag-and-probe method in \({\mathrm{Z}}\)/\(\gamma ^*(\rightarrow \ell \ell )\)+jets and dileptonic \({\mathrm{t}}\overline{\mathrm{t}}\) events. Small differences between data and simulation are corrected for by applying scale factors to the simulated events.

Top quark events, produced via the strong and electroweak interactions, are simulated with the next-to-leading-order (NLO) generator powheg 1.380 [68,69,70,71,72] with a value of \(m_{{\mathrm{t}}} =172.5\,\text {GeV} \). The \(\mathrm {W}(\rightarrow \ell \nu )\)+jets and \({\mathrm{Z}}/\gamma ^*(\rightarrow \ell \ell )\)+jets processes are simulated with MadGraph 5.1.5.11 [73], where Madspin [74] is used for the decay of heavy resonances. Diboson production processes (\(\mathrm {W}\) \(\mathrm {W}\), \(\mathrm {W}\) \({\mathrm{Z}}\), and \({\mathrm{Z}} {\mathrm{Z}} \)) are simulated with pythia  6.424 [75]. Simulated multijet samples are generated in MadGraph, but constitute a negligible background. For the estimation of systematic uncertainties, additional \({\mathrm{t}}\overline{\mathrm{t}}\) samples are generated with mc@nlo v3.41 [76] or with MadGraph for seven values of \(m_{{\mathrm{t}}} \) ranging from 166.5 to 178.5\(\,\text {GeV}\).

All the samples generated in MadGraph and powheg are interfaced with pythia  6 for parton showering and fragmentation (referred to as MadGraph +pythia and powheg +pythia, respectively). The MLM algorithm [77] used in MadGraph is applied during the parton matching to avoid double counting of parton configurations. The MadGraph samples use the CTEQ6L [78] parton distribution functions (PDFs). The powheg \({\mathrm{t}}\overline{\mathrm{t}}\) sample uses the CT10 [79] PDFs, whereas the single top quark processes use the CTEQ6M [80] PDFs. The pythia 6 Z2* tune [81, 82] is used to model the underlying event. Top quark events produced with mc@nlo use the CTEQ6M PDF set and herwig 6.520 [83] for parton showering and fragmentation (mc@nlo +herwig). The default herwig tune is used to model the underlying event.

The normalisations of the simulated event samples are taken from the NLO calculations of their cross sections that contain the next-to-next-to-leading-logarithm (NNLL) soft-gluon resummations for single top quark production [84], the next-to-next-to-leading-order (NNLO) calculations for \(\mathrm {W}(\rightarrow \ell \nu )\)+jets and \({\mathrm{Z}}/\gamma ^*(\rightarrow \ell \ell )\)+jets [85,86,87], and the NLO calculation for diboson production [88]. The normalisation of the \({\mathrm{t}}\overline{\mathrm{t}}\) simulation is obtained from QCD NNLO calculations, again including resummation of NNLL soft-gluon terms [89,90,91,92,93,94,95].

A detailed simulation of particle propagation through the CMS apparatus and detector response is performed with Geant4 v9.2 [96]. For all simulated samples, the hard collision is overlaid with simulated minimum-bias collisions. The resulting events are weighted to reproduce the pileup distribution measured in data. The same event reconstruction software is used for data and simulated events. The resolutions and efficiencies for reconstructed objects are corrected to match those measured in data [56, 58, 64, 65, 97].

5 Cross section measurement

5.1 Strategy

The measurement is carried out in the \(\ell \)+jets channel, which allows the selection of a pure \({\mathrm{t}}\overline{\mathrm{t}}\) sample because of its distinct signature at large top quark boosts. The measurement is based on choosing kinematic quantities that do not bias the \(m_{\text {jet}}\) distribution from fully hadronic top quark decays. A bias would be introduced by, e.g. selecting the leading jet based on the number of subjets, or requiring a certain maximum value of the N-subjettiness [98, 99], as applied in common top quark tagging algorithms [100,101,102,103,104]. Such a selection would lead to a distinct three-prong structure of the jet and thus reject events with one quark being soft or collinear with respect to the momentum of the top quark decay.

The fiducial region chosen for this investigation is studied through simulations at the particle level (defined by all particles with lifetimes longer than \(10^{-8}\) s). The exact selection is detailed below. It relies on having a highly boosted semileptonic top quark decay, where the lepton from \(\mathrm {W}\rightarrow \ell \nu _\ell \) is close in \(\Delta R\) to the jet from the hadronisation of the accompanying \(\mathrm{b} \) quark (\(\mathrm{b} \) jet). A second high-\(p_{\mathrm{T}}\) jet is selected, which is assumed to originate from the fully hadronic top quark decay. A veto on additional jets is employed, which ensures that the fully hadronic decay is merged into a single jet. The jet veto is also beneficial for calculating higher-order terms, as it suppresses the size of nonglobal logarithms [105], which appear because of the sensitivity of the jet mass to radiation in only a part of the phase space [106]. The event selection at the reconstruction level is chosen to ensure high efficiency while reducing non-\({\mathrm{t}}\overline{\mathrm{t}}\) backgrounds. Finally, the \(m_{\text {jet}}\) distribution is unfolded for experimental effects and then compared to different MC predictions at the particle level. A measurement of the normalised \(m_{\text {jet}}\) distribution is performed as well, where the normalisation is performed by using the total measured \({\mathrm{t}}\overline{\mathrm{t}}\) cross section in the fiducial phase-space region.

5.2 Definition of the fiducial phase space

The \({\mathrm{t}}\overline{\mathrm{t}}\) cross section as a function of the mass of the leading jet is unfolded to the particle level, correcting for experimental effects, with the fiducial phase space at the particle level defined through the selection described below.

As mentioned previously, the measurement is performed in the \(\ell \)+jets channel, where \(\ell \) refers to an electron or muon from the \(\mathrm {W}\) boson decay. The \(\tau \) lepton decays are not considered as part of the signal. Leptons are required to be within \(|\eta | < 2.1\) and have \(p_{\mathrm{T}} >45\,\text {GeV} \). Jets are clustered by using the CA algorithm with a distance parameter \(R = 1.2\) and required to have \(|\eta | < 2.5\). The value of R is chosen to optimize the relationship between obtaining a sufficient number of events and maintaining a narrow width in the jet-mass distribution. The four-momentum of the leading lepton is subtracted from the four-momentum of a jet if the lepton is found within an angular range of \({\Delta R < 1.2}\) of the jet axis. Events are selected if at least one jet has \(p_{\mathrm {T,1}} >400\,\text {GeV} \) and a second jet has \(p_{\mathrm {T,2}} >150\,\text {GeV} \). The leading jet in \(p_{\mathrm{T}}\) is assumed to originate from the \({\mathrm{t}} \rightarrow \mathrm {W}\mathrm{b} \rightarrow \mathrm{q} \overline{\mathrm{q}} ^\prime \mathrm{b} \) decay, merged into a single jet. Consequently, the second jet is considered to originate from the fragmented \(\mathrm{b} \) quark of the semileptonic top quark decay. To select events with a highly boosted topology, a veto is employed on additional jets with \(p_{\text {T,veto}} >150\,\text {GeV} \). The jet veto removes about 16% of the signal events, but increases the fraction of fully merged top quark decays to about 40%, where an event is called fully merged if the maximum distance in \(\Delta R\) between the leading jet at the particle level and each individual parton from the fully hadronic top quark decay is smaller than 1.2.

Two additional selection criteria are introduced to ensure that the leading jet includes all particles from the fully hadronic top quark decay. The angular difference \(\Delta R(\ell , \text {j}_2)\) between the lepton and the second jet has to be smaller than 1.2. This, together with the veto on additional jets, ensures that the top quarks are produced back-to-back in the transverse plane. In addition, the invariant mass of the leading jet has to be greater than the invariant mass of the combination of the second jet and the lepton, \(m_{\text {jet},1} > m_{\text {jet},2+\ell }\). This improves the choice of the leading jet as originating from the fully hadronic top quark decay.
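The particle-level selection defined in this section can be summarised in a short Python sketch. It is an illustration under stated assumptions: jets are taken to be CA12 jets from which the leading lepton has already been subtracted where applicable, and the quantities \(\Delta R(\ell , \text {j}_2)\) and \(m_{\text {jet},2+\ell }\) are assumed to be computed beforehand and passed in. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    pt: float    # GeV, after lepton subtraction where applicable
    eta: float
    mass: float  # GeV


def passes_fiducial_selection(lepton_pt: float, lepton_eta: float,
                              jets: List[Jet], dr_lepton_jet2: float,
                              mass_jet2_plus_lepton: float) -> bool:
    """Apply the fiducial requirements of this section to one event."""
    # Lepton: pT > 45 GeV and |eta| < 2.1.
    if lepton_pt <= 45.0 or abs(lepton_eta) >= 2.1:
        return False
    # Jets: |eta| < 2.5, ordered by decreasing pT.
    central = sorted((j for j in jets if abs(j.eta) < 2.5),
                     key=lambda j: j.pt, reverse=True)
    if len(central) < 2:
        return False
    leading, second = central[0], central[1]
    if leading.pt <= 400.0 or second.pt <= 150.0:
        return False
    # Veto on additional jets with pT > 150 GeV.
    if any(j.pt > 150.0 for j in central[2:]):
        return False
    # Lepton close to the second jet, and leading jet heavier than jet2 + lepton.
    if dr_lepton_jet2 >= 1.2:
        return False
    return leading.mass > mass_jet2_plus_lepton


toy_jets = [Jet(450.0, 0.1, 175.0), Jet(200.0, -0.3, 40.0)]
toy_pass = passes_fiducial_selection(60.0, 0.5, toy_jets, 0.6, 90.0)
```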

Fig. 1 Simulated mass distributions of the leading jet in \({\mathrm{t}}\overline{\mathrm{t}}\) events for the \(\ell \)+jets channel at the particle level. The events are generated with powheg +pythia, and normalised to the integrated luminosity of the data. The distribution for the total number of selected events (dark solid line) is compared to events where the leading jet originates from the fully hadronic top quark decay (light solid line, “fully merged”), and to events where the leading jet does not include all the remnants (dotted line, “not merged”) from the fully hadronic top quark decay

The simulated distribution of the jet mass at the particle level after this selection is shown in Fig. 1. The distribution of all jets passing the particle-level selection is compared to distributions in jet mass from fully merged and not merged \({\mathrm{t}}\overline{\mathrm{t}}\) decays. After the selection outlined above, jets that do not originate from fully merged top quark decays with a fully hadronic final state are expected to constitute about 35% of all jets in the final data sample, as determined by using the powheg +pythia simulation.

5.3 Selection of events at the reconstruction level

A selection is applied at the reconstruction level to obtain an enriched \({\mathrm{t}}\overline{\mathrm{t}}\) sample with high-\(p_{\mathrm{T}}\) top quarks, based on leptons without an isolation requirement. As a second step, high-\(p_{\mathrm{T}}\) jets are required to be kinematically similar to those selected at the particle level. Comparable kinematic properties between the reconstruction and particle levels lead to small bin-to-bin migrations and therefore to small corrections when unfolding the data.

Selected events must contain exactly one muon or electron with \(p_{\mathrm{T}} >45\,\text {GeV} \) and \(|\eta |<2.1\). Events with more than one lepton are vetoed to suppress contributions from dileptonic \({\mathrm{t}}\overline{\mathrm{t}}\) decays. To select highly boosted \({\mathrm{t}}\overline{\mathrm{t}}\) events, at least one AK5 jet is required to have \(p_{\mathrm{T}} >150\,\text {GeV} \) and another AK5 jet \(p_{\mathrm{T}} > 50\,\text {GeV} \), where both jets have to fulfil \(|\eta |<2.4\). The suppression of background from multijet production is accomplished by using a two-dimensional (2D) isolation variable that is efficient at large top quark boosts, yet notably reduces multijet background. This 2D isolation requires the angular difference between the lepton and the nearest AK5 jet directions \(\Delta R_{\text {min}}(\text {lepton, jets})\) to be greater than 0.5, or the perpendicular component of the lepton momentum relative to the nearest AK5 jet \(p_{\text {rel,T}} \) to be larger than \(25\,\text {GeV} \). In the calculation of these quantities, only AK5 jets with \(p_{\mathrm{T}} >25\,\text {GeV} \) are considered. The efficiency of the 2D isolation requirement has been studied in data and simulation by using \({\mathrm{Z}}/\gamma ^*(\rightarrow \ell \ell )\)+jets events [26].
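The 2D isolation requirement can be sketched in Python as follows. The sketch assumes that the lepton and the AK5 jets are given as \((p_{\mathrm{T}}, \eta , \phi )\) tuples, computes \(p_{\text {rel,T}}\) as the magnitude of the lepton momentum component perpendicular to the axis of the nearest jet, and treats an event without any AK5 jet above \(25\,\text {GeV} \) as passing. The last choice and all function names are assumptions made for illustration.

```python
import math


def _three_momentum(pt, eta, phi):
    """Cartesian three-momentum from (pT, eta, phi)."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))


def _delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def pt_rel(lepton, jet):
    """Lepton momentum component perpendicular to the jet axis: |l x j| / |j|."""
    l = _three_momentum(*lepton)
    j = _three_momentum(*jet)
    cross = (l[1] * j[2] - l[2] * j[1],
             l[2] * j[0] - l[0] * j[2],
             l[0] * j[1] - l[1] * j[0])
    return math.sqrt(sum(c * c for c in cross)) / math.sqrt(sum(c * c for c in j))


def passes_2d_isolation(lepton, jets, dr_cut=0.5, ptrel_cut=25.0, jet_pt_min=25.0):
    """True if dR_min(lepton, jets) > dr_cut or pT,rel > ptrel_cut."""
    candidates = [j for j in jets if j[0] > jet_pt_min]
    if not candidates:
        return True  # assumption: no nearby jet means the lepton is isolated
    nearest = min(candidates,
                  key=lambda j: _delta_r(lepton[1], lepton[2], j[1], j[2]))
    dr_min = _delta_r(lepton[1], lepton[2], nearest[1], nearest[2])
    return dr_min > dr_cut or pt_rel(lepton, nearest) > ptrel_cut


toy_lepton = (60.0, 0.0, 0.0)  # (pT in GeV, eta, phi)
```

A lepton inside the jet cone is therefore kept only if it carries a sizeable momentum transverse to the jet axis, as expected for leptons from highly boosted top quark decays.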

The requirements \(p_{\mathrm{T}} ^\text {miss} >20\,\text {GeV} \) and \(p_{\mathrm{T}} ^\text {miss} +p_{\mathrm{T}} ^\ell > 150\,\text {GeV} \), where \(p_{\mathrm{T}} ^\ell \) is the lepton transverse momentum, reduce the contribution from multijet and \({\mathrm{Z}}/\gamma ^*(\rightarrow \ell \ell )\)+jets production. Given the presence of two \(\mathrm{b} \) quarks in the events, at least one AK5 jet is required to be identified as originating from the fragmentation of a \(\mathrm{b} \) quark by using the CSV algorithm, which reduces the contribution from \(\mathrm {W}\)+jets production. The electron channel includes an additional topological selection criterion to suppress the residual contribution from multijet production:

$$\begin{aligned} | \Delta \phi ( \{ \mathrm {e}\,\text {or}\, \text {jet} \} , \,{\mathbf {p}}_{\mathrm {T}}^{\text {miss}}) - 1.5 | < p_{\mathrm{T}} ^\text {miss}/ 50\,\text {GeV}, \end{aligned}$$

with \(\Delta \phi \) measured in radians and \(p_{\mathrm{T}} ^\text {miss}\) in \(\,\text {GeV}\). This criterion rejects events in which \({\mathbf {p}}_{\mathrm {T}}^{\text {miss}} \) points along the transverse momentum vector of the leading jet or the lepton. After these requirements, the background contribution from multijet production is negligible.
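The topological criterion of the electron channel can be sketched in Python. The sketch rests on two assumptions that are not spelled out in the text: \(\Delta \phi \) is taken as the absolute azimuthal difference in \([0, \pi ]\), and the inequality is required to hold for both the electron and the leading jet. The function names are hypothetical.

```python
import math


def abs_delta_phi(phi1: float, phi2: float) -> float:
    """Absolute azimuthal difference in radians, folded into [0, pi]."""
    diff = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - diff if diff > math.pi else diff


def passes_angular_window(object_phi: float, met_phi: float, met_pt: float) -> bool:
    """|dphi(object, pTmiss) - 1.5| < pTmiss / 50 GeV, with pTmiss in GeV."""
    return abs(abs_delta_phi(object_phi, met_phi) - 1.5) < met_pt / 50.0


def passes_electron_topology(electron_phi: float, jet_phi: float,
                             met_phi: float, met_pt: float) -> bool:
    """Assumption: the window must hold for the electron and the leading jet."""
    return (passes_angular_window(electron_phi, met_phi, met_pt)
            and passes_angular_window(jet_phi, met_phi, met_pt))
```

For \(p_{\mathrm{T}} ^\text {miss} = 25\,\text {GeV} \) the allowed window is \(1.0< \Delta \phi < 2.0\), so an object aligned or back-to-back with \({\mathbf {p}}_{\mathrm {T}}^{\text {miss}}\) is rejected; for \(p_{\mathrm{T}} ^\text {miss} \ge 100\,\text {GeV} \) the window covers the full range of \(\Delta \phi \) and the criterion has no effect.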

The selection procedure outlined above results in a \({\mathrm{t}}\overline{\mathrm{t}}\) sample with high purity and selection efficiency at large top quark \(p_{\mathrm{T}}\). In addition, events are selected with kinematic requirements similar to those at the particle level. For each event to pass the selection, at least one jet is required with \(p_{\mathrm{T}} >400\,\text {GeV} \) and another with \(p_{\mathrm{T}} > 150\,\text {GeV} \), where both jets have to fulfil \(|\eta |<2.5\). Contributions from not fully merged \({\mathrm{t}}\overline{\mathrm{t}}\) events are suppressed with a veto on additional jets with transverse momentum \(p_{\mathrm{T}} > 150\,\text {GeV} \) and \(|\eta | < 2.5\). The jet veto has an efficiency of \(93\%\) for fully merged signal events. The fraction of fully merged events with a back-to-back topology is further enhanced by selecting events with an angular difference \(\Delta R(\ell , \text {j}_2)<1.2\) between the directions of the lepton and the subleading jet. To ensure that the leading jet originates from the fully merged top quark decay, its invariant mass is required to be larger than the mass of the subleading jet. With these selection criteria, the reconstruction efficiency for \({\mathrm{t}}\overline{\mathrm{t}}\) events where one top quark decays semileptonically in the fiducial region of the measurement is 23.2%. Several of the above criteria are relaxed in the unfolding procedure to define sideband regions included as additional bins in the response matrix, thereby increasing the reconstruction efficiency.

After the selection procedure, the contributions of non-signal \({\mathrm{t}}\overline{\mathrm{t}}\) events from \({\mathrm{t}}\overline{\mathrm{t}}\) decays to the \(\tau \)+jets, dilepton, and all-jets channels constitute, respectively, 7.3, 11.6, and \(0.4\%\) of the selected events. These contributions are accounted for in the unfolding.

The distributions in \(p_{\mathrm{T}} \) and \(\eta \) for the leading jet in selected events are shown in Fig. 2 from data and simulation. The mass distribution of the leading jet at the reconstruction level is shown in Fig. 3 for the \(p_{\mathrm{T}}\) regions of \(400< p_{\mathrm{T}} < 500\,\text {GeV} \) (upper) and \(p_{\mathrm{T}} > 500\,\text {GeV} \) (lower). In these plots the \({\mathrm{t}}\overline{\mathrm{t}} \) simulation is scaled such that the number of simulated events matches the number of selected events observed in data. Overall good agreement between data and the predictions is observed. The slight slope in the data/MC ratio of the jet mass distribution in Fig. 3 (upper) is covered by the jet energy and mass scale uncertainties, as described below.
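The scaling of the \({\mathrm{t}}\overline{\mathrm{t}}\) simulation used in these comparisons can be written as a one-line formula. The sketch below assumes that the background yields are kept at their predicted values and that only the \({\mathrm{t}}\overline{\mathrm{t}}\) yield is rescaled; the function name and the toy yields are hypothetical and are not taken from Table 1.

```python
def ttbar_scale_factor(n_data: float, n_background: float, n_ttbar: float) -> float:
    """Factor applied to the ttbar simulation so that the total prediction
    (background + scaled ttbar) equals the number of events in data."""
    if n_ttbar <= 0.0:
        raise ValueError("ttbar prediction must be positive")
    return (n_data - n_background) / n_ttbar


# Toy yields, for illustration only.
toy_scale = ttbar_scale_factor(n_data=1000.0, n_background=200.0, n_ttbar=1000.0)
toy_total = 200.0 + toy_scale * 1000.0
```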

Fig. 2 Distributions of \(p_{\mathrm{T}} \) (upper) and \(\eta \) (lower) of the leading jet from data (points) and simulation (filled histograms). The vertical bars on the points show the statistical uncertainty and the horizontal bars show the bin widths. The electron and muon channels are shown combined. The hatched region shows the total uncertainty in the simulation, including the statistical and experimental systematic uncertainties. The panels below show the ratio of the data to the simulation. The uncertainty bands include the statistical and experimental systematic uncertainties, where the statistical (light grey) and total (dark grey) uncertainties are shown separately in the ratio

Fig. 3 Distributions of the leading-jet invariant mass from data (points) and simulation (filled histograms). The vertical bars on the points show the statistical uncertainty and the horizontal bars show the bin widths for the combined electron and muon channels. The distributions for \(p_{\mathrm{T}} \) bins of \(400< p_{\mathrm{T}} < 500\,\text {GeV} \) (upper) and \(p_{\mathrm{T}} > 500\,\text {GeV} \) (lower) are given. The hatched region shows the total uncertainty in the simulation, including the statistical and experimental systematic uncertainties. The panels below show the ratio of the data to the simulation. The uncertainty bands include the statistical and experimental systematic uncertainties, where the statistical (light grey) and total (dark grey) uncertainties are shown separately in the ratio

Table 1 shows the total number of events observed in data together with the total number of signal and background events determined from simulation.

Table 1 Number of events obtained after applying the full selection. The results are given for the individual sources of background, \({\mathrm{t}}\overline{\mathrm{t}}\) signal, and data. The uncertainties correspond to the statistical and systematic components added in quadrature

5.4 Unfolding from the reconstruction level to the particle level

The transformation from the reconstruction to the particle level is carried out through a regularised unfolding based on a least-squares fit, implemented in the TUnfold [107] framework. This procedure suppresses the statistical fluctuations by a regularisation with respect to the count in each bin. The optimal regularisation strength is determined through a minimization of the average global correlation coefficient of the output bins [108]. Contributions from background processes such as \(\mathrm {W}\)+jets, single top quark, and multijet production are determined from simulation and subtracted from the data prior to the unfolding. Non-signal \({\mathrm{t}}\overline{\mathrm{t}}\) events are accounted for in the unfolding by including them in the response matrix, described below.
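The principle of a regularised least-squares unfolding can be illustrated with a small, self-contained Python sketch. It is not the TUnfold implementation: it assumes uncorrelated uncertainties in the input bins, a regularisation on the size of the unfolded bin contents with no bias term, and a fixed regularisation strength \(\tau \) supplied by hand. It minimises \(\sum _i (y_i - \sum _j A_{ij} x_j)^2/\sigma _i^2 + \tau ^2 \sum _j x_j^2\). All names and the toy numbers are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def unfold_regularised(response, data, variances, tau):
    """Minimise the chi^2 of (data - response * x) plus tau^2 * |x|^2.

    response[i][j] is the probability for an event from particle-level bin j
    to be reconstructed in bin i.
    """
    n_reco, n_true = len(response), len(response[0])
    lhs = [[sum(response[i][j] * response[i][k] / variances[i] for i in range(n_reco))
            + (tau * tau if j == k else 0.0)
            for k in range(n_true)] for j in range(n_true)]
    rhs = [sum(response[i][j] * data[i] / variances[i] for i in range(n_reco))
           for j in range(n_true)]
    return solve_linear(lhs, rhs)


# Toy closure test: data folded from a known truth of [100, 50].
toy_response = [[0.7, 0.1], [0.2, 0.6], [0.1, 0.3]]
toy_data = [75.0, 50.0, 25.0]
unregularised = unfold_regularised(toy_response, toy_data, toy_data, tau=0.0)
regularised = unfold_regularised(toy_response, toy_data, toy_data, tau=0.1)
```

With \(\tau = 0\) the toy truth is recovered, while a nonzero \(\tau \) pulls the solution towards smaller bin contents. This shows why the regularisation strength has to be chosen with care, for example by minimising the average global correlation coefficient as described above.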

The response matrix is evaluated by using \({\mathrm{t}}\overline{\mathrm{t}} \) events simulated with \({\textsc {powheg}} {}+{\textsc {pythia}} \). It is obtained for the two regions in the leading-jet \(p_{\mathrm{T}}\) of \(400< p_{\mathrm{T}} < 500\,\text {GeV} \) and \(p_{\mathrm{T}} >500\,\text {GeV} \). This division is needed to account for the distribution of the \(p_{\mathrm{T}} \) spectrum. The response matrix includes three additional sideband regions to account for migrations in and out of the phase-space region of the measurement. These are obtained for a lower leading-jet \(p_{\mathrm{T}} \) of \( 300< p_{\mathrm{T}} < 400 \,\text {GeV} \), a lower second-leading-jet \(p_{\mathrm{T}} \) of \(100< p_{\mathrm{T}} < 150 \,\text {GeV} \), and a higher veto-jet \(p_{\mathrm{T}} \) of \(150< p_{\mathrm{T}} < 200 \,\text {GeV} \). Events that are reconstructed, but do not pass the particle-level selections, are also included in the response matrix. The electron and muon channels are combined, and the combined distribution is unfolded to ensure a sufficient number of events in the unfolding procedure. The electron and muon channels are also unfolded separately, and the results are compared to verify their consistency.

5.5 Uncertainties

5.5.1 Statistical uncertainties

Statistical uncertainties in the unfolding procedure arise from three sources. The dominant source reflects the statistical fluctuations in the input data. Second are the uncertainties from the finite number of simulated events used to calculate the response matrix. The third source reflects the statistical uncertainties in the simulation of the background processes. After the unfolding, a total statistical uncertainty is obtained for each bin of the \(m_{\text {jet}}\) distribution that includes the effects from all three sources, which are correlated among the individual measurement bins.

5.5.2 Experimental systematic uncertainties

Systematic uncertainties related to experimental effects are evaluated by changing calibration factors and corrections to efficiencies within their corresponding uncertainties. The resulting covariance matrix of the unfolded measurement is computed through standard error propagation. The uncertainties are evaluated by unfolding pseudo-data simulated with MadGraph +pythia. Pseudo-data are preferred over data because of the smaller statistical fluctuations in the estimation of the systematic uncertainties. The change in each parameter that yields the largest variation in the unfolded measurement is taken as the uncertainty owing to that parameter. The following sources of experimental systematic uncertainties are considered.

The applied jet energy corrections (JEC) depend on the \(p_{\mathrm{T}} \) and \(\eta \) of the individual jets. The JEC are obtained by using anti-\(k_{\mathrm {T}}\) jets with \(R=0.7\) (AK7) [64], and their use is checked on CA12 jets by using simulated events. Residual differences between generated and reconstructed jet momenta caused by the larger jet size used in this analysis result in increased uncertainties in the JEC by factors of two to four with respect to the AK7 values. Changes of the JEC within their uncertainties are made in the three-momenta of the jets to estimate the effect on the measured cross section. The jet mass is kept fixed to avoid double-counting of uncertainties when including the uncertainty in the jet-mass scale. A smearing is applied in the jet energy resolution (JER) as an \(\eta \)-dependent correction to all jets in the simulation. The corrections are again changed within their uncertainty to estimate the systematic uncertainty related to the JER smearing. The uncertainties are found to be small compared to the ones from the JEC. The jet-mass scale and the corresponding uncertainty in the CA12 jets have been studied in events that contain a \(\mathrm {W}\rightarrow \mathrm{q} \overline{\mathrm{q}} '\) decay reconstructed as a single jet in \({\mathrm{t}}\overline{\mathrm{t}}\) production. The ratio of the reconstructed jet-mass peak positions in data and simulation is \(1.015 \pm 0.012\). No correction to the jet-mass scale is applied, but an uncertainty of \(1.5\%\) is assigned, corresponding to the difference in peak positions. The widths of the jet mass distributions are about 15\(\,\text {GeV}\), consistent between data and simulation.

Corrections in \(\mathrm{b} \) tagging efficiency are applied as \(p_{\mathrm{T}} \)-dependent scale factors for each jet flavour. The corresponding systematic uncertainties are obtained by changing the scale factors within their uncertainties. Pileup correction factors are applied to match the number of primary interactions to the instantaneous luminosity profile in data. The uncertainty is obtained by changing the total inelastic cross section by \({\pm }5\%\) [109]. Trigger and lepton identification scale factors are used to correct for differences in the lepton selection efficiency between data and simulation. The corresponding uncertainties are computed by changing the scale factors within their uncertainties [56, 58].

5.5.3 Normalisation uncertainties

The effects from uncertainties in background processes are calculated by changing the amount of background subtracted prior to the unfolding and propagating the effect to the output. The uncertainty in the \(\mathrm {W}\)+jets cross section is taken to be 19%, as obtained from a measurement of \(\mathrm {W}\)+heavy-flavour quark production [110]; an uncertainty of 23% is applied to the single top quark cross section [111]; and an uncertainty of 100% is assumed for multijet production, estimated from the comparison of various kinematic distributions between data and simulation. Uncertainties affecting the overall normalisation are added in quadrature to the total uncertainty after the unfolding. An uncertainty of 2.6% is applied subsequently for the integrated luminosity [112].
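A minimal sketch of this propagation is given below. It assumes that the unfolding acts as a linear map `B` on the background-subtracted data and that each background normalisation shift is fully correlated across bins. The relative uncertainties are those quoted above, while the matrix `B`, the yields, and the function name are hypothetical.

```python
import numpy as np

# Relative normalisation uncertainties quoted in the text.
REL_UNC = {"wjets": 0.19, "single_top": 0.23, "multijet": 1.00}


def background_covariance(B, backgrounds, rel_unc=REL_UNC):
    """Covariance of the unfolded result from background normalisation shifts.

    B           : linear unfolding map (particle-level bins x reconstructed bins)
    backgrounds : dict of name -> reconstructed-level yields per bin
    """
    n = B.shape[0]
    cov = np.zeros((n, n))
    for name, yields in backgrounds.items():
        # Subtracting (1 + delta) times the background shifts the output by -delta * B @ yields.
        shift = -rel_unc[name] * (B @ np.asarray(yields, dtype=float))
        cov += np.outer(shift, shift)   # one fully correlated source per background
    return cov


# Toy example with an identity unfolding map and invented yields.
B = np.eye(2)
backgrounds = {"wjets": [10.0, 20.0], "single_top": [5.0, 5.0], "multijet": [1.0, 2.0]}
cov_bkg = background_covariance(B, backgrounds)
```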

5.5.4 Modelling uncertainties

The unfolding is checked for its dependence on the simulation of \({\mathrm{t}}\overline{\mathrm{t}}\) production through the use of alternative programs to generate events. The effect on the measurement is estimated by using one simulation as pseudo-data input to the unfolding, and another for the calculation of the response matrix. The unfolded result is then compared to the particle-level distribution from the simulation used as pseudo-data. Differences between the unfolded result and the truth-level distribution are taken as the modelling uncertainties.

The uncertainty from the choice of MC generator is estimated by unfolding pseudo-data simulated with MadGraph +pythia through a response matrix evaluated with powheg +pythia. The effect from the choice of the parton-shower simulation is estimated from events generated with mc@nlo +herwig.

The dependence on the choice of \(m_{{\mathrm{t}}} \) in the simulation used to correct the data is also checked. While the unfolded measurement is largely independent of the choice of \(m_{{\mathrm{t}}} \), residual effects from the kinematic properties of the leptons and jets can lead to additional uncertainties. These uncertainties are evaluated by using events simulated with MadGraph +pythia for seven values of \(m_{{\mathrm{t}}} \) from 166.5 to 178.5\(\,\text {GeV}\), as pseudo-data. This range is considered because no measurement of \(m_{{\mathrm{t}}} \) in this kinematic regime exists, and a stable result, independent of the specific choice of \(m_{{\mathrm{t}}} \), is therefore crucial. For this check, the response matrix is obtained with MadGraph +pythia and a value of \(m_{{\mathrm{t}}} = 172.5\,\text {GeV} \). The envelope of the uncertainty obtained for different values of \(m_{{\mathrm{t}}} \) is used to define an additional modelling uncertainty.

The uncertainty from the uncalculated higher-order terms in the simulation is estimated by changing the choice of the factorisation and renormalisation scales \(\mu _\mathrm {F}\) and \(\mu _\mathrm {R}\). For this purpose events simulated with powheg +pythia are used, where the scales are changed up and down by factors of two relative to their nominal value. This is set to \(\mu _\mathrm {F}^2 = \mu _\mathrm {R}^2 = Q^2\), where the scale of the hard process is defined by \(Q^2 = m_{{\mathrm{t}}} ^2 + \sum p_{\mathrm{T}} ^2\) with the sum over all additional final-state partons in the matrix-element calculation. Events with varied scales are unfolded through a response matrix obtained with the nominal choice of scales. The uncertainty in the measurement is defined by the largest change found in the study.

Uncertainties from the PDF are evaluated by using the eigenvectors of the CT10 PDF set with the powheg +pythia simulation. The resulting differences in the response matrix are propagated to the measurement. The individual uncertainties for each eigenvector are scaled to the 68% confidence level and added in quadrature [79].
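The combination of eigenvector variations can be sketched as follows. The sketch assumes the symmetric Hessian prescription and that the CT10 eigenvector variations correspond to a 90% confidence level, so that a factor of 1/1.645 rescales them to 68%. Both the prescription and the toy numbers are assumptions made for illustration, not details taken from the analysis.

```python
import math

# Assumed conversion from 90% to 68% confidence level for a Gaussian.
CL90_TO_CL68 = 1.0 / 1.645


def pdf_uncertainty(up, down):
    """Combine eigenvector variations of one bin into a 68% CL uncertainty.

    up, down : unfolded values obtained with the "+" and "-" member of each
               eigenvector pair (two lists of equal length).
    """
    total = 0.0
    for x_up, x_down in zip(up, down):
        per_eigenvector = 0.5 * (x_up - x_down) * CL90_TO_CL68
        total += per_eigenvector ** 2   # eigenvectors are added in quadrature
    return math.sqrt(total)


# Toy example with two eigenvector pairs around a nominal value of 100.
up = [100.0 + 1.645 * 3.0, 100.0 + 1.645 * 4.0]
down = [100.0 - 1.645 * 3.0, 100.0 - 1.645 * 4.0]
delta_pdf = pdf_uncertainty(up, down)
```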

5.5.5 Summary of uncertainties

A summary of the relative uncertainties in this measurement is shown in Fig. 4. The largest contribution is from the statistical uncertainties. The experimental systematic uncertainties are far smaller than those from the modelling of \({\mathrm{t}}\overline{\mathrm{t}}\) production. The largest uncertainties are expected to improve considerably with more data at higher centre-of-mass energies. Besides a reduction of the statistical uncertainties, an unfolding of the data using finer bins and as a function of more variables will then be possible, which will result in a reduction of the systematic uncertainties from the simulation of \({\mathrm{t}}\overline{\mathrm{t}}\) events. More data will also allow for a measurement that uses smaller jet sizes, which will reduce the uncertainties coming from the jet energy and jet mass scales.

Fig. 4 Statistical uncertainties compared to the individual experimental systematic uncertainties (upper), and statistical uncertainties compared to the systematic uncertainties originating from the modelling of \({\mathrm{t}}\overline{\mathrm{t}}\) production (lower), as a function of the leading-jet mass. The total uncertainties are indicated by the grey cross-hatched regions. The statistical and total uncertainties in the last bin are around 300% and exceed the vertical scale. The size of the horizontal bars represents the bin widths.

5.6 Cross section results

Table 2 Summary of the selection criteria used to define the fiducial region of the measurement

The particle-level \({\mathrm{t}}\overline{\mathrm{t}}\) cross section for the fiducial phase-space region is measured differentially as a function of the leading-jet mass in the \(\ell \)+jets channel. The selection criteria defining the fiducial measurement region are summarised in Table 2 (cf. Sect. 5.2).

Fig. 5 Fiducial-region particle-level differential \({\mathrm{t}}\overline{\mathrm{t}}\) cross sections as a function of the leading-jet mass. The cross sections from the combined electron and muon channels (points) are compared to predictions from the MadGraph +pythia, powheg +pythia, and mc@nlo +herwig generators (lines). The vertical bars represent the statistical (inner) and the total (outer) uncertainties. The horizontal bars show the bin widths.

Table 3 Measured particle-level \({\mathrm{t}}\overline{\mathrm{t}}\) differential cross sections in the fiducial region as a function of \(m_{\text {jet}}\), with the individual and total uncertainties in percent

The measured differential cross section as a function of the leading-jet mass in this fiducial region is shown in Fig. 5, and the numerical values are given in Table 3. The full covariance matrices are given in Appendix A. The data are compared to simulated distributions obtained with powheg +pythia, MadGraph +pythia, and mc@nlo +herwig. The total measured \({\mathrm{t}}\overline{\mathrm{t}}\) cross section for \(140< m_{\text {jet}} < 350\,\text {GeV} \) in the fiducial region is \(\sigma = 101 \pm 11\,\text {(stat)} \pm 13\,\text {(syst)} \pm 9\,(\text {model})\text {\,fb} \), where the last uncertainty is from the modelling of the \({\mathrm{t}}\overline{\mathrm{t}}\) signal. Combining all the uncertainties in quadrature gives a value of \(\sigma = 101 \pm 19\text {\,fb} \). The predicted fiducial-region cross sections from the MadGraph +pythia and powheg +pythia \({\mathrm{t}}\overline{\mathrm{t}}\) simulations, assuming a total \({\mathrm{t}}\overline{\mathrm{t}}\) cross section of 253 \(\text {\,pb}\)  [89,90,91,92,93,94,95], are \(159\,^{+17}_{-18}\) and \(133\,^{ +18}_{ -28}\text {\,fb} \), respectively, where the uncertainties are systematic and come from the variations of \(\mu _\mathrm {R}\) and \(\mu _\mathrm {F}\). The predictions exceed the measurements, consistent with previously measured \({\mathrm{t}}\overline{\mathrm{t}}\) cross sections at large top quark \(p_{\mathrm{T}}\)  [16, 17]. A similar trend is observed when comparing the data to the prediction from mc@nlo +herwig. Recent NNLO calculations [113] of the top quark \(p_{\mathrm{T}}\) spectrum alleviate this discrepancy.
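As a quick arithmetic check of the combined uncertainty quoted above, adding the statistical, experimental systematic, and modelling components in quadrature gives

$$\begin{aligned} \sqrt{11^2 + 13^2 + 9^2}\,\text {fb} = \sqrt{371}\,\text {fb} \approx 19.3\,\text {fb}, \end{aligned}$$

which rounds to the quoted total uncertainty of 19\(\text {\,fb}\).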

The normalised differential cross section \((1/\sigma ) (\mathrm{d}\sigma /\mathrm{d}m_{\text {jet}} {})\) is obtained by dividing the differential cross sections by the total cross section in the \(m_{\text {jet}}\) range from 140 to 350\(\,\text {GeV}\). The result is shown in Fig. 6, together with the predictions of MadGraph +pythia for three values of \(m_{{\mathrm{t}}} \). The numerical values of the measured particle-level cross sections are given in Table 4, together with the individual and total uncertainties. The covariance matrices of the measurement are given in Appendix A. The data are well described by the simulation, showing that the overall modelling of the top quark jet mass is acceptable, once the disagreement with the total cross section at large \(p_{\mathrm{T}}\) is eliminated by the normalisation. The sensitivity of the measurement to \(m_{{\mathrm{t}}}\) is clearly visible, albeit compromised by the overall uncertainties.
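The normalisation step and its effect on the covariance matrix can be sketched as below, assuming standard linear error propagation through the Jacobian of the normalisation. The bin edges, cross section values, and uncertainties are invented for illustration and are not the measured ones.

```python
import numpy as np


def normalise(dsigma, cov, widths):
    """Return (1/sigma) dsigma/dm and its covariance by linear error propagation."""
    dsigma = np.asarray(dsigma, dtype=float)
    widths = np.asarray(widths, dtype=float)
    total = np.sum(dsigma * widths)          # total cross section in the mass range
    norm = dsigma / total
    # Jacobian d(norm_i)/d(dsigma_j) = delta_ij / total - dsigma_i * widths_j / total^2
    jac = np.eye(len(dsigma)) / total - np.outer(dsigma, widths) / total**2
    return norm, jac @ cov @ jac.T


# Invented inputs: bin edges in GeV and differential cross sections in fb/GeV.
edges = np.array([140.0, 170.0, 200.0, 240.0, 290.0, 350.0])
widths = np.diff(edges)
dsigma = np.array([0.5, 1.0, 0.8, 0.3, 0.1])
cov = np.diag((0.2 * dsigma) ** 2)           # 20% uncorrelated uncertainties

norm, cov_norm = normalise(dsigma, cov, widths)
```

Because the normalised distribution integrates to unity by construction, its covariance matrix is singular: the combination `widths @ cov_norm @ widths` vanishes. One bin therefore carries no independent information after normalisation.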

Fig. 6 The normalised particle-level \({\mathrm{t}}\overline{\mathrm{t}}\) differential cross section in the fiducial region as a function of the leading-jet mass. The measurement is compared to predictions from MadGraph +pythia for three values of \(m_{{\mathrm{t}}}\). The vertical bars represent the statistical (inner) and the total (outer) uncertainties. The horizontal bars show the bin widths.

Table 4 Values of the particle-level \({\mathrm{t}}\overline{\mathrm{t}}\) differential cross section in the fiducial region, normalised to unity, as a function of the leading-jet mass. The individual and total uncertainties are given in percent

6 Sensitivity to the top quark mass

Calculations of \(m_{\text {jet}}\) for \({\mathrm{t}}\overline{\mathrm{t}}\) production from first principles, which use \(m_{{\mathrm{t}}}\) in a well-defined renormalisation scheme and do not rely on parton shower and hadronisation models, are not yet available for the LHC. Still, a determination of the top quark mass parameter in general-purpose event generators that uses the normalised particle-level cross sections provides a proof of principle for the feasibility of the method, a cross-check on other determinations of \(m_{{\mathrm{t}}}\), and an estimate of the current measurement’s sensitivity. The value of \(m_{{\mathrm{t}}}\) is determined from the normalised differential cross section measurements given in Table 4, since only the shape of the \(m_{\text {jet}}\) distribution can be reliably calculated. Correlations are taken into account through the full covariance matrix of the measurement, which is given in Appendix A. Theoretical predictions are obtained from MadGraph +pythia for different values of \(m_{{\mathrm{t}}}\). A fit is performed based on the \(\chi ^2\) evaluated as \(\chi ^2 = d^T V^{-1} d\), where d is the vector of differences between the measured normalised cross sections and the predictions, and V is the covariance matrix, which includes the statistical, experimental systematic, modelling, and theoretical uncertainties. The latter are calculated by changing the scales \(\mu _\mathrm {R}\) and \(\mu _\mathrm {F}\) in the MadGraph +pythia simulation up and down by factors of two. The resulting uncertainties are added to the covariance matrix. The \(\chi ^2\) values obtained for different values of \(m_{{\mathrm{t}}}\) are fitted by a second-order polynomial to determine the minimum, and the uncertainty is determined by a change in \(\chi ^2\) of 1.0. The result is

$$\begin{aligned} m_{{\mathrm{t}}}&= 170.8 \pm 6.0\,\text {(stat)} \pm 2.8\,\text {(syst)} \pm 4.6\,\text {(model)} \pm 4.0\,\text {(theo)} \,\text {GeV} \end{aligned}$$
(2)
$$\begin{aligned}&= 170.8 \pm 9.0\,\text {GeV}, \end{aligned}$$
(3)

where the total uncertainty in Eq. (3) is the sum in quadrature of the individual uncertainties in Eq. (2). The fit has a minimum \(\chi ^2\) of 1.6 for three degrees of freedom. This measurement is the first determination of \(m_{{\mathrm{t}}}\) from boosted \({\mathrm{t}}\overline{\mathrm{t}}\) production, calibrated to the MadGraph +pythia simulation. It is consistent with recent determinations of \(m_{{\mathrm{t}}}\) that use MC event generators [33, 35,36,37], cross section measurements [6, 34, 114], and indirect constraints from electroweak fits [115].
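The fit procedure described in this section (a \(\chi ^2\) evaluated for each mass hypothesis, followed by a parabola fit and a \(\Delta \chi ^2 = 1\) criterion) can be sketched in Python as follows. The templates, the covariance matrix, and the mass grid are toy inputs invented for illustration, and the function names are hypothetical. The masses are shifted by a reference value before the polynomial fit purely for numerical stability.

```python
import numpy as np


def chi2(data, pred, V_inv):
    """chi^2 = d^T V^-1 d with d = data - prediction."""
    d = data - pred
    return float(d @ V_inv @ d)


def fit_mass(masses, chi2_values, m_ref=172.5):
    """Fit a second-order polynomial to chi^2(m); return minimum, 1-sigma width, chi^2_min."""
    x = np.asarray(masses, dtype=float) - m_ref
    a, b, c = np.polyfit(x, chi2_values, 2)
    m_min = m_ref - b / (2.0 * a)
    sigma = 1.0 / np.sqrt(a)               # chi^2 rises by 1.0 at m_min +- sigma
    chi2_min = c - b**2 / (4.0 * a)
    return m_min, sigma, chi2_min


# Toy templates that depend linearly on the mass (invented numbers, four bins).
masses = np.array([166.5, 169.5, 171.5, 172.5, 173.5, 175.5, 178.5])
base = np.array([0.010, 0.014, 0.006, 0.002])
slope = np.array([-4.0e-4, 2.0e-4, 1.5e-4, 5.0e-5])
templates = [base + (m - 172.5) * slope for m in masses]

data = base + (171.0 - 172.5) * slope      # pseudo-measurement generated at 171.0 GeV
V_inv = np.linalg.inv(np.diag([1.0e-6, 1.0e-6, 4.0e-7, 1.0e-7]))

chi2_values = [chi2(data, t, V_inv) for t in templates]
m_fit, sigma_fit, chi2_min = fit_mass(masses, chi2_values)
```

In this toy the \(\chi ^2\) is exactly quadratic in the mass, so the fit returns the mass used to generate the pseudo-measurement. With real templates the parabola is only an approximation near the minimum.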

7 Summary and outlook

The first measurement of the differential \({\mathrm{t}}\overline{\mathrm{t}}\) cross section has been performed in the \(\ell \)+jets channel as a function of the leading-jet mass \(m_{\text {jet}}\) in the highly boosted top quark regime. The measurement is carried out in a fiducial region with fully merged top quark decays in hadronic final states, corrected to the particle level. The normalised differential cross section as a function of \(m_{\text {jet}}\) agrees with predictions from simulations, indicating the good quality of modelling the jet mass in highly boosted top quark decays. The total fiducial-region cross section for \(m_{\text {jet}}\) between 140 and 350\(\,\text {GeV}\) is measured to be \(101 \pm 19\text {\,fb} \), which is below the predicted value. This difference is consistent with earlier measurements of a softer top quark \(p_{\mathrm{T}}\) spectrum observed in data than in simulation [16, 17]. This measurement is a first step towards measuring unfolded jet substructure distributions in highly boosted top quark decays. A detailed understanding of these is crucial for measurements and searches for new physics making use of top quark tagging algorithms.

The peak position in the \(m_{\text {jet}}\) distribution is sensitive to the top quark mass \(m_{{\mathrm{t}}}\). This can be used for an independent determination of \(m_{{\mathrm{t}}}\) in the boosted regime, with the prospect of reaching a more reliable correspondence between the top quark mass in any well-defined renormalisation scheme and the top quark mass parameter in general-purpose event generators.

The normalised particle-level \({\mathrm{t}}\overline{\mathrm{t}}\) differential cross section measurement as a function of \(m_{\text {jet}}\) is used to extract a value of \(m_{{\mathrm{t}}}\) in order to estimate the current sensitivity of the data. The value obtained, \(m_{{\mathrm{t}}} = 170.8 \pm 9.0\,\text {GeV} \), is consistent with the current LHC and Tevatron average of \(173.34 \pm 0.27\,\text {(stat)} \pm 0.71\,\text {(syst)} \,\text {GeV} \) [116], albeit with a much larger uncertainty.

New data at higher centre-of-mass energies and with larger integrated luminosities will lead to an improvement in the statistical uncertainty. More data can also lead to reductions in the experimental systematic uncertainties, most notably that from the jet mass scale, which is expected to improve with smaller jet distance parameters. In addition, improvements in the modelling uncertainty are expected because of stronger constraints on the simulation in the highly boosted regime. A reduction in the theoretical uncertainty is also foreseen with the emergence of higher-order calculations. The results obtained in this analysis show the feasibility of the method to obtain the top quark mass in the highly boosted regime. This can provide an important ingredient for studies of the relation between the value of the top quark mass obtained from MC event generators and the one obtained from first-principle calculations.