1 Introduction

In order to enhance the capability of the experiments to discover physics beyond the Standard Model, the Large Hadron Collider (LHC) operates under conditions that yield the highest achievable integrated luminosity. As a consequence, the collisions of proton bunches result not only in proton–proton (\(pp\)) interactions with large transverse-momentum transfer, but also in additional collisions within the same bunch crossing, consisting primarily of low-energy quantum chromodynamics (QCD) processes. Such additional \(pp\) collisions are referred to as in-time pile-up interactions. In addition to in-time pile-up, out-of-time pile-up refers to the energy deposits in the ATLAS calorimeter from bunch crossings preceding and following the triggered event. In this paper, in-time and out-of-time pile-up are referred to collectively as pile-up (PU).

In Ref. [1] it was shown that pile-up jets can be effectively removed using track and vertex information with the jet-vertex-tagger (\(\mathrm {JVT}\)) technique. The CMS Collaboration employs a pile-up mitigation strategy based on tracks and jet shapes [2]. A limitation of the \(\mathrm {JVT}\) discriminant used by the ATLAS Collaboration is that it can only be used for jets within the coverageFootnote 1 of the tracking detector, \(|\eta |<2.5\). However, in the ATLAS detector, jets are reconstructed in the range \(|\eta |<4.5\). The rejection of pile-up jets in the forward region, here defined as \(2.5<|\eta |<4.5\), is crucial to enhance the sensitivity of key analyses such as the measurement of Higgs boson production in the vector-boson fusion (VBF) process. Figure 1a shows how the fraction of \(Z +\)jets events (an important background for VBF analyses) with at least one forward jetFootnote 2 with \(p_{\text {T}} >20\,\text {GeV}\) rises quickly with busier pile-up conditions, quantified by the average number of interactions per bunch crossing (\(\langle \mu \rangle \)). Likewise, the resolution of the missing transverse momentum (\(E_{\text {T}}^{\text {miss}}\)) components \(E_x^{\text {miss}}\) and \(E_y^{\text {miss}}\) in \(Z +\)jets events is affected by the presence of forward pile-up jets. Including forward jets allows a more precise \(E_{\text {T}}^{\text {miss}}\) calculation but introduces a more pronounced pile-up dependence, as shown in Fig. 1b. At higher \(\langle \mu \rangle \), improving the \(E_{\text {T}}^{\text {miss}}\) resolution requires rejecting all forward jets, unless the impact of forward pile-up jets specifically can be mitigated.

Fig. 1

a Fraction of simulated \(Z +\)jets events with at least one forward jet and b the resolution of the \(E_{\text {T}}^{\text {miss}}\) components \(E_x^{\text {miss}}\) and \(E_y^{\text {miss}}\) as a function of \(\langle \mu \rangle \). Jets and \(E_{\text {T}}^{\text {miss}}\) definitions are described in Sect. 2

In this paper, the phenomenology of pile-up jets with \(|\eta |>2.5\) is investigated in detail, and techniques to identify and reject them are presented. The paper is organized as follows. Section 2 briefly describes the ATLAS detector, the event reconstruction, and the event selection. The physical origin and classification of pile-up jets are described in Sect. 3. Section 4 describes the use of jet shape variables for the identification and rejection of forward pile-up jets. The forward \(\mathrm {JVT}\) (\(\mathrm {fJVT}\)) technique is presented in Sect. 5, along with its performance and efficiency measurements. The use of jet shape variables to improve the \(\mathrm {fJVT}\) performance is presented in Sect. 6, while the application of forward pile-up jet rejection in a VBF analysis is discussed in Sect. 7. The conclusions are presented in Sect. 8.

2 Experimental setup

2.1 ATLAS detector

The ATLAS detector is a general-purpose particle detector covering almost \(4\pi \) in solid angle and consisting of a tracking system called the inner detector (ID), a calorimeter system, and a muon spectrometer (MS). The details of the detector are given in Refs. [3,4,5].

The ID consists of silicon pixel and microstrip tracking detectors covering the pseudorapidity range of \(|\eta | < 2.5\) and a straw-tube tracker covering \(|\eta | < 2.0\). These components are immersed in an axial 2 T magnetic field provided by a superconducting solenoid.

The electromagnetic (EM) and hadronic calorimeters are composed of multiple subdetectors covering the range \(|\eta |<4.9\), generally divided into barrel (\(|\eta | < 1.4\)), endcap (\(1.4< |\eta | < 3.2\)) and forward (\(3.2< |\eta | < 4.9\)) regions. The barrel and endcap sections of the EM calorimeter use liquid argon (LAr) as the active medium and lead absorbers. The hadronic endcap calorimeter (\(1.5<|\eta |<3.2\)) uses copper absorbers and LAr, while in the forward (\(3.1<|\eta |<4.9\)) region LAr, copper and tungsten are used. The LAr calorimeter read-out [6], with a pulse length between 60 and 600 ns, is sensitive to signals from the preceding 24 bunch crossings. It uses bipolar shaping with positive and negative output, which ensures that the signal induced by out-of-time pile-up averages to zero. In the region \(|\eta |<1.7\), the hadronic (Tile) calorimeter is constructed from steel absorber and scintillator tiles and is separated into barrel (\(|\eta |<1.0\)) and extended barrel (\(0.8<|\eta |<1.7\)) sections. The fast response of the Tile calorimeter makes it less sensitive to out-of-time pile-up.
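The zero-average property of bipolar shaping can be illustrated with a toy model: if out-of-time deposits arrive at a constant average rate, the mean signal equals that rate times the net area of the pulse, which vanishes for a bipolar shape. The sketch below assumes a made-up pulse (`toy_bipolar_pulse` is hypothetical and is not the ATLAS LAr shaper response) sampled every 25 ns over 25 samples, and compares it with a unipolar pulse made of the positive lobe only.

```python
import numpy as np


def toy_bipolar_pulse(n_samples=25):
    """Hypothetical bipolar pulse sampled every 25 ns (not the real LAr shape).

    A short positive lobe is followed by a long negative tail whose area is
    scaled so that all samples sum to zero.
    """
    pulse = np.zeros(n_samples)
    pulse[:3] = [1.0, 0.8, 0.3]                        # fast positive lobe
    tail = np.exp(-np.arange(n_samples - 3) / 8.0)     # slow undershoot shape
    pulse[3:] = -tail * pulse[:3].sum() / tail.sum()   # zero net area
    return pulse


def mean_baseline(pulse, deposits):
    """Mean signal once every crossing has contributed a random deposit."""
    signal = np.convolve(deposits, pulse)               # superposition of pulses
    steady = signal[len(pulse):len(deposits)]           # drop start-up and tail edges
    return steady.mean()


rng = np.random.default_rng(seed=1)
deposits = rng.poisson(lam=2.0, size=100_000).astype(float)  # toy energy per crossing

bipolar = toy_bipolar_pulse()
unipolar = np.clip(bipolar, 0.0, None)                  # positive lobe only

print(f"bipolar mean baseline:  {mean_baseline(bipolar, deposits):+.4f}")
print(f"unipolar mean baseline: {mean_baseline(unipolar, deposits):+.4f}")
```

Only the average cancels in this picture; event-by-event fluctuations of the out-of-time contribution remain.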

The MS forms the outer layer of the ATLAS detector and is dedicated to the detection and measurement of high-energy muons in the region \(|\eta |<2.7\). A multi-level trigger system of dedicated hardware and software filters is used to select \(pp\) collisions producing high-\(p_{\text {T}}\) particles.

2.2 Data and MC samples

The studies presented in this paper are performed using a data set of \(pp\) collisions at \(\sqrt{s}=13\,\text {TeV} \), corresponding to an integrated luminosity of 3.2 fb\(^{-1}\), collected in 2015, when the LHC operated with a bunch spacing of 25 ns. The data sample used for the analysis contains on average 13.5 interactions per bunch crossing.

Samples of simulated events used for comparisons with data are reweighted to match the distribution of the number of pile-up interactions observed in data. The average number of interactions per bunch crossing \(\langle \mu \rangle \) in the data used as reference for the reweighting is divided by a scale factor of \(1.16\pm 0.07\). This scale factor takes into account the fraction of the visible cross-section due to inelastic \(pp\) collisions as measured in the data [7] and is required to obtain good agreement between the number of inelastic interactions reconstructed in the tracking detector in data and that predicted by the reweighted simulation. In order to extend the study of the pile-up dependence, simulated samples with an average of 22 interactions per bunch crossing are also used. Dijet events are simulated with the Pythia8.186 [8] event generator using the NNPDF2.3LO [9] set of parton distribution functions (PDFs) and the parameter values set according to the A14 underlying-event tune [10]. Simulated \({t\bar{t}}\) events are generated with powheg box v2.0 [11,12,13] using the CT10 PDF set [14]; Pythia6.428 [15] is used for fragmentation and hadronization with the Perugia2012 [16] tune that employs the CTEQ6L1 [17] PDF set. A sample of leptonically decaying Z bosons produced with jets (\(Z (\rightarrow \ell \ell )\)+jets) and VBF \(H\rightarrow \tau \tau \) samples are generated with powheg box v1.0, and Pythia8.186 is used for fragmentation and hadronization with the AZNLO tune [18] and the CTEQ6L1 PDF set. For all samples, the EvtGen v1.2.0 program [19] is used to model the properties of bottom and charm hadron decays. The effect of in-time as well as out-of-time pile-up is simulated using minimum-bias events generated with Pythia8.186 to reflect the pile-up conditions during the 2015 data-taking period, using the A2 tune [20] and the MSTW2008LO [21] PDF set.
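The reweighting described above can be sketched as a binned ratio of normalized \(\langle \mu \rangle \) distributions, with the data values first divided by the scale factor. This is a minimal illustration under assumed choices (one entry per data event, simple histogram binning, zero weight for simulated events outside the binning); `pileup_weights` is a hypothetical helper, not the ATLAS reweighting tool.

```python
import numpy as np

MU_SCALE_FACTOR = 1.16  # central value; its +-0.07 uncertainty is not propagated here


def pileup_weights(mu_data, mu_mc, bins):
    """Per-event weights so that the simulated <mu> distribution matches data.

    mu_data: <mu> of the data events used as reference (divided by the scale factor).
    mu_mc:   <mu> of the simulated events.
    bins:    common histogram bin edges.
    """
    data_hist, _ = np.histogram(np.asarray(mu_data) / MU_SCALE_FACTOR, bins=bins)
    mc_hist, _ = np.histogram(mu_mc, bins=bins)
    data_frac = data_hist / data_hist.sum()
    mc_frac = mc_hist / mc_hist.sum()
    # Ratio of normalized distributions; bins empty in simulation get weight 0.
    ratio = np.divide(data_frac, mc_frac, out=np.zeros_like(data_frac), where=mc_frac > 0)

    idx = np.digitize(mu_mc, bins) - 1
    valid = (idx >= 0) & (idx < len(bins) - 1)          # events inside the binning
    weights = np.zeros(len(idx))
    weights[valid] = ratio[idx[valid]]
    return weights
```

After weighting, the normalized \(\langle \mu \rangle \) histogram of the simulated events reproduces that of the rescaled data, bin by bin.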
All generated events are processed with a detailed simulation of the ATLAS detector response [22] based on Geant4  [23] and subsequently reconstructed and analysed in the same way as the data.

2.3 Event reconstruction

The raw data collected by the ATLAS detector are reconstructed into particle candidates and jets using various pattern recognition algorithms. The reconstruction procedures used in this analysis are detailed in Ref. [1], while an overview is presented in this section.

Calorimeter clusters and towers

Jets in ATLAS are reconstructed from clusters of energy deposits in the calorimeters. Two methods of combining calorimeter cell information are considered in this paper: topological clusters and towers.

Topological clusters (topo-clusters) [24] are built from neighbouring calorimeter cells. The algorithm uses as seeds calorimeter cells with energy significanceFootnote 3 \(|E_\mathrm {cell}|/\sigma _\mathrm {noise}>4\), combines all neighbouring cells with \(|E_\mathrm {cell}|/\sigma _\mathrm {noise}>2\) and finally adds neighbouring cells without any significance requirement. Topo-clusters are used as input for jet reconstruction.
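The 4-2-0 growth logic can be illustrated on a toy two-dimensional grid of cell significances. The sketch assumes 8-neighbour adjacency within a single layer and processes seeds in decreasing significance; it omits the cluster splitting and the three-dimensional (cross-layer) neighbour definition of the real algorithm, and `topo_clusters` is a hypothetical helper.

```python
from collections import deque

import numpy as np

SEED_THRESHOLD = 4.0    # |E_cell| / sigma_noise for seeds
GROWTH_THRESHOLD = 2.0  # |E_cell| / sigma_noise for cells that extend the cluster


def topo_clusters(significance):
    """Toy 4-2-0 clustering on a 2D grid of E_cell / sigma_noise values.

    Returns an integer array of cluster labels (0 = cell not clustered).
    """
    sig = np.abs(np.asarray(significance, dtype=float))
    labels = np.zeros(sig.shape, dtype=int)
    n_rows, n_cols = sig.shape
    seeds = np.argwhere(sig > SEED_THRESHOLD)
    seeds = seeds[np.argsort(-sig[tuple(seeds.T)])]     # most significant seeds first
    n_clusters = 0
    for r0, c0 in seeds:
        if labels[r0, c0]:
            continue                                     # already absorbed by a cluster
        n_clusters += 1
        labels[r0, c0] = n_clusters
        queue, perimeter = deque([(r0, c0)]), set()
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= rr < n_rows and 0 <= cc < n_cols):
                        continue
                    if labels[rr, cc]:
                        continue
                    if sig[rr, cc] > GROWTH_THRESHOLD:
                        labels[rr, cc] = n_clusters      # "2": keep growing
                        queue.append((rr, cc))
                    else:
                        perimeter.add((rr, cc))          # candidate for the "0" step
        for rr, cc in perimeter:
            if not labels[rr, cc]:
                labels[rr, cc] = n_clusters              # "0": add without threshold
    return labels
```

Cells above \(2\sigma \) propagate the growth, whereas the final ring of neighbours is added without being allowed to propagate further.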

Calorimeter towers are fixed-size objects (\(\Delta \eta \times \Delta \phi =0.1\times 0.1\)) [26] that ensure a uniform segmentation of the calorimeter information. Instead of building clusters, the cells are projected onto a fixed grid in \(\eta \) and \(\phi \) corresponding to 6400 towers. Calorimeter cells which completely fit within a tower contribute their total energy to the single tower. Other cells extending beyond the tower boundary contribute to multiple towers, depending on the overlap fraction of the cell area with the towers. In the following, towers are matched geometrically to jets reconstructed using topo-clusters and are used for jet classification.
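The projection of cells onto towers can be sketched as an area-weighted sharing of each cell's energy. The sketch assumes rectangular cells in \(\eta \)-\(\phi \) and a grid of 100 bins of 0.1 in \(\eta \) (\(|\eta |<5\)) times 64 bins of \(2\pi /64\approx 0.098\) in \(\phi \), which reproduces the 6400 towers quoted above; it ignores \(\phi \) wrap-around, and the function names are hypothetical.

```python
import numpy as np

ETA_MIN, ETA_MAX, N_ETA = -5.0, 5.0, 100   # assumed 0.1-wide eta bins
N_PHI = 64                                  # assumed 2*pi/64 ~ 0.098-wide phi bins
D_ETA = (ETA_MAX - ETA_MIN) / N_ETA
D_PHI = 2 * np.pi / N_PHI


def overlap(lo, hi, edges):
    """Length of the overlap of [lo, hi] with each bin defined by consecutive edges."""
    return np.clip(np.minimum(hi, edges[1:]) - np.maximum(lo, edges[:-1]), 0.0, None)


def add_cell_to_towers(towers, energy, eta_lo, eta_hi, phi_lo, phi_hi):
    """Share a cell's energy among towers in proportion to the overlapping area.

    Assumes 0 <= phi_lo < phi_hi < 2*pi (no wrap-around handling).
    """
    eta_edges = ETA_MIN + D_ETA * np.arange(N_ETA + 1)
    phi_edges = D_PHI * np.arange(N_PHI + 1)
    area = np.outer(overlap(eta_lo, eta_hi, eta_edges), overlap(phi_lo, phi_hi, phi_edges))
    towers += energy * area / area.sum()        # fractions sum to one: energy conserved
    return towers
```

A cell entirely inside one tower gives all of its energy to that tower; a cell straddling a boundary is split according to the overlap fractions.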

Vertices and tracks

The event hard-scatter primary vertex is defined as the reconstructed primary vertex with the largest \(\sum p_\mathrm {T}^2\) of constituent tracks. When evaluating performance in simulation, only events where the reconstructed hard-scatter primary vertex lies within \(|\Delta z|<0.1\) mm of the true hard-scatter interaction are considered. For the physics processes considered, the reconstructed hard-scatter primary vertex matches the true hard-scatter interaction more than 95% of the time. Tracks are required to have \(p_{\text {T}} > 0.5\,\text {GeV}\) and to satisfy quality criteria designed to reject poorly measured or fake tracks [27]. Tracks are assigned to primary vertices based on the track-to-vertex matching resulting from the vertex reconstruction. Tracks not included in vertex reconstruction are assigned to the nearest vertex based on the distance \(|\Delta z \times \sin \theta |\), up to a maximum distance of 3.0 mm. Tracks not matched to any vertex are not considered. Tracks are then assigned to jets by adding them to the jet clustering process with infinitesimal \(p_{\text {T}}\), a procedure known as ghost-association [28].
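The fallback assignment for tracks not used in the vertex fit can be written as a nearest-vertex search in \(|\Delta z\times \sin \theta |\). The sketch assumes each track is described by its longitudinal position `z0` (in mm) and polar angle `theta`; `assign_unmatched_tracks` is a hypothetical helper.

```python
import numpy as np

MAX_DIST_MM = 3.0  # maximum |dz * sin(theta)| quoted in the text


def assign_unmatched_tracks(track_z0, track_theta, vertex_z):
    """Index of the nearest vertex for each track, or -1 if none is close enough.

    Distance is |(z0_track - z_vertex) * sin(theta_track)|.
    """
    dz = np.asarray(track_z0)[:, None] - np.asarray(vertex_z)[None, :]
    dist = np.abs(dz * np.sin(np.asarray(track_theta))[:, None])
    nearest = np.argmin(dist, axis=1)
    ok = dist[np.arange(len(nearest)), nearest] <= MAX_DIST_MM
    return np.where(ok, nearest, -1)            # -1: track not considered further
```

The \(\sin \theta \) factor shrinks the effective distance for forward tracks, whose longitudinal position is less precisely measured.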

Jets

Jets are reconstructed from topo-clusters at the EM scaleFootnote 4 using the anti-\(k_t\) [29] algorithm, as implemented in Fastjet 2.4.3 [30], with a radius parameter \(R=0.4\). After a jet-area-based subtraction of pile-up energy, a response correction is applied to each jet reconstructed in the calorimeter to calibrate it to the particle-level jet energy scale [1, 25, 31]. Unless noted otherwise, jets are required to have \(20\,\text {GeV}< p_{\text {T}} < 50\,\text {GeV}\). Higher-\(p_{\text {T}}\) forward jets are ignored because of their negligible pile-up rate under the pile-up conditions considered in this paper. Central jets are required to have \(|\eta |<2.5\) so that most of their charged particles are within the tracking coverage of the inner detector. Forward jets are those in the region \(2.5<|\eta |<4.5\); charged particles beyond \(|\eta |=2.5\) are outside the tracking coverage, so no associated tracks are measured for them.

Jets built from particles in the Monte Carlo generator’s event record (“truth particles”) are also considered. Truth-particle jets are reconstructed using the anti-\(k_t\) algorithm with \(R=0.4\) from stableFootnote 5 final-state truth particles from the simulated hard-scatter (truth-particle hard-scatter jets) or in-time pile-up (truth-particle pile-up jets) interaction of choice. A third type of truth-particle jet (inclusive truth-particle jets) is reconstructed by considering truth particles from all interactions simultaneously, in order to study the effects of pile-up interactions on truth-particle pile-up jets.

The simulation studies in this paper require a classification of the reconstructed jets into three categories: hard-scatter jets, QCD pile-up jets, and stochastic pile-up jets. Jets are thus truth-labelled based on a matching criterion to truth-particle jets. Similarly to Ref. [1], jets are first classified as hard-scatter or pile-up jets. Jets are labelled as hard-scatter jets if a truth-particle hard-scatter jet with \(p_{\text {T}} > 10\,\text {GeV}\) is found within \(\Delta R = \sqrt{(\Delta \eta )^2 + (\Delta \phi )^2}\) of 0.3. The \(p_{\text {T}} >10\,\text {GeV}\) requirement is used to avoid accidental matches of reconstructed jets with soft activity from the hard-scatter interaction. In cases where more than one truth-particle jet is matched, \(p_{\text {T}} ^\mathrm {truth}\) is defined from the highest-\(p_{\text {T}}\) truth-particle hard-scatter jet within \(\Delta R\) of 0.3.

Jets are labelled as pile-up jets if no truth-particle hard-scatter jet with \(p_{\text {T}} > 4\,\text {GeV}\) is found within \(\Delta R\) of 0.6. These pile-up jets are further classified as QCD pile-up jets if they are matched within \(\Delta R<0.3\) to a truth-particle pile-up jet, or as stochastic pile-up jets if there is no truth-particle pile-up jet within \(\Delta R<0.6\); in both cases, truth-particle pile-up jets are required to have \(p_{\text {T}} > 10\,\text {GeV}\). Jets with \(0.3<\Delta R<0.6\) relative to truth-particle hard-scatter jets with \(p_{\text {T}} > 10\,\text {GeV}\), or with \(\Delta R<0.3\) relative to truth-particle hard-scatter jets with \(4\,\text {GeV}< p_{\text {T}} < 10\,\text {GeV}\), are not labelled because their nature cannot be unambiguously determined. These jets are therefore not used in performance studies based on simulation.
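The labelling rules above can be condensed into a single decision function. In this sketch, jets are `(pt, eta, phi)` tuples with \(p_{\text {T}}\) in GeV, and `truth_label` is a hypothetical helper; treating pile-up jets whose nearest truth-particle pile-up jet lies at \(0.3<\Delta R<0.6\) as unlabelled is an assumption, since the text defines only the two pile-up categories.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def truth_label(jet, hs_truth_jets, pu_truth_jets):
    """Return "HS", "QCD-PU", "stochastic-PU", or None (jet left unlabelled)."""
    _, eta, phi = jet

    def dr(truth_jet):
        return delta_r(eta, phi, truth_jet[1], truth_jet[2])

    # Hard-scatter: a truth HS jet with pT > 10 GeV within dR < 0.3.
    if any(t[0] > 10 and dr(t) < 0.3 for t in hs_truth_jets):
        return "HS"
    # Pile-up requires no truth HS jet with pT > 4 GeV within dR < 0.6.
    if any(t[0] > 4 and dr(t) < 0.6 for t in hs_truth_jets):
        return None  # ambiguous: nearby soft or slightly displaced HS activity
    pu = [t for t in pu_truth_jets if t[0] > 10]
    if any(dr(t) < 0.3 for t in pu):
        return "QCD-PU"
    if all(dr(t) >= 0.6 for t in pu):
        return "stochastic-PU"
    return None  # truth PU jet only at 0.3 <= dR < 0.6 (assumed unlabelled)
```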

Jet Vertex Tagger

The Jet Vertex Tagger (JVT) combines two jet variables, \(\mathrm {corrJVF}\) and \(R_\mathrm {pT} ^0\), that provide information to separate hard-scatter jets from pile-up jets. The quantity \(\mathrm {corrJVF}\) [1] is defined for each jet as

$$\begin{aligned} \mathrm {corrJVF} = \frac{\sum {p_{\text {T}} ^{\mathrm {trk}}(\mathrm {PV}_0)}}{\sum p_{\text {T}} ^{\mathrm {trk}}(\mathrm {PV}_0) + \frac{p_{\text {T}} ^{\mathrm {PU}}}{ (k \cdot n_\mathrm {trk}^\mathrm {PU})}}, \end{aligned}$$
(1)

where PV\(_i\) denotes the reconstructed event vertices (PV\(_0\) is the identified hard-scatter vertex and the PV\(_i\) are sorted by decreasing \(\sum p_\mathrm {T}^2\)), and \(\sum {p_{\text {T}} ^{\mathrm {trk}}(\mathrm {PV}_0)}\) is the scalar \(p_{\text {T}}\) sum of the tracks that are associated with the jet and originate from the hard-scatter vertex. The term \(p_{\text {T}} ^{\mathrm {PU}}=\sum _{i\ge 1}\sum p_{\text {T}} ^{\mathrm {trk}}(\mathrm {PV}_i)\) denotes the scalar \(p_{\text {T}}\) sum of the tracks associated with the jet and originating from pile-up vertices. To correct for the linear increase of \(p_{\text {T}} ^{\mathrm {PU}}\) with the total number of pile-up tracks per event (\(n_\mathrm {trk}^\mathrm {PU}\)), \(p_{\text {T}} ^{\mathrm {PU}}\) is divided by \((k \cdot n_\mathrm {trk}^\mathrm {PU})\) with the parameter k set to 0.01 [1].Footnote 6
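Equation (1) translates directly into code. The sketch assumes tracks are given as their \(p_{\text {T}}\) values plus the index of the vertex they were assigned to (0 for the hard-scatter vertex, -1 for unassigned tracks, which are ignored). Returning -1 for jets without any vertex-assigned tracks is a convention assumed here, and `corr_jvf` is a hypothetical name.

```python
K = 0.01  # parameter k quoted in the text


def corr_jvf(track_pts, track_vertex, n_trk_pu, k=K):
    """corrJVF of one jet following Eq. (1).

    track_pts:    pT of the tracks ghost-associated with the jet.
    track_vertex: vertex index of each track (0 = hard scatter, -1 = unassigned).
    n_trk_pu:     total number of pile-up tracks in the event.
    """
    pt_hs = sum(pt for pt, v in zip(track_pts, track_vertex) if v == 0)
    pt_pu = sum(pt for pt, v in zip(track_pts, track_vertex) if v >= 1)
    # Pile-up term scaled down by k * n_trk^PU to remove its linear growth with pile-up.
    pu_term = pt_pu / (k * n_trk_pu) if n_trk_pu > 0 else 0.0
    denom = pt_hs + pu_term
    return pt_hs / denom if denom > 0 else -1.0
```

For example, a jet with 20 GeV of hard-scatter track \(p_{\text {T}}\) and 5 GeV of pile-up track \(p_{\text {T}}\) in an event with 500 pile-up tracks has a pile-up term of \(5/(0.01\times 500)=1\) and therefore \(\mathrm {corrJVF}=20/21\).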

The variable \(R_\mathrm {pT} ^0\) is defined as the scalar \(p_{\text {T}}\) sum of the tracks that are associated with the jet and originate from the hard-scatter vertex divided by the fully calibrated jet \(p_{\text {T}} \), which includes pile-up subtraction:

$$\begin{aligned} R_\mathrm {pT} ^0 = \frac{\sum {p_{\text {T}} ^{\mathrm {trk}}(\mathrm {PV}_0)}}{p_{\text {T}} ^\mathrm {jet}}. \end{aligned}$$
(2)

This observable tests the compatibility between the jet \(p_{\text {T}}\) and the total \(p_{\text {T}}\) of the hard-scatter charged particles within the jet. Its average value for hard-scatter jets is approximately 0.5, as the numerator does not account for the neutral particles in the jet. The \(\mathrm {JVT}\) discriminant is built by defining a two-dimensional likelihood based on a k-nearest neighbour (kNN) algorithm [32]. An extension of the \(R_\mathrm {pT} ^0\) variable computed with respect to any vertex i in the event, \(R_\mathrm {pT} ^i=\sum _k{p_{\text {T}} ^{\mathrm {trk}_k}(\mathrm {PV}_i)}/p_{\text {T}} ^\mathrm {jet}\), is also used in this analysis.
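The sketch below computes \(R_\mathrm {pT} ^i\) for an arbitrary vertex and a toy kNN likelihood in the (\(\mathrm {corrJVF}\), \(R_\mathrm {pT} ^0\)) plane, taken as the fraction of hard-scatter jets among the k nearest training jets. The Euclidean metric, the value of k, and the function names are assumptions for illustration, not the configuration used to build \(\mathrm {JVT}\).

```python
import numpy as np


def r_pt(track_pts, track_vertex, jet_pt, vertex=0):
    """R_pT^i: scalar pT sum of jet tracks from vertex i over the calibrated jet pT."""
    return sum(pt for pt, v in zip(track_pts, track_vertex) if v == vertex) / jet_pt


def knn_jvt(point, train_points, train_is_hs, k=5):
    """Toy kNN likelihood: fraction of hard-scatter jets among the k nearest
    training jets in the (corrJVF, R_pT^0) plane (Euclidean distance assumed)."""
    d = np.linalg.norm(np.asarray(train_points) - np.asarray(point), axis=1)
    nearest = np.argsort(d)[:k]
    return float(np.mean(np.asarray(train_is_hs)[nearest]))
```

A jet with 20 GeV of hard-scatter track \(p_{\text {T}}\) and a calibrated \(p_{\text {T}}\) of 40 GeV has \(R_\mathrm {pT} ^0=0.5\), the typical value quoted above for hard-scatter jets.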

Electrons and muons

Electrons are built from EM clusters and associated ID tracks. They are required to satisfy \(|\eta |<2.47\) and \(p_\mathrm {T}>10\,\text {GeV}\), as well as reconstruction quality and isolation criteria [33]. Muons are built from an ID track (for \(|\eta |<2.5\)) and an MS track. Muons are required to satisfy \(p_\mathrm {T}>10\,\text {GeV}\) as well as reconstruction quality and isolation criteria [34]. Correction factors are applied to simulated events to account for mismodelling of lepton isolation, trigger efficiency, and quality selection variables.

\(E_{\text {T}}^{\text {miss}}\)

The missing transverse momentum, \(\varvec{E}_{\text {T}}^{\text {miss}}\), corresponds to the negative vector sum of the transverse momenta of the selected electron, photon, and muon candidates and jets, as well as of tracks not associated with any of these reconstructed objects [35]. The scalar magnitude \(E_{\text {T}}^{\text {miss}}\) represents the total transverse momentum imbalance in an event.
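The definition of \(\varvec{E}_{\text {T}}^{\text {miss}}\) as a negative vector sum can be written in a few lines. The sketch assumes all selected objects and soft-term tracks are given as \((p_{\text {T}}, \phi )\) pairs with overlaps already removed; `met` is a hypothetical helper.

```python
import math


def met(objects):
    """Return (E_x^miss, E_y^miss, E_T^miss) from (pt, phi) pairs of all inputs."""
    ex = -sum(pt * math.cos(phi) for pt, phi in objects)
    ey = -sum(pt * math.sin(phi) for pt, phi in objects)
    return ex, ey, math.hypot(ex, ey)
```

For two back-to-back objects of 50 and 30 GeV, the imbalance is 20 GeV, pointing opposite to the harder object.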

3 Origin and structure of pile-up jets

The additional transverse energy from pile-up interactions contributing to jets originating from the hard-scatter (HS) interaction is subtracted on an event-by-event basis using the jet-area method [1, 36]. However, the jet-area subtraction assumes a uniform pile-up distribution across the calorimeter, while local fluctuations of pile-up can cause additional jets to be reconstructed. The additional jets can be classified into two categories: QCD pile-up jets, where the particles in the jet stem mostly from a single QCD process occurring in a single pile-up interaction, and stochastic jets, which combine particles from different interactions. Figure 2 shows an event with a hard-scatter jet, a QCD pile-up jet and a stochastic pile-up jet. Most of the particles associated with the hard-scatter jet originate from the primary interaction. Most of the particles associated with the QCD pile-up jet originate from a single pile-up interaction. The stochastic pile-up jet includes particles associated with both pile-up interactions in the event, without a single prevalent source.
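A minimal sketch of the jet-area correction, assuming the common choice of estimating the pile-up density \(\rho \) as the median \(p_{\text {T}}/A\) of patches in the event (the patch inputs and function names are hypothetical). Because a single \(\rho \) is subtracted everywhere, a jet sitting on an upward local fluctuation keeps a residual \(p_{\text {T}}\), which is how stochastic pile-up jets can survive the subtraction.

```python
import numpy as np


def estimate_rho(patch_pts, patch_areas):
    """Event-level pile-up pT density: median of pT / area over patches."""
    return float(np.median(np.asarray(patch_pts) / np.asarray(patch_areas)))


def area_subtract(jet_pts, jet_areas, rho):
    """Jet-area correction pT_corr = pT - rho * A, with one rho for the whole event."""
    return np.asarray(jet_pts) - rho * np.asarray(jet_areas)
```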

Fig. 2

Display of a simulated event in rz view containing a hard-scatter jet, a QCD pile-up jet, and a stochastic pile-up jet. The \(\Delta R_\mathrm {pT} \) values (defined in Sect. 5.1) are quoted for the two pile-up jets

While this binary classification is convenient for the purpose of description, the boundary between the two categories is somewhat arbitrary. This is particularly true in harsh pile-up conditions, with dozens of concurrent pp interactions, where every jet, including those originating primarily from the identified hard-scatter interaction, also has contributions from multiple pile-up interactions.

In order to identify and reject forward pile-up jets, a twofold strategy is adopted. Stochastic jets have intrinsic differences in shape with respect to hard-scatter and QCD pile-up jets, and this shape can be used for discrimination. On the other hand, the calorimeter signature of QCD pile-up jets does not differ fundamentally from that of hard-scatter jets. Therefore, QCD pile-up jets are identified by exploiting transverse momentum conservation in individual pile-up interactions.

The nature of pile-up jets can vary significantly depending on whether or not most of the jet energy originates from a single interaction. Figure 3 shows the fraction of QCD pile-up jets among all pile-up jets when considering inclusive truth-particle jets. The corresponding distributions for reconstructed jets are shown in Fig. 4. When only in-time pile-up contributions are considered (Fig. 3), the fraction of QCD pile-up jets depends on the pseudorapidity and \(p_{\text {T}}\) of the jet and on the average number of interactions per bunch crossing \(\langle \mu \rangle \). Stochastic jets are more likely at low \(p_{\text {T}}\) and \(|\eta |\) and in harsher pile-up conditions. However, the comparison between Fig. 3, containing inclusive truth-particle jets, and Fig. 4, containing reconstructed jets, suggests that only a small fraction of stochastic jets are due to in-time pile-up. Indeed, the fraction of QCD pile-up jets decreases significantly once out-of-time pile-up effects, detector noise, and resolution are taken into account. Even though the average amount of out-of-time energy is higher in the forward region, topo-clustering suppresses this contribution more strongly there. Therefore, the fraction of QCD pile-up jets increases in the forward region, and QCD pile-up jets constitute more than 80% of pile-up jets with \(p_{\text {T}} >30\,\text {GeV}\) overall. Similarly, the minimum at around \(|\eta |=1\) corresponds to a maximum in the pile-up noise distribution [24], which results in a larger number of stochastic pile-up jets relative to QCD pile-up jets. The fraction of stochastic jets becomes more prominent at low \(p_{\text {T}}\) and grows as the number of interactions increases. The majority of pile-up jets in the forward region are QCD pile-up jets, although a sizeable fraction of stochastic jets is present in both the central and forward regions.

Fig. 3

Fraction of pile-up tagged inclusive truth-particle jets classified as QCD pile-up jets as a function of a \(|\eta |\), b \(p_{\text {T}}\), and c \(\langle \mu \rangle \) for jets with \(20\,\text {GeV}<p_{\text {T}} <30\,\text {GeV}\) and d \(30\,\text {GeV}<p_{\text {T}} <40\,\text {GeV}\), as estimated in dijet events with Pythia8.186 pile-up simulation. The inclusive truth-particle jets are reconstructed from truth particles originating from all in-time pile-up interactions

Fig. 4

Fraction of reconstructed pile-up jets classified as QCD pile-up jets, as a function of a \(|\eta |\), b \(p_{\text {T}}\), and c \(\langle \mu \rangle \) for jets with \(20\,\text {GeV}<p_{\text {T}} <30\,\text {GeV}\) and d \(30\,\text {GeV}<p_{\text {T}} <40\,\text {GeV}\), as estimated in dijet events with Pythia8.186 pile-up simulation

In the following, each source of forward pile-up jets is addressed with algorithms targeting its specific features.

4 Stochastic pile-up jet tagging with time and shape information

Given the evidence presented in Sect. 3 that out-of-time pile-up plays an important role for stochastic jets, a direct handle is provided by the timing information associated with the jet. The jet timing \(t_\mathrm {jet}\) is defined as the energy-weighted average of the timing of the constituent clusters. In turn, the cluster timing is defined as the energy-weighted average of the timing of the constituent calorimeter cells. The jet timing distribution, shown in Fig. 5, is symmetric and centred at \(t_\mathrm {jet}=0\) for both the hard-scatter and pile-up jets. However, the significantly wider distribution for stochastic jets reveals the large out-of-time pile-up contribution. For jets with \(20<p_{\text {T}}<30\,\text {GeV}\), requiring \(|t_\mathrm {jet}|<12\) ns rejects 20% of stochastic pile-up jets while keeping 99% of hard-scatter jets. In the following, this requirement is always applied as a baseline when identifying stochastic pile-up jets.
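The two-level energy weighting and the baseline timing requirement can be sketched as follows. Clusters are given as lists of cell energies and times (in ns); positive cell energies are assumed so that the weights are well defined, the cluster energy is taken as the sum of its cell energies, and the function names are hypothetical.

```python
import numpy as np


def weighted_time(energies, times):
    """Energy-weighted average time (ns)."""
    e = np.asarray(energies, dtype=float)
    return float(np.sum(e * np.asarray(times, dtype=float)) / np.sum(e))


def jet_time(clusters):
    """clusters: list of (cell_energies, cell_times), one entry per constituent cluster."""
    cluster_e = [float(np.sum(e)) for e, _ in clusters]        # cluster energy
    cluster_t = [weighted_time(e, t) for e, t in clusters]     # cluster timing from cells
    return weighted_time(cluster_e, cluster_t)                 # jet timing from clusters


def passes_timing(t_jet, max_abs_ns=12.0):
    """Baseline requirement |t_jet| < 12 ns."""
    return abs(t_jet) < max_abs_ns
```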

Fig. 5

Distribution of the jet timing \(t_\mathrm {jet}\) for hard-scatter, QCD pile-up and stochastic pile-up jets in the a central and b forward region

Stochastic jets can be further suppressed using shape information. Being formed from a random collection of particles from different interactions, stochastic jets lack the characteristic dense energy core of jets originating from the showering and hadronization of a hard-scatter parton. The energy is instead spread rather uniformly within the jet cone. Therefore, pile-up mitigation techniques based on jet shapes have been shown to be effective in suppressing stochastic pile-up jets [2]. In this section, the challenges of this approach are presented, and different algorithms exploiting the jet shape information are described and characterized.

The jet width w is a variable that characterizes the energy spread within a jet. It is defined as

$$\begin{aligned} w = \frac{\sum _k{\Delta R (\mathrm {jet},k)p_{\mathrm {T}}^k}}{\sum _k{p_{\mathrm {T}}^k}}, \end{aligned}$$
(3)

where the index k runs over the jet constituents and \(\Delta R (\mathrm {jet},k)\) is the angular distance between the jet constituent k and the jet axis. The jet width is a useful observable for identifying stochastic jets, as the average width is significantly larger for jets with a smaller fraction of energy originating from a single interaction.
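Equation (3) can be evaluated for any set of constituents. The sketch below takes constituents as \((p_{\text {T}}, \eta , \phi )\) tuples and wraps \(\Delta \phi \) into \([-\pi ,\pi )\); `jet_width` is a hypothetical helper. For a jet made of a single constituent at the jet axis, the width is trivially zero and carries no shape information, which is the degenerate case motivating the tower-based definition.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def jet_width(jet_eta, jet_phi, constituents):
    """Eq. (3): pT-weighted mean distance of the constituents from the jet axis.

    constituents: iterable of (pt, eta, phi), e.g. truth particles, topo-clusters,
    or the towers of a grid centred on the jet axis.
    """
    constituents = list(constituents)
    num = sum(pt * delta_r(jet_eta, jet_phi, eta, phi) for pt, eta, phi in constituents)
    den = sum(pt for pt, _, _ in constituents)
    return num / den
```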

In simulation the jet width can be computed using truth-particles (truth-particle width), as a reference point to benchmark the performance of the reconstructed observable. At detector level, the jet constituents are calorimeter topo-clusters. In general, topo-clustering compresses the calorimeter information while retaining its fine granularity. Ideally, each cluster captures the energy shower from a single incoming particle. However, the cluster multiplicity in jets decreases quickly in the forward region, to the point where jets are formed by a single cluster and the jet width can no longer be defined. An alternative approach consists of using as constituents the 11 by 11 grid of calorimeter towers in \(\eta \times \phi \), centred around the jet axis. The use of calorimeter towers ensures a fixed multiplicity given by the \(0.1\times 0.1\) granularity so that the jet width always contains jet shape information.

Fig. 6

Dependence of the average jet width on the number of reconstructed primary vertices (\(N_\mathrm {PV}\)). The distributions are shown using a hard-scatter and in-time pile-up truth-particles, b clusters, or c towers as constituents

As shown in Fig. 6, the average jet width depends on the pile-up conditions. At higher pile-up values, more pile-up particles are likely to contribute to a jet, thus broadening the energy distribution within the jet itself. As a result, the width drifts towards higher values for hard-scatter, QCD pile-up, and stochastic jets. The difference in width between hard-scatter and QCD pile-up jets is due to their different underlying \(p_{\text {T}}\) spectra. The spectrum of QCD pile-up jets is softer than that of the hard-scatter jets for the process considered (\({t\bar{t}}\)); therefore, a significant fraction of QCD pile-up jets are reconstructed with \(p_{\text {T}}\) between 20 and 30 \(\text {GeV}\) because of a stochastic and out-of-time component that is larger than in hard-scatter jets, which results in a larger width.

Fig. 7

Distribution of the average tower \(p_{\text {T}}\) for hard-scatter jets as a function of the angular distance from the jet axis in \(\eta \) and \(\phi \) in simulated \({t\bar{t}}\) events

Using calorimeter towers as constituents, it is possible to explore the \(p_{\text {T}}\) distribution within a jet with a fixed \(\eta \times \phi \) granularity. Figure 7 shows the two-dimensional \(p_{\text {T}}\) distribution around the jet axis for hard-scatter jets. The distribution is symmetric in \(\phi \), while the pile-up pedestal decreases with increasing \(\eta \), as is expected in the forward region. A new variable, designed to exploit the full information about tower constituents, is considered. The two-dimensionalFootnote 7 \(p_{\text {T}}\) distribution in the \(\Delta \eta \)–\(\Delta \phi \) plane centred around the jet axis is fitted with a function

$$\begin{aligned} f = \alpha +\beta \Delta \eta +\gamma \mathrm {e}^{-\frac{1}{2}\left( \frac{\Delta \eta }{0.1}\right) ^2-\frac{1}{2}\left( \frac{\Delta \phi }{0.1}\right) ^2}. \end{aligned}$$
(4)

Both the width of the Gaussian component of the fit and the range in which the fit is performed are treated as jet-independent constants. The fit range, an \(11\times 11\) tower grid, balances two competing effects: a larger range improves the measurement of the constant (\(\alpha \)) and linear (\(\beta \)) terms, while a smaller range reduces the risk of including pile-up fluctuations from outside the jet. On average, the jet tower \(p_{\text {T}}\) distribution is symmetric with respect to \(\Delta \phi \), and pile-up rejection at constant hard-scatter efficiency is improved by averaging the tower momenta at \(|\Delta \phi | \) and \(-|\Delta \phi | \) so that fluctuations are partially cancelled before the fit is performed.

The constant (\(\alpha \)) and linear (\(\beta \)) terms in the fit capture the average stochastic pile-up contribution to the jet \(p_{\text {T}}\) distribution, while the Gaussian term describes the \(p_{\text {T}}\) distribution from the underlying hard-scatter or QCD pile-up jet. The parameter \(\gamma \) therefore represents a stochastic-pile-up-subtracted estimate of the \(p_{\text {T}}\) of such a hard-scatter or QCD pile-up jet in a \(\Delta R=0.1\) core, assuming a Gaussian \(p_{\text {T}}\) distribution of its constituent towers. By definition, \(\gamma \) does not depend on the amount of pile-up in the event, but only on the stochastic nature of the jet. In order to make the fitting procedure more robust, the Gaussian width parameter is fixed. While the width of a hard-scatter or QCD pile-up jet is expected to depend on the truth-particle jet \(p_{\text {T}}\) and \(\eta \), such dependence is negligible in the \(p_{\text {T}}\) range relevant for these studies (20–50 \(\text {GeV}\)). Figure 8 shows projections of the tower distribution with the fit function overlaid: the hard-scatter jet displays the expected, sharply peaked distribution, while the stochastic pile-up jet distribution is flat with various off-centre features, reflecting the randomness of the underlying processes.
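Since the Gaussian width is fixed, Eq. (4) is linear in \(\alpha \), \(\beta \) and \(\gamma \), so the fit reduces to a linear least-squares problem on the symmetrized \(11\times 11\) grid. The sketch below assumes an unweighted least-squares fit with towers indexed as `grid[i_eta, i_phi]` and centred on the jet axis; the fit procedure and weighting actually used are not specified in the text, and the function names are hypothetical.

```python
import numpy as np

N = 11                                   # 11 x 11 tower grid
STEP = 0.1                               # tower size in eta and phi
SIGMA = 0.1                              # fixed Gaussian width of Eq. (4)
offsets = STEP * (np.arange(N) - N // 2) # -0.5 ... +0.5 around the jet axis
D_ETA, D_PHI = np.meshgrid(offsets, offsets, indexing="ij")  # grid[i_eta, i_phi]


def symmetrize_phi(grid):
    """Average the towers at +|dphi| and -|dphi| (phi is the second axis)."""
    return 0.5 * (grid + grid[:, ::-1])


def fit_gamma(grid):
    """Least-squares fit of f = alpha + beta*deta + gamma*gauss(deta, dphi).

    With the Gaussian width fixed, f is linear in (alpha, beta, gamma), so a
    single linear least-squares solve is sufficient.
    """
    gauss = np.exp(-0.5 * (D_ETA / SIGMA) ** 2 - 0.5 * (D_PHI / SIGMA) ** 2)
    design = np.column_stack([np.ones(N * N), D_ETA.ravel(), gauss.ravel()])
    coeffs, *_ = np.linalg.lstsq(design, symmetrize_phi(grid).ravel(), rcond=None)
    alpha, beta, gamma = coeffs
    return alpha, beta, gamma
```

Averaging the grid with its mirror image in \(\Delta \phi \) removes any component that is antisymmetric in \(\Delta \phi \) before the fit; a perfectly flat grid, as expected for an idealized stochastic jet, yields \(\gamma =0\).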

Fig. 8

Symmetrized tower \(p_{\text {T}}\) distribution projections in \(\phi \) for an example a hard-scatter jet and b stochastic pile-up jet in simulated \({t\bar{t}}\) events. The black histogram line corresponds to the projection of the 2D tower distribution. The fit model closely follows the hard-scatter jet distribution, yielding a large Gaussian signal, while stochastic pile-up jets feature multiple smaller signals, away from the jet core

The performance of the \(\gamma \) variable and of the cluster-based and tower-based widths is compared in Fig. 9, where the efficiency for stochastic pile-up jets is shown as a function of the hard-scatter jet efficiency. Each curve is obtained by applying an upper bound on the jet width or a lower bound on \(\gamma \) in order to select hard-scatter jets. The tower-based width outperforms the cluster-based width over the whole efficiency range, while the \(\gamma \) variable performs similarly to the tower-based width. The dependence of the hard-scatter and stochastic pile-up jet efficiencies on the number of reconstructed vertices in the event (\(N_\mathrm {PV}\)) and on \(\eta \) is shown in Fig. 10, where the requirement on each discriminant is tuned to achieve an overall efficiency of 90% for hard-scatter jets. By construction, the performance of the \(\gamma \) variable is less affected by the pile-up conditions than that of the two width variables.
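A working point of the kind described above can be sketched as follows: the threshold is the quantile of the hard-scatter distribution that yields the target efficiency, and the stochastic pile-up efficiency is evaluated at that threshold. Scanning the target efficiency traces an efficiency curve. Quantile-based thresholds and the function names are assumptions for illustration.

```python
import numpy as np


def working_point(hs_values, pu_values, target_hs_eff=0.90, keep_above=True):
    """Threshold giving the target hard-scatter efficiency, plus both efficiencies.

    keep_above=True keeps jets with value >= threshold (lower bound, as for gamma);
    keep_above=False keeps jets with value <= threshold (upper bound, as for widths).
    """
    hs = np.asarray(hs_values, dtype=float)
    pu = np.asarray(pu_values, dtype=float)
    if keep_above:
        cut = np.quantile(hs, 1.0 - target_hs_eff)
        return cut, float(np.mean(hs >= cut)), float(np.mean(pu >= cut))
    cut = np.quantile(hs, target_hs_eff)
    return cut, float(np.mean(hs <= cut)), float(np.mean(pu <= cut))


def efficiency_curve(hs_values, pu_values, keep_above=True, n_points=50):
    """(hard-scatter efficiency, pile-up efficiency) pairs for a scan of targets."""
    targets = np.linspace(0.02, 1.0, n_points)
    return [working_point(hs_values, pu_values, t, keep_above)[1:] for t in targets]
```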

Fig. 9
figure 9

Efficiency for stochastic pile-up jets as a function of the efficiency for hard-scatter jets using different shape-based discriminants: a \(10\le \langle \mu \rangle <20\) and b \(30\le \langle \mu \rangle <40\) in simulated \({t\bar{t}}\) events

Fig. 10
figure 10

Hard-scatter jet efficiency as a function of a number of reconstructed primary vertices \(N_\mathrm {PV}\) and b pseudorapidity \(|\eta |\), as well as stochastic pile-up jet efficiency as a function of c number of reconstructed primary vertices \(N_\mathrm {PV}\) and d pseudorapidity \(|\eta |\) at 90% efficiency of selecting hard-scatter jets in simulated \({t\bar{t}}\) events

The \(\gamma \) parameter is a good discriminant for stochastic pile-up jets because it provides an estimate of the largest amount of \(p_{\text {T}}\) in the jet originating from a single vertex. If there is no dominant contribution, the \(p_{\text {T}}\) distribution does not feature a prominent core, and therefore \(\gamma \) is close to zero. With this approach, all pile-up jets are effectively treated as QCD pile-up jets, and \(\gamma \) is used to estimate their core \(p_{\text {T}}\). Therefore, from this point on, the challenge of pile-up rejection is reduced to the identification and rejection of QCD pile-up jets, which is discussed in the following section.

5 QCD pile-up jet tagging with topological information

While pile-up mitigation techniques based on jet shapes have been shown to be effective in suppressing stochastic pile-up jets, they do not address the QCD pile-up jets that are prevalent in the forward region. This section describes the development of an effective rejection method specifically targeting QCD pile-up jets.

QCD pile-up jets originate from a single \(pp\) interaction in which multiple jets can be produced. The total transverse momentum associated with each pile-up interaction is expected to be conserved (Footnote 8); therefore, all jets and central tracks associated with a given vertex can be exploited to identify QCD pile-up jets beyond the tracking coverage of the inner detector. The principle is clearest for a pile-up interaction with a dijet final state: a forward pile-up jet can be identified by looking for a central pile-up jet opposite in \(\phi \). The main limitation of this approach is that it only addresses dijet pile-up interactions in which both jets are reconstructed.

In order to address this challenge, a more comprehensive approach is adopted by considering the total transverse momentum of tracks and jets associated with each reconstructed vertex independently. The more general assumption is that the transverse momentum of each pile-up interaction should be balanced, and any imbalance would be due to a forward jet from one of the interactions.

In order to properly compute the transverse momentum of each interaction, only QCD pile-up jets should be considered. Consequently, identifying forward QCD pile-up jets through transverse-momentum balance with central pile-up jets requires the ability to discriminate between QCD and stochastic pile-up jets in the central region.

5.1 A discriminant for central pile-up jet classification

Discrimination between stochastic and QCD pile-up jets in the central region can be achieved using track and vertex information. This section describes a new discriminant built for this purpose.

The underlying features of QCD and stochastic pile-up jets differ. Tracks matched to QCD pile-up jets mostly originate from a vertex PV\(_i\) corresponding to a pile-up interaction (\(i\ne 0\)), yielding \(R_\mathrm {pT} ^i>R_\mathrm {pT} ^0\) for a given jet; such jets have large values of \(R_\mathrm {pT} ^i\) with respect to the pile-up vertex i from which they originated. Tracks matched to stochastic pile-up jets are unlikely to originate from the same interaction, yielding small \(R_\mathrm {pT} ^i\) values with respect to any vertex i. For stochastic pile-up jets, the largest \(R_\mathrm {pT} ^i\) value is therefore expected to be close to the typical \(R_\mathrm {pT} ^i\) value across all vertices, whereas for QCD pile-up jets the two differ considerably, as most tracks originate from the same pile-up vertex. This feature can be exploited to discriminate between the two categories.

Thus, the difference between the leading and median values of \(R_\mathrm {pT} ^i\) for a central jet, \(\Delta R_\mathrm {pT} \), can be used to distinguish QCD pile-up jets from stochastic pile-up jets in the central region, as shown in Fig. 11. A minimum \(\Delta R_\mathrm {pT} \) requirement can effectively reject stochastic pile-up jets. In the following, a \(\Delta R_\mathrm {pT} >0.2\) requirement is applied to central jets with \(p_{\text {T}} < 35 \,\text {GeV}\). Above this \(p_{\text {T}}\) threshold the fraction of stochastic pile-up jets is negligible, and all pile-up jets are therefore assumed to be QCD pile-up jets irrespective of their \(\Delta R_\mathrm {pT} \) value. The optimal thresholds depend on the pile-up conditions; the values used here are tuned for the collisions considered in this study, with an average of 13.5 interactions per bunch crossing.
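The \(\Delta R_\mathrm {pT} \) computation and the central-jet requirement can be sketched in Python as follows. The sketch assumes that \(R_\mathrm {pT} ^i\) is the scalar \(p_{\text {T}}\) sum of the jet's tracks assigned to vertex i divided by the jet \(p_{\text {T}}\) (the per-vertex generalization of \(R_\mathrm {pT}\) from Ref. [1]) and that the median is taken over all reconstructed vertices in the event; the function names are hypothetical, while the 0.2 and 35 \(\text {GeV}\) values are those quoted above.

```python
import numpy as np


def delta_rpt(track_pt, track_vertex, jet_pt, n_vertices):
    """Return (R_pT^i for every vertex i, Delta R_pT = max_i R_pT^i - median_i R_pT^i).

    track_pt     : pT (GeV) of the tracks matched to the jet
    track_vertex : vertex index of each track (0 = hard-scatter vertex)
    jet_pt       : jet pT (GeV)
    n_vertices   : number of reconstructed vertices in the event
    """
    # Assumed definition: scalar pT sum of the jet's tracks from vertex i over jet pT.
    sums = np.bincount(np.asarray(track_vertex, dtype=int),
                       weights=np.asarray(track_pt, dtype=float),
                       minlength=n_vertices)
    rpt = sums / jet_pt
    return rpt, float(rpt.max() - np.median(rpt))


def is_qcd_pileup_candidate(jet_pt, drpt, drpt_min=0.2, pt_max=35.0):
    # Only jets below pt_max must pass the Delta R_pT requirement; above it,
    # pile-up jets are assumed to be QCD pile-up jets.
    return jet_pt >= pt_max or drpt > drpt_min
```

For a 30 \(\text {GeV}\) jet with 12 \(\text {GeV}\) of track \(p_{\text {T}}\) from one pile-up vertex and 1 \(\text {GeV}\) from another, in an event with five vertices, the median \(R_\mathrm {pT} ^i\) is zero and \(\Delta R_\mathrm {pT} =0.4\), so the jet passes the requirement.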

Fig. 11
figure 11

Distribution of \(\Delta R_\mathrm {pT} \) for stochastic and QCD pile-up jets, as observed in dijet events with Pythia 8.186 pile-up simulation

The total transverse momentum of each vertex is thus computed by averaging, with a vectorial sum, the total transverse momentum of tracks and central jets assigned to the vertex. The jet–vertex matching is performed by considering the largest \(R_\mathrm {pT} ^i\) for each jet. The transverse momentum vector (\(\varvec{p}_\mathrm {T}\)) of a given forward jet is then compared with the total transverse momentum of each vertex in the event. If there is at least one pile-up vertex in the event with a large total vertex transverse momentum back-to-back in \(\phi \) with respect to the forward jet, the jet itself is likely to have originated from that vertex. Figure 12 shows an example event, where the \(\varvec{p}_\mathrm {T}\) of a forward pile-up jet is back-to-back with respect to the total transverse momentum of the vertex from which it is expected to have originated.

Fig. 12
figure 12

Display of candidate \(Z (\rightarrow \mu \mu )\) event (muons in yellow) containing two QCD pile-up jets. Tracks from the primary vertex are in red, those from the pile-up vertex with the highest \(\sum p_\mathrm {T}^2\) are in green. The top panel shows a transverse and longitudinal view of the detector, while the bottom panel shows the details of the event in the ID in the longitudinal view

5.2 Forward jet vertex tagging algorithm

The procedure is referred to as forward jet vertex tagging (f\(\mathrm {JVT}\)). Its main parameters are the maximum JVT value, \(\mathrm {JVT} _\mathrm {max}\), used to reject central hard-scatter jets, and the minimum \(\Delta R_\mathrm {pT} \) requirement, which ensures that the selected central pile-up jets are QCD pile-up jets. \(\mathrm {JVT} _\mathrm {max}\) is set to 0.14, corresponding to an efficiency of 93% for selecting pile-up jets in dijet events. The minimum \(\Delta R_\mathrm {pT} \) requirement defines the operating point in terms of efficiency for selecting QCD pile-up jets and contamination from stochastic pile-up jets. A minimum \(\Delta R_\mathrm {pT} \) of 0.2 is required, corresponding to an efficiency of \(70\%\) for QCD pile-up jets and \(20\%\) for stochastic pile-up jets in dijet events. The selected jets are then assigned to the vertex PV\(_i\) with the highest \(R_\mathrm {pT} ^i\) value. For each pile-up vertex i, \(i\ne 0\), the missing transverse momentum \(\langle \varvec{p}_{\mathrm {T},i}^\mathrm {miss}\rangle \) is computed as the weighted vector sum of the jet (\(\varvec{p}_\mathrm {T}^\mathrm {jet}\)) and track (\(\varvec{p}_\mathrm {T}^\mathrm {track}\)) transverse momenta:

$$\begin{aligned} \langle \varvec{p}_{\mathrm {T},i}^\mathrm {miss}\rangle =-\frac{1}{2}\left( \sum _{\mathrm {tracks \in PV}_i}k\varvec{p}_\mathrm {T}^\mathrm {track} + \sum _{\mathrm {jets \in PV}_i}\varvec{p}_\mathrm {T}^\mathrm {jet}\right) . \end{aligned}$$
(5)

The factor k accounts for intrinsic differences between the jet and track terms. The track component does not include the contribution of neutral particles, while the jet component is not sensitive to soft emissions significantly below 20 \(\text {GeV}\). The value \(k=2.5\) is chosen as the one that optimizes the overall rejection of forward pile-up jets.
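The central-jet selection and Eq. (5) can be sketched as below. The JVT and \(\Delta R_\mathrm {pT} \) values of each central jet, and the vertex assignment of tracks and jets (the latter by highest \(R_\mathrm {pT} ^i\)), are assumed to be computed beforehand; the array layout and function names are illustrative assumptions, while \(\mathrm {JVT} _\mathrm {max}=0.14\), the \(\Delta R_\mathrm {pT} \) requirement and \(k=2.5\) are the values given in the text.

```python
import numpy as np


def select_central_pileup_jets(jvt, drpt, pt, jvt_max=0.14, drpt_min=0.2, pt_max=35.0):
    """Boolean mask of central jets entering Eq. (5): JVT < JVT_max (pile-up-like)
    and, for pT < pt_max, Delta R_pT > drpt_min (QCD-like)."""
    jvt, drpt, pt = (np.asarray(a, dtype=float) for a in (jvt, drpt, pt))
    return (jvt < jvt_max) & ((pt >= pt_max) | (drpt > drpt_min))


def vertex_missing_pt(track_pxpy, track_vertex, jet_pxpy, jet_vertex, n_vertices, k=2.5):
    """Eq. (5) for every vertex: -1/2 * (k * sum of track pT + sum of jet pT).

    *_pxpy are (N, 2) arrays of (px, py) in GeV and *_vertex give the vertex index
    of each object. Row 0 is the hard-scatter vertex, which fJVT does not use.
    """
    total = np.zeros((n_vertices, 2))
    # np.add.at accumulates correctly when several objects share the same vertex.
    np.add.at(total, np.asarray(track_vertex, dtype=int),
              k * np.asarray(track_pxpy, dtype=float).reshape(-1, 2))
    np.add.at(total, np.asarray(jet_vertex, dtype=int),
              np.asarray(jet_pxpy, dtype=float).reshape(-1, 2))
    return -0.5 * total
```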

The \(\mathrm {fJVT}\) discriminant for a given forward jet, with respect to the vertex i, is then defined as the normalized projection of the missing transverse momentum on \(\varvec{p}_\mathrm {T}^\mathrm {fj}\):

$$\begin{aligned} \mathrm {fJVT}_i = \frac{\langle \varvec{p}_{\mathrm {T},i}^\mathrm {miss}\rangle \cdot \varvec{p}_\mathrm {T}^\mathrm {fj}}{|\varvec{p}_\mathrm {T}^\mathrm {fj}|^2}, \end{aligned}$$
(6)

where \(\varvec{p}_\mathrm {T}^\mathrm {fj}\) is the forward jet’s transverse momentum. The motivation for this definition is that the amount of missing transverse momentum in the direction of the forward jet needed for the jet to be tagged should be proportional to the jet’s transverse momentum. The forward jet is therefore tagged as pile-up if its \(\mathrm {fJVT}\) value, defined as \(\mathrm {fJVT} =\mathrm {max}_i(\mathrm {fJVT} _i)\), is above a threshold. The choice of threshold determines the pile-up rejection performance. The \(\mathrm {fJVT}\) discriminant tends to have larger values for QCD pile-up jets, while the distribution for hard-scatter jets falls steeply, as shown in Fig. 13.
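Given the per-vertex vectors of Eq. (5), Eq. (6) and the maximum over pile-up vertices reduce to a few lines. This Python sketch assumes that row 0 holds the hard-scatter vertex, returns 0 when no pile-up vertex is present (an assumption not stated in the text), and uses hypothetical function names; the threshold values correspond to the operating points discussed in Sect. 5.3.

```python
import numpy as np


def fjvt(vertex_miss, fj_pxpy):
    """fJVT = max over pile-up vertices i != 0 of Eq. (6).

    vertex_miss : (n_vertices, 2) array of <p_T,i^miss>, row 0 = hard-scatter vertex
    fj_pxpy     : forward-jet (px, py) in GeV
    """
    fj = np.asarray(fj_pxpy, dtype=float)
    miss = np.asarray(vertex_miss, dtype=float)[1:]   # skip the hard-scatter vertex
    if miss.shape[0] == 0:
        return 0.0   # assumption: no pile-up vertex, nothing to tag against
    per_vertex = miss @ fj / fj.dot(fj)              # Eq. (6) for each vertex i
    return float(per_vertex.max())


def is_pileup(fjvt_value, threshold=0.4):
    # Jets above the threshold are tagged as pile-up (0.4 and 0.5 are the
    # maximum-fJVT operating points quoted in Sect. 5.3).
    return fjvt_value > threshold
```

A vertex whose tracks and jets point away from the forward jet has a missing transverse momentum along the jet direction and therefore a positive \(\mathrm {fJVT} _i\); a vertex on the same side as the jet gives a negative value and cannot cause the jet to be tagged.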

Fig. 13
figure 13

The \(\mathrm {fJVT}\) distribution for hard-scatter (blue) and pile-up (green) forward jets in simulated \(Z +\)jets events with at least one forward jet with a \(30<p_{\text {T}} <40\) \(\text {GeV}\) or b \(40<p_{\text {T}} <50\) \(\text {GeV}\)

5.3 Performance

Figure 14 shows the efficiency of selecting forward pile-up jets as a function of the efficiency of selecting forward hard-scatter jets when varying the maximum \(\mathrm {fJVT}\) requirement.

Fig. 14
figure 14

Efficiency for pile-up jets in simulated \(Z +\)jets events as a function of the efficiency for hard-scatter jets for different jet \(p_{\text {T}}\) ranges

Using a maximum \(\mathrm {fJVT}\) of 0.5 and 0.4, respectively, hard-scatter efficiencies of 92% and 85% are achieved for pile-up efficiencies of 60% and 50%, considering jets with \(20<p_{\text {T}} <50\,\text {GeV}\). The dependence of the hard-scatter and pile-up efficiencies on the forward jet \(p_{\text {T}}\) is shown in Fig. 15. For low-\(p_{\text {T}}\) forward jets, an upward fluctuation in the \(\mathrm {fJVT}\) value is more likely, and therefore the efficiency for hard-scatter jets is slightly lower than for higher-\(p_{\text {T}}\) jets. The hard-scatter efficiency depends on the number of pile-up interactions, as shown in Fig. 16, because busier pile-up conditions increase the chance of accidentally matching the hard-scatter jet to a pile-up vertex. The pile-up efficiency depends on the \(p_{\text {T}}\) of the forward jets, due to the \(p_{\text {T}}\)-dependence of the relative numbers of QCD and stochastic pile-up jets.

Fig. 15
figure 15

Efficiency for a hard-scatter jets and b pile-up jets as a function of the forward jet \(p_{\text {T}}\) in simulated \(Z +\)jets events

Fig. 16
figure 16

Efficiency in simulated \(Z +\)jets events as a function of \(N_\mathrm {PV}\) for hard-scatter forward jets with a \(30\,\text {GeV}<p_{\text {T}} <40\,\text {GeV}\) and b \(40\,\text {GeV}<p_{\text {T}} <50\,\text {GeV}\), and for pile-up forward jets with c \(30\,\text {GeV}<p_{\text {T}} <40\,\text {GeV}\) and d \(40\,\text {GeV}<p_{\text {T}} <50\,\text {GeV}\)

5.4 Efficiency measurements

The \(\mathrm {fJVT}\) efficiency for hard-scatter jets is measured in \(Z +\mathrm {jets}\) data events, exploiting a tag-and-probe procedure similar to that described in Ref. [1].

For \(Z(\rightarrow \mu \mu )+\)jets events, selected by single-muon triggers, two muons of opposite sign and \(p_{\text {T}} >25\,\text {GeV}\) are required, such that their invariant mass lies between 66 and 116 \(\text {GeV}\). Events are further required to satisfy event and jet quality criteria, and a veto on cosmic-ray muons.

Using the leading forward jet recoiling against the \(Z\) boson as a probe, a signal region of forward hard-scatter jets is defined as the back-to-back region specified by \(|\Delta \phi (Z, \mathrm {jet})| > 2.8\) rad. In order to select a sample with high purity in forward hard-scatter jets, events are required to have no central hard-scatter jets with \(p_{\text {T}} >20 \,\text {GeV}\), identified with \(\mathrm {JVT}\), and exactly one forward jet. The \(Z\) boson is required to have \(p_{\text {T}} > 20 \,\text {GeV}\), as events in which the \(Z\) boson \(p_{\text {T}}\) is below the minimum jet \(p_{\text {T}}\) have a lower hard-scatter purity. The above selection results in a forward hard-scatter signal region that is more than 98% pure in hard-scatter jets relative to pile-up jets, as estimated in simulation.

The \(\mathrm {fJVT}\) distributions for data and simulation in the signal region are compared in Fig. 17. The data distribution is observed to have fewer jets with high \(\mathrm {fJVT}\) than predicted by simulation, consistent with the simulation overestimating the number of pile-up jets, as reported in Ref. [1].

Fig. 17
figure 17

Distributions of \(\mathrm {fJVT}\) for jets with \(p_{\text {T}}\) a between 20 and 30 \(\text {GeV}\) and b between 30 and 50 \(\text {GeV}\) for data (black circles) and simulation (red squares). The lower panels display the ratio of the data to the simulation. The grey bands account for statistical and systematic uncertainties

The pile-up jet contamination in the signal region \(N_{\mathrm {PU}}^\mathrm {signal}(|\Delta \phi (Z,\mathrm {jet}) |>2.8~\mathrm {rad})\) is estimated in a pile-up-enriched control region with \(|\Delta \phi (Z,\mathrm {jet}) |<1.2\) rad, based on the assumption that the \(|\Delta \phi (Z,\mathrm {jet}) |\) distribution is uniform for pile-up jets. The validity of this assumption was verified in simulation. The pile-up jet rate in data is therefore used to estimate the contamination of the signal region as

$$\begin{aligned} N_{\mathrm {PU}}^\mathrm {signal}(|\Delta \phi (Z,\mathrm {jet}) |>2.8~\mathrm {rad}) = \\ [N_\mathrm {j}^\mathrm {control}(|\Delta \phi (Z,\mathrm {jet}) |<1.2~\mathrm {rad}) - N_\mathrm {HS}(|\Delta \phi (Z,\mathrm {jet}) |<1.2~\mathrm {rad})] \cdot (\pi - 2.8~\mathrm {rad})/1.2~\mathrm {rad}, \end{aligned}$$
(7)

where \({N_\mathrm {j}^\mathrm {control}(|\Delta \phi (Z,\mathrm {jet}) |<1.2~\mathrm {rad})}\) is the number of jets in the data control region and \({N_\mathrm {HS}(|\Delta \phi (Z,\mathrm {jet}) |<1.2~\mathrm {rad})}\) is the expected number of hard-scatter jets in the control region, as predicted in simulation.
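As a worked illustration of Eq. (7), with hypothetical counts of 1000 data jets in the control region and 400 hard-scatter jets predicted by simulation, the 600 pile-up jets in the 1.2 rad wide control window scale to about 170.8 pile-up jets in the \((\pi -2.8)\approx 0.34\) rad wide signal window. A small Python helper (name hypothetical) makes the scaling explicit:

```python
import math


def pileup_in_signal_region(n_jets_control, n_hs_control_mc,
                            dphi_signal_min=2.8, dphi_control_max=1.2):
    """Eq. (7): extrapolate pile-up jets from |dphi(Z, jet)| < 1.2 rad to > 2.8 rad.

    Assumes |dphi(Z, jet)| is uniformly distributed for pile-up jets, so the
    expected count scales with the width of each |dphi| window.
    """
    n_pu_control = n_jets_control - n_hs_control_mc
    return n_pu_control * (math.pi - dphi_signal_min) / dphi_control_max
```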

The hard-scatter efficiency is therefore measured in the signal region as

$$\begin{aligned} \varepsilon = \frac{N_\mathrm {j}^\mathrm {pass} - N_\mathrm {PU}^\mathrm {pass}}{N_\mathrm {j}^\mathrm {signal} - N_{\mathrm {PU}}^\mathrm {signal}}, \end{aligned}$$
(8)

where \(N_\mathrm {j}^\mathrm {signal}\) and \(N_\mathrm {j}^\mathrm {pass}\) denote respectively the overall number of jets in the signal region and the number of jets in the signal region satisfying the \(\mathrm {fJVT}\) requirements. The terms \(N_\mathrm {PU}^\mathrm {signal}\) and \(N_\mathrm {PU}^\mathrm {pass}\) represent, respectively, the overall number of pile-up jets in the signal region and the number of pile-up jets satisfying the \(\mathrm {fJVT}\) requirements, and are both estimated from simulation. Figure 18 shows the hard-scatter efficiency evaluated in data and simulation. The uncertainties correspond to a 30% uncertainty in the number of pile-up jets and a 10% uncertainty in the number of hard-scatter jets in the signal region; they are estimated by comparing data and simulation in the pile-up- and hard-scatter-enriched regions, respectively. The hard-scatter efficiency is found to be underestimated in simulation, consistent with the simulation overestimating the pile-up activity in data. The disagreement is larger at low jet \(p_{\text {T}}\) and high \(|\eta |\), where it can reach about 3%. The efficiencies evaluated in this paper are used to define a calibration procedure accounting for this discrepancy. The uncertainties associated with the calibration and resolution of the jets used to compute \(\mathrm {fJVT}\) are estimated in ATLAS analyses by recomputing \(\mathrm {fJVT}\) for each variation reflecting a systematic uncertainty.
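Eq. (8), together with a coherent \(\pm 30\%\) variation of the pile-up terms (one simple way to propagate the pile-up uncertainty quoted above), can be sketched as follows. The coherent scaling of \(N_\mathrm {PU}^\mathrm {pass}\) and \(N_\mathrm {PU}^\mathrm {signal}\), the function names and any example counts are assumptions for illustration, not the procedure or numbers of the analysis.

```python
def hard_scatter_efficiency(n_jets_pass, n_jets_signal, n_pu_pass, n_pu_signal):
    """Eq. (8): fJVT efficiency for hard-scatter jets in the signal region."""
    denominator = n_jets_signal - n_pu_signal
    if denominator <= 0:
        raise ValueError("pile-up estimate is not smaller than the number of jets")
    return (n_jets_pass - n_pu_pass) / denominator


def pileup_variations(n_jets_pass, n_jets_signal, n_pu_pass, n_pu_signal, rel=0.30):
    """Efficiencies with both pile-up terms scaled coherently by (1 - rel) and (1 + rel)."""
    return tuple(
        hard_scatter_efficiency(n_jets_pass, n_jets_signal, s * n_pu_pass, s * n_pu_signal)
        for s in (1.0 - rel, 1.0 + rel)
    )
```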

Fig. 18
figure 18

Efficiency for hard-scatter jets to pass \(\mathrm {fJVT}\) requirements as a function of (a, b) \(p_{\text {T}}\) and (c, d) \(|\eta |\) for the (a, c) 92% (\(\mathrm {fJVT} <0.5\)) and (b, d) 85% (\(\mathrm {fJVT} <0.4\)) hard-scatter efficiency operating points of the \(\mathrm {fJVT}\) discriminant in data (black circles) and simulation (red squares). The lower panels display the ratio of the data to the simulation. The grey bands account for statistical and systematic uncertainties

6 Pile-up jet tagging with shape and topological information

The \(\mathrm {fJVT}\) and \(\gamma \) discriminants correspond to a twofold strategy for pile-up rejection targeting QCD and stochastic pile-up jets, respectively. However, as highlighted in Sect. 3, this classification is not well defined as all jets have a stochastic component. Therefore, it is useful to define a coherent strategy that addresses both the stochastic and QCD nature of pile-up jets at the same time.

The \(\gamma \) parameter discussed in Sect. 4 provides an estimate of the \(p_{\text {T}}\) in the core of the jet originating from the single interaction contributing the largest amount of transverse momentum to the jet. Therefore, the \(\mathrm {fJVT}\) definition can be modified to exploit this estimation by replacing the jet \(p_{\text {T}}\) with \(\gamma \), so that

$$\begin{aligned} \mathrm {fJVT} _{\gamma ,i} = \frac{\langle \varvec{p}_{\mathrm {T},i}^\mathrm {miss}\rangle \cdot \varvec{u}^\mathrm {fj}}{\gamma }, \qquad \mathrm {fJVT} _{\gamma } = \mathrm {max}_i(\mathrm {fJVT} _{\gamma ,i}), \end{aligned}$$
(9)

where \(\varvec{u}^\mathrm {fj}\) is the unit vector representing the direction of the forward jet in the transverse plane.
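Since \(\langle \varvec{p}_{\mathrm {T},i}^\mathrm {miss}\rangle \cdot \varvec{p}_\mathrm {T}^\mathrm {fj}/|\varvec{p}_\mathrm {T}^\mathrm {fj}|^2 = \langle \varvec{p}_{\mathrm {T},i}^\mathrm {miss}\rangle \cdot \varvec{u}^\mathrm {fj}/|\varvec{p}_\mathrm {T}^\mathrm {fj}|\), Eq. (9) is Eq. (6) with the forward-jet \(p_{\text {T}}\) in the denominator replaced by \(\gamma \): for the same vertex imbalance along the jet direction, a jet with a weak core (small \(\gamma \)) obtains a larger value. A sketch in Python, assuming the per-vertex vectors of Eq. (5) with row 0 as the hard-scatter vertex and at least one pile-up vertex; the function name is hypothetical, and since the treatment of non-positive \(\gamma \) is not specified in the text, it simply raises an error here.

```python
import numpy as np


def fjvt_gamma(vertex_miss, fj_pxpy, gamma):
    """Eq. (9): projection of <p_T,i^miss> on the forward-jet direction divided
    by gamma, maximized over pile-up vertices i != 0 (as for fJVT)."""
    if gamma <= 0:
        # Handling of non-positive gamma is not specified in the text.
        raise ValueError("gamma must be positive")
    fj = np.asarray(fj_pxpy, dtype=float)
    u = fj / np.linalg.norm(fj)                      # unit vector u^fj
    miss = np.asarray(vertex_miss, dtype=float)[1:]  # assumes >= 1 pile-up vertex
    return float((miss @ u / gamma).max())
```

With \(\gamma \) equal to the forward-jet \(p_{\text {T}}\), the result coincides with \(\mathrm {fJVT}\) from Eq. (6).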

Figure 19 shows the performance of \(\mathrm {fJVT} _{\gamma }\) compared with that of \(\mathrm {fJVT}\) and \(\gamma \) individually. The \(\mathrm {fJVT} _{\gamma }\) discriminant outperforms the individual discriminants over the whole efficiency range. In samples enriched in QCD pile-up jets (\(30<p_{\text {T}} < 50\) \(\text {GeV}\)), the \(\mathrm {fJVT} _{\gamma }\) performance is driven by the topological information, while at lower \(p_{\text {T}}\) (\(20<p_{\text {T}} < 30\) \(\text {GeV}\)), where stochastic pile-up jets are more common, \(\mathrm {fJVT} _{\gamma }\) benefits from the shape information. A multivariate combination of the \(\mathrm {fJVT}\) and \(\gamma \) discriminants was also studied and found to perform similarly to \(\mathrm {fJVT} _{\gamma }\).

Fig. 19
figure 19

Efficiency for selecting pile-up jets as a function of the efficiency for selecting hard-scatter jets in simulated \({t\bar{t}}\) events for a jets with \(20 \,\text {GeV}<p_{\text {T}} <30 \,\text {GeV}\) and b jets with \(30\,\text {GeV}< p_{\text {T}} <50 \,\text {GeV}\)

7 Impact on physics of Vector–Boson Fusion

In order to quantify the impact of forward pile-up rejection on a VBF analysis, the VBF \(H\rightarrow \tau \tau \) signature is considered in the case where both \(\tau \) leptons decay leptonically. The pile-up dependence of the signal purity (S/B) is studied in a simplified analysis in the dilepton channel. Several other channels are used in the analysis of VBF \(H\rightarrow \tau \tau \) by ATLAS; the dilepton channel is chosen for this study because of its simple selection and background composition. The dominant background in this channel originates from \(Z +\)jets production, where the \(Z \) boson decays leptonically, either to electrons, muons, or a leptonically decaying \(\tau \tau \) pair. The rate of \(Z \) bosons produced in association with two jets satisfying the requirements targeting the VBF topology, which include a large \(\Delta \eta \) between the jets and a large dijet invariant mass \(m_\mathrm {jj}\), is extremely low. However, background events with forward pile-up jets often have large \(\Delta \eta \) and \(m_\mathrm {jj}\), mimicking the VBF topology. As a consequence, the background acceptance grows almost quadratically with the number of pile-up interactions. This section illustrates the mitigation of this effect that can be achieved with the pile-up rejection provided by \(\mathrm {fJVT} _{\gamma }\).

The event selection used for this study was optimized using simulation without pile-up [26]:

  • The event must contain exactly two opposite-charge same-flavour leptons \(\ell ^+\ell ^-\) (with \(\ell =e\),\(\mu \)) with \(p_{\text {T}}\) >15 \(\text {GeV}\);

  • The invariant mass of the lepton pair must satisfy \(m_{\ell ^+\ell ^-}<66\,\text {GeV}\) or \(m_{\ell ^+\ell ^-}>116\,\text {GeV}\);

  • The magnitude of the missing transverse momentum must be larger than \(40\,\text {GeV}\);

  • The event must contain two jets with \(p_{\text {T}} >20\,\text {GeV}\), one of which has \(p_{\text {T}} >40\,\text {GeV}\). The absolute difference in pseudorapidities \(|\eta _{\mathrm {j}_1}-\eta _{\mathrm {j}_2}|\) must exceed 4.4 and the invariant mass of the two jets must exceed 700 \(\text {GeV}\);

  • For simulated VBF \(H\rightarrow \tau \tau \) only, both jets are required to be truth-labelled as hard-scatter jets.
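The selection above can be expressed as a single Python predicate. This is a sketch under stated assumptions: invariant masses are computed from \(p_{\text {T}}\), \(\eta \) and \(\phi \) in the massless approximation, the two leading jets with \(p_{\text {T}} >20\,\text {GeV}\) are taken as the VBF pair, the input lepton list is assumed to contain already-identified leptons, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Lepton:
    pt: float      # GeV
    eta: float
    phi: float
    charge: int
    flavour: str   # "e" or "mu"


@dataclass
class Jet:
    pt: float      # GeV
    eta: float
    phi: float
    is_hard_scatter: bool = True   # truth label (simulation only)


def massless_mass(a, b):
    # m^2 = 2 pT1 pT2 (cosh(delta eta) - cos(delta phi)) for massless objects
    return math.sqrt(2.0 * a.pt * b.pt
                     * (math.cosh(a.eta - b.eta) - math.cos(a.phi - b.phi)))


def passes_vbf_selection(leptons, jets, met, is_vbf_signal_mc=False):
    # Exactly two opposite-charge, same-flavour leptons with pT > 15 GeV
    if len(leptons) != 2:
        return False
    l1, l2 = leptons
    if l1.charge * l2.charge >= 0 or l1.flavour != l2.flavour:
        return False
    if min(l1.pt, l2.pt) <= 15.0:
        return False
    # Z-mass veto: require m_ll < 66 GeV or m_ll > 116 GeV
    if 66.0 <= massless_mass(l1, l2) <= 116.0:
        return False
    # Magnitude of the missing transverse momentum above 40 GeV
    if met <= 40.0:
        return False
    # Two leading jets above 20 GeV (assumed VBF pair), the leading one above 40 GeV
    jets20 = sorted((j for j in jets if j.pt > 20.0), key=lambda j: j.pt, reverse=True)
    if len(jets20) < 2:
        return False
    j1, j2 = jets20[:2]
    if j1.pt <= 40.0:
        return False
    # VBF topology: |delta eta| > 4.4 and m_jj > 700 GeV
    if abs(j1.eta - j2.eta) <= 4.4 or massless_mass(j1, j2) <= 700.0:
        return False
    # Signal simulation only: both jets truth-labelled as hard-scatter jets
    if is_vbf_signal_mc and not (j1.is_hard_scatter and j2.is_hard_scatter):
        return False
    return True
```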

The impact of pile-up mitigation is emulated by randomly removing hard-scatter and pile-up jets to match the performance of a \(\mathrm {fJVT} _{\gamma }\) requirement with 85% overall efficiency for hard-scatter jets with \(20< p_{\text {T}} < 50~\text {GeV}\), as estimated in \({t\bar{t}}\) simulation with an average \(\langle \mu \rangle \) of 13.5. The efficiencies are estimated as a function of the jet \(p_{\text {T}}\) and the average number of interactions per bunch crossing.
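The random jet removal can be sketched as follows: each jet is kept with a probability equal to the efficiency for its truth category. The efficiency functions, the restriction to jets with \(|\eta |>2.5\) and the function name are assumptions for illustration; in the study the efficiencies also depend on \(\langle \mu \rangle \) and are taken from \({t\bar{t}}\) simulation.

```python
import numpy as np


def emulate_fjvt_gamma(pt, eta, is_hard_scatter, eff_hs, eff_pu, rng, eta_min=2.5):
    """Return a boolean mask of jets kept after emulating an fJVT_gamma requirement.

    pt, eta         : jet kinematics (GeV, pseudorapidity)
    is_hard_scatter : truth label of each jet
    eff_hs, eff_pu  : callables giving the efficiency versus jet pT (hypothetical
                      placeholders; the study also parameterizes them in <mu>)
    Jets with |eta| <= eta_min are always kept (assumption: only forward jets
    are subject to the emulated requirement).
    """
    pt = np.asarray(pt, dtype=float)
    forward = np.abs(np.asarray(eta, dtype=float)) > eta_min
    prob = np.where(np.asarray(is_hard_scatter, dtype=bool), eff_hs(pt), eff_pu(pt))
    # A uniform draw in [0, 1) below the efficiency keeps the jet.
    return ~forward | (rng.random(pt.size) < prob)
```

Usage: `emulate_fjvt_gamma(pt, eta, labels, lambda p: np.full_like(p, 0.85), lambda p: np.full_like(p, 0.5), np.random.default_rng(1))`, where the constant 0.85 and 0.5 efficiencies are placeholders rather than the parameterized values of the study.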

Fig. 20
figure 20

Relative expected yield variation of a \(Z \rightarrow \ell \ell \) and b VBF \(H\rightarrow \tau \tau \) events and c signal purity as a function of the number of interactions per bunch crossing (\(\langle \mu \rangle \)), with different levels of pile-up rejection using \(\mathrm {fJVT} _\gamma \). The expected signal and background yields at \(\langle \mu \rangle =10\) are used as reference. Parameterized hard-scatter efficiency and pile-up efficiency are used. The lower panels display the ratio to the reference without pile-up rejection

Figure 20 shows the expected numbers of signal and background events, as well as the signal purity, as a function of \(\langle \mu \rangle \). When going from \(\langle \mu \rangle \) of 10 to 35, the expected number of background events grows by a factor of seven and the corresponding signal purity drops by a factor of eight, indicating that the presence of pile-up jets enhances the background acceptance. The slight decrease in signal acceptance is due to misidentification of pile-up jets as VBF jets. The \(\mathrm {fJVT} _{\gamma }\) algorithm mitigates the background growth, at the expense of a signal loss determined by the hard-scatter jet efficiency (Footnote 9). Therefore, the degradation of the purity due to pile-up can be effectively reduced. For the specific final state and event selection under consideration, where \(Z +\)jets production is the dominant background, this results in about a fourfold improvement in signal purity at \(\langle \mu \rangle =35\).
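Taking the quoted factors at face value for the case without pile-up rejection, the implied change in the signal yield follows directly from the definition of the purity:

$$\begin{aligned} \frac{(S/B)_{\langle \mu \rangle =35}}{(S/B)_{\langle \mu \rangle =10}} = \frac{S_{35}/S_{10}}{B_{35}/B_{10}} \approx \frac{1}{8} \quad \Rightarrow \quad \frac{S_{35}}{S_{10}} \approx \frac{B_{35}/B_{10}}{8} \approx \frac{7}{8} \approx 0.88, \end{aligned}$$

that is, a signal loss of about one eighth, consistent with the slight decrease in signal acceptance described above; the precise value should be read from Fig. 20.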

8 Conclusions

The presence of multiple pp interactions per bunch crossing at the LHC, referred to as pile-up, results in the reconstruction of additional jets besides the ones from the hard-scatter interaction. The ATLAS baseline strategy for identifying and rejecting pile-up jets relies on matching tracks to jets to determine the pp interaction of origin. This strategy cannot be applied to jets beyond the tracking coverage of the inner detector. However, a broad spectrum of physics measurements at the LHC relies on the reconstruction of jets at high pseudorapidities, for example the measurement of Higgs boson production through vector-boson fusion. Pile-up jets at high pseudorapidities reduce the sensitivity to these signatures because, in background events, they can mimic the targeted final states.

The techniques presented in this paper allow the identification and rejection of pile-up jets beyond the tracking coverage of the inner detector. The strategy to perform such a task is twofold. First, the information about the jet shape is used to estimate the leading contribution to the jet above the stochastic pile-up noise. Then the topological correlation among particles originating from a pile-up interaction is exploited to extrapolate the jet vertex tagger, using track and vertex information, beyond the tracking coverage of the inner detector to identify and reject pile-up jets at high pseudorapidities. When using both shape and topological information, approximately 57% of forward pile-up jets are rejected for a hard-scatter efficiency of about 85% at the pile-up conditions considered in this paper, with an average of 22 pile-up interactions. In events with 35 pile-up interactions, typical of LHC operation in the near future, 37%, 48%, and 51% of forward pile-up jets are rejected using, respectively, topological information, shape information, and their combination, for the same 85% hard-scatter efficiency.

A procedure is defined and used to measure the efficiency of identifying hard-scatter jets in 3.2 fb\(^{-1}\) of pp collisions at \(\sqrt{s}=13\,\text {TeV} \) collected in 2015. The efficiencies are measured in data and estimated in simulation as a function of the jet kinematics. Discrepancies of up to approximately 3% are observed, mainly due to the modelling of pile-up events.

The impact of the forward pile-up rejection algorithms presented here is estimated in a simplified study of Higgs boson production through vector-boson fusion, with the Higgs boson decaying into a \(\tau \tau \) pair; the signal purity for the baseline selection under consideration, where \(Z +\)jets production is the dominant background, is enhanced by a factor of about four for events with 35 pile-up interactions.