1 Introduction

The phase difference \(\phi _{{s}}\) between direct decays and decays through mixing of \({B} ^0_{s} \) mesons to Charge-Parity (\(C\!P\)) eigenstates is a \(C\!P\)-violating observable. In the Standard Model (SM), considering \({b} \!\rightarrow ({c} {\overline{{c}}}){s} \) transitions and neglecting subleading penguin contributions, this phase is predicted to be \(-2{\beta _{{s}}} \), where \({\beta _{{s}}} =\arg [-({V_{{t} {s}}} {V_{{t} {b}}^*})/({V_{{c} {s}}} {V_{{c} {b}}^*})]\) and \(V_{ij}\) are the elements of the CKM quark-flavour mixing matrix [1, 2].

The precise measurement of the \(\phi _{{s}}\) phase is potentially sensitive to new physics (NP) processes. The measured phase could be modified if new particles were to contribute to the \({B} ^0_{s} \)-\(\overline{B}{}^0_{s}\) mixing amplitudes [3, 4]. Measurements of \(\phi _{{s}}\) using different decay channels with muons in the final state, namely \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} {{K} ^+} {{K} ^-} \)  [5, 6], \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} {{\pi } ^+} {{\pi } ^-} \)  [7], \({{B} ^0_{s}} \!\rightarrow {\psi {(2S)}} \phi \)  [8], and a channel with open charm mesons, \({{B} ^0_{s}} \!\rightarrow {{D} ^+_{s}} {{D} ^-_{s}} \)  [9], have been reported previously by the LHCb collaboration. Measurements of \(\phi _{{s}}\) in \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} \phi \) decays with \({{J /\psi }} \!\rightarrow {\mu ^+\mu ^-} \) have also been performed by the ATLAS  [10, 11], CMS [12], CDF [13] and D0 [14] collaborations. The world-average value of these measurements is \({\phi _{{s}}} =-0.051\pm 0.023\text {\,rad}\) [15]. A precise prediction of the \(\phi _{{s}}\) phase value is available from global fits of the CKM matrix within the SM. The CKMFitter group result is \({\phi _{{s}}} =-0.0365^{\,+\,0.0013}_{\,-\,0.0012}\text {\,rad}\) [16] while the UTfit collaboration result is \({\phi _{{s}}} =-0.0370\pm 0.0010\text {\,rad}\) [17].

Fig. 1

Definition of the angles in the helicity basis. The polar angle \(\theta _K\) \((\theta _e)\) is the angle between the \({K} ^+\) (\(e ^+\)) momentum and the direction opposite to the \({B} ^0_{s} \) momentum in the \({K} ^+\) \({K} ^-\) (\(e ^+e ^-\)) centre-of-mass system, and the \(\phi _h\) is the azimuthal angle between the \({K} ^+\) \({K} ^-\) and \(e ^+e ^-\) decay planes

This paper presents a measurement of \(\phi _{{s}}\) using a flavour-tagged time-dependent angular analysis of the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} \phi \) mode with \({{J /\psi }} \!\rightarrow {e ^+e ^-} \) and \(\phi \!\rightarrow {{K} ^+} {{K} ^-} \) decays. This is the first time that the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) decay is used to measure \(C\!P\)-violating observables, and in particular the phase \(\phi _{{s}}\). The analysis is based on a data set corresponding to an integrated luminosity of \(3 \text {\,fb} ^{-1} \) collected at the LHC in proton-proton (pp) collisions at centre-of-mass energies of 7 and \(8\text {\,TeV} \) by the LHCb experiment. The yield of the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi ({{K} ^+} {{K} ^-})\) sample amounts to about \(10\%\) of that of the previously analysed \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({\mu ^+\mu ^-})\phi ({{K} ^+} {{K} ^-})\) mode using the same data set [18]. The analysis closely follows that of the dimuon decay mode, reported in Refs. [5, 7]. Relevant changes are described in more detail in this paper.

A comparison of the two results is of interest given the different main sources of systematic uncertainties induced by the markedly different reconstruction of decays with muons in the final state compared to decays with electrons. These differences arise from the significant bremsstrahlung emission of the electrons and the different signatures exploited in the online trigger selection [19,20,21].

The article is structured in the following way. The phenomenological description of the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi ({{K} ^+} {{K} ^-})\) decay and the relevant physics observables are described in Sect. 2. The LHCb detector, the candidate selection and the background subtraction are outlined in Sect. 3. The relevant inputs to the analysis, namely the resolution, efficiency and the flavour tagging, are detailed in Sects. 4 and 5. The maximum-likelihood fit procedure used to determine the physics parameters and the results of the fit are described in Sect. 6, while the evaluation of the systematic uncertainties is discussed in Sect. 7. Finally, conclusions are presented in Sect. 8.

2 Phenomenology

The phenomenological aspects of the analysis are presented in Ref. [22]. This formalism also holds for the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi ({{K} ^+} {{K} ^-})\) decay. Angular momentum conservation in the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} \phi \) decay implies that the final state is an admixture of two \(C\!P\)-even components and one \(C\!P\)-odd component, with orbital angular momentum of 0 or 2, and 1, respectively. Moreover, along with the three P-wave states of the \(\phi \!\rightarrow {{K} ^+} {{K} ^-} \) transition, there is also a \(C\!P\)-odd \({K} ^+\) \({K} ^-\) component in an S-wave state [23]. The \(C\!P\)-even and \(C\!P\)-odd components are disentangled by a time-dependent angular analysis, where the angular observables \(\Omega =\{\cos \theta _e,\cos \theta _K,\phi _h\}\) are defined in the helicity basis as shown in Fig. 1. The polar angle \(\theta _K\) \((\theta _e)\) is the angle between the \({K} ^+\) (\(e ^+\)) momentum and the direction opposite to the \({B} ^0_{s} \) momentum in the \({K} ^+\) \({K} ^-\) (\(e ^+e ^-\)) centre-of-mass system. The azimuthal angle between the \({K} ^+\) \({K} ^-\) and \(e ^+e ^-\) decay planes is \(\phi _h\). A definition of the angles in terms of the particle momenta can be found in Ref. [22].

The differential decay rate for \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} \phi \) decay as a function of the decay time and angles can be expressed as a sum of polarisation amplitudes and their interference terms. Each of these can be factorised into a part dependent on the decay time t and a part dependent on the set of angular variables \(\Omega \), as

$$\begin{aligned} G(t,\Omega )\equiv \frac{\mathrm {d}^{4}\Gamma ({{B} ^0_{s}} \rightarrow {{J /\psi }} \phi )}{\mathrm {d}t\,\mathrm {d}\Omega }\propto \sum ^{10}_{k=1} h_k(t)f_k(\Omega ). \end{aligned}$$
(1)

The time-dependent functions \(h_k(t)\) are given as

$$\begin{aligned} h_k(t|{{B} ^0_{s}})= & {} N_k e^{-{\Gamma _{{s}}} t}\left[ a_k\cosh \frac{{\Delta \Gamma _{{s}}} t}{2}+b_k\sinh \frac{{\Delta \Gamma _{{s}}} t}{2}\right. \nonumber \\&\left. +c_k\cos ({\Delta m_{{s}}} t)+d_k\sin ({\Delta m_{{s}}} t)\right] , \end{aligned}$$
(2)
$$\begin{aligned} h_k(t|\overline{B}{}^0_{s})= & {} \bar{N}_k e^{-{\Gamma _{{s}}} t}\left[ a_k\cosh \frac{{\Delta \Gamma _{{s}}} t}{2}+b_k\sinh \frac{{\Delta \Gamma _{{s}}} t}{2}\right. \nonumber \\&\left. -c_k\cos ({\Delta m_{{s}}} t)-d_k\sin ({\Delta m_{{s}}} t)\right] , \end{aligned}$$
(3)

where \({\Delta \Gamma _{{s}}} \equiv \Gamma _{\mathrm {L}}-\Gamma _{\mathrm {H}}\) is the decay width difference between the light and the heavy \({B} _s\) mass eigenstates, \({\Delta m_{{s}}} \equiv m_{\mathrm {H}}-m_{\mathrm {L}}\) is their mass difference, and \({\Gamma _{{s}}} \equiv (\Gamma _{\mathrm {L}}+\Gamma _{\mathrm {H}})/2\) is their average width. The coefficients \(N_k\) (\(\bar{N}_k\)) and \(a_k, b_k, c_k, d_k\) can be expressed in terms of \(\phi _{{s}}\) and four complex transversity amplitudes \(A_i\) (\(\bar{A}_i\)) at \(t = 0\), as detailed in Table 1. The label i takes the values \(\{\perp , \parallel , 0\}\) for the three P-wave amplitudes and S for the S-wave amplitude. The amplitudes are parameterised by \(|A_i|e^{i\delta _i}\) with the conventions \(\delta _0 = 0\) and \(|A_{\perp }|^2 + |A_0|^2 + |A_{\parallel }|^2 = 1\). The S-wave fraction is defined as \(F_\mathrm {S} = |A_\mathrm {S}|^2/(|A_\mathrm {S}|^2 + |A_{\perp }|^2 + |A_0|^2 + |A_{\parallel }|^2)\). In contrast to Ref. [5], the S-wave parameters are measured in a single range of \(m({{K} ^+} {{K} ^-})\) within \(\pm 30\text {\,MeV\!/}c^2 \) of the known \(\phi \) mass [15]. For particles produced in the \({B} ^0_{s} \) and \(\overline{B}{}^0_{s}\) flavour eigenstates, the coefficients in Eqs. (2) and (3), respectively, are given in Table 1 together with the angular functions \(f_k(\Omega )\), where the S, D and C coefficients are defined as

$$\begin{aligned}&\displaystyle S =-\frac{2|\lambda |}{1+|\lambda |^2}\sin ({\phi _{{s}}}),\quad D=-\frac{2|\lambda |}{1+|\lambda |^2}\cos ({\phi _{{s}}})\quad \mathrm {and}\nonumber \\&\displaystyle C=\frac{1-|\lambda |^2}{1+|\lambda |^2}. \end{aligned}$$
(4)

The parameter \(\lambda \) is related to \(C\!P\) violation in the interference between mixing and decay, and is defined by \(\lambda =\eta _i(q/p)(\bar{A}_i/A_i)\) where the polarisation states i have the \(C\!P\) eigenvalue \(\eta _i=+1\) for \(i\in \{0,\parallel \}\) and \(\eta _i=-1\) for \(i\in \{\perp ,\mathrm {S}\}\). The complex parameters p and q relate the mass eigenstates to the flavour eigenstates, \(|B_{\mathrm {L,H}}\rangle =p|{{B} ^0_{s}} \rangle \pm q|\overline{B}{}^0_{s} \rangle \). The \(C\!P\)-violating phase is defined by \({\phi _{{s}}} \equiv -\arg (\lambda )\) and is assumed here to be the same for all polarisation states. The value of \(|\lambda |\) equals unity in the absence of \(C\!P\) violation in decay [24,25,26]. In this paper, the \(C\!P\) violation in \({B} _s\) meson mixing is assumed to be negligible, following the measurements in Refs. [27, 28].
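As a minimal sketch of how Eqs. (2)-(4) can be evaluated numerically, the block below computes the S, D and C coefficients from \(|\lambda |\) and \(\phi _{{s}}\), and one time-dependent function \(h_k(t)\) for either initial flavour. The function names and the values of \(\Gamma _{{s}}\), \(\Delta \Gamma _{{s}}\) and the \(a_k, b_k, c_k, d_k\) coefficients in the usage lines are illustrative assumptions, not values from this analysis; the actual coefficients are those of Table 1.

```python
import math


def cp_coefficients(abs_lambda, phi_s):
    """Return (S, D, C) as written in Eq. (4)."""
    norm = 1.0 + abs_lambda ** 2
    s_coef = -2.0 * abs_lambda / norm * math.sin(phi_s)
    d_coef = -2.0 * abs_lambda / norm * math.cos(phi_s)
    c_coef = (1.0 - abs_lambda ** 2) / norm
    return s_coef, d_coef, c_coef


def h_k(t, n_k, a, b, c, d, gamma_s, delta_gamma_s, delta_m_s, is_bbar=False):
    """Eq. (2) for an initial B0s, Eq. (3) for an initial anti-B0s.

    The only difference between the two is the sign of the oscillating terms.
    """
    sign = -1.0 if is_bbar else 1.0
    x = 0.5 * delta_gamma_s * t
    y = delta_m_s * t
    return n_k * math.exp(-gamma_s * t) * (
        a * math.cosh(x)
        + b * math.sinh(x)
        + sign * (c * math.cos(y) + d * math.sin(y))
    )


# Illustrative inputs only (hypothetical, not fit results).
S, D, C = cp_coefficients(abs_lambda=1.0, phi_s=-0.05)
rate_b = h_k(1.0, 1.0, 1.0, D, C, -S, 0.66, 0.08, 17.757)
rate_bbar = h_k(1.0, 1.0, 1.0, D, C, -S, 0.66, 0.08, 17.757, is_bbar=True)
```

With equal normalisations for the two flavours, the sum of the two rates contains no oscillating term, which is why flavour tagging is needed to access the \(\sin ({\Delta m_{{s}}} t)\) and \(\cos ({\Delta m_{{s}}} t)\) coefficients.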

Table 1 Definition of angular and time-dependent functions for \({B} ^0_{s} \) and \(\overline{B}{}^0_{s}\) mesons

3 Detector, data set and selection

The LHCb detector [29, 30] is a single-arm forward spectrometer covering the pseudorapidity range \(2<\eta <5\), designed for the study of particles containing \(b \) or \(c \) quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of momentum, p, of charged particles with a relative uncertainty that varies from \(0.5\%\) at low momentum to \(1.0\%\) at \(200\text {\,GeV\!/}c \). The minimum distance of a track to a primary pp collision vertex (PV), the impact parameter (IP), is measured with a resolution of \((15+29/p_{\mathrm {T}})\,\upmu \text {m} \), where \(p_{\mathrm {T}}\) is the component of the momentum transverse to the beam in \(\text {\,GeV\!/}c\). Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors (RICH). Photons, electrons, and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter (ECAL), and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.

Samples of simulated events are used to optimise the signal selection, to derive the angular efficiency and to correct the decay-time efficiency. The simulated pp collisions are generated using Pythia  [31, 32] with a specific LHCb configuration [33]. The decays of hadronic particles are described by EvtGen  [34], in which final-state radiation is generated using Photos  [35]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [36, 37], as described in Ref. [38].

The online candidate selection is performed by a trigger [39], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full decay reconstruction. At the hardware stage, events are required to have a hadron or electron with a high transverse-energy deposit in the calorimeters, \(E_{\mathrm {T}} >3\text {\,GeV} \) and \(E_{\mathrm {T}} >3.68\text {\,GeV} \), respectively. The subsequent software trigger is implemented as two separate levels that further reduce the event rate. The first level is designed to select decays which are displaced from all PVs. At the second level, \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} \phi \) candidates are selected by identifying events containing a pair of oppositely charged kaons with an invariant mass within \(\pm 30\text {\,MeV\!/}c^2 \) of the known \(\phi \)-meson mass [15] or by using topological b-hadron triggers. These topological triggers require a two-, three- or four-track secondary vertex with a large sum of the \(p_{\mathrm {T}}\) of the charged particles and significant displacement from all PVs. A multivariate algorithm [40] is used for the identification of secondary vertices consistent with the decay of a b hadron. The trigger signals are associated with reconstructed particles in the offline selection. The candidate selection is devised in order to minimise the impact on the decay-time efficiency.

Electrons radiate bremsstrahlung photons when travelling through the detector material. For events where the photons are emitted upstream of the spectrometer magnet, the photon and the electron deposit their energy in different ECAL cells, and the electron momentum measured by the tracking system is underestimated. Neutral energy deposits in the ECAL compatible with being emitted by the electron are used to correct for this effect. The limitations of the recovery technique degrade the resolution of the reconstructed invariant masses of both the di-electron pair and the \({B} ^0_{s} \) candidate [19].

In the offline selection, \({J /\psi }\) candidates are formed from two oppositely charged tracks identified as electrons, and \(\phi \) candidates from pairs of oppositely charged tracks identified as kaons. Each pair of tracks is required to form a good-quality vertex. The electron candidates are required to have \(p_{\mathrm {T}} >0.5\text {\,GeV\!/}c \) and di-electron invariant mass \(m({e ^+e ^-})\in [2.5,3.3]\text {\,GeV\!/}c^2 \), where a wider range compared to the dimuon mode analysis is chosen to account for the radiative tail arising due to bremsstrahlung. The \(p_{\mathrm {T}}\) of the \(\phi \) candidate is required to be larger than \(1\text {\,GeV\!/}c \).

The \({J /\psi }\) and \(\phi \) candidates that are consistent with originating from a common vertex are combined to form \({B} ^0_{s} \) candidates. The mass of the \({B} ^0_{s} \) candidates is required to be in the range \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\in [4.7,5.6]\text {\,GeV\!/}c^2 \). The reconstructed decay time of the \({B} ^0_{s} \) candidate, t, is obtained from a kinematic fit with the \({J /\psi }\) mass constrained to its known value [15] and the \({B} ^0_{s} \) candidate constrained to originate from the associated PV. Each \({B} ^0_{s} \) candidate is associated with the PV that yields the smallest \(\chi ^2_{\text {IP}}\), where \(\chi ^2_{\text {IP}}\) is defined as the difference in the vertex-fit \(\chi ^2\) of a given PV reconstructed with and without the particle under consideration. The \({B} ^0_{s} \) candidates are selected if they have decay times in the range \(0.3<t<14\text {\,ps} \) and decay-time uncertainty estimates \(\sigma _t<0.12\text {\,ps} \). The fraction of events containing more than one \({B} ^0_{s} \) candidate within the \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) range is \(2.6\%\). All candidates are retained in the subsequent analysis. The impact of allowing multiple candidates per event is negligible.

The main sources of background are partially reconstructed b-hadron decays and combinatorial background. The first of these arises from the \({{B} ^0_{s}} \!\rightarrow {\chi _{{c} 1}} (1P)( {\rightarrow {{J /\psi }} {\gamma })\phi }\) and \({{B} ^0_{s}} \!\rightarrow {\psi {(2S)}} ( {\rightarrow {{J /\psi }} ~X)\phi }\) decays. The combinatorial background is due to random combinations of tracks in the event that pass the candidate selection. In addition, possible background contributions to the signal region originate from \({\Lambda ^0_{b}} \!\rightarrow {{J /\psi }} {p} {{K} ^-} \) and \({{B} ^0} \!\rightarrow {{J /\psi }} {{K} ^*} (892)^0\) decays, where the proton or the \({\pi } ^-\) meson from the \({{K} ^*} (892)\rightarrow {{K} ^+} {{\pi } ^-} \) decay is misidentified as a \({K} ^+\) or \({K} ^-\) meson, respectively.

The combinatorial background is suppressed using a boosted decision tree (BDT) [41, 42] analysis, trained using the TMVA toolkit [43, 44]. The BDT discriminant is trained using a signal sample of simulated decays, and a sample of background from data. For the background same-sign combinations of electron and/or kaon pairs are chosen with the same selection criteria as for signal. The simulation is corrected to match the distributions observed in data for variables used in the identification of electrons and kaons. The eight variables used for the training of the BDT discriminant are the transverse momenta of the \({J /\psi }\) and \(\phi \) candidates, the vertex \(\chi ^2\) of the \({B} ^0_{s} \) candidate, the \(\chi ^2\) of the kinematic fit of the \({B} ^0_{s} \) candidate with the \({J /\psi }\) mass constrained to its known value and the electron and kaon identification probability as provided mainly from the RICH and calorimeter systems. The optimal working point for the BDT discriminant is determined using a figure of merit that optimises the statistical power of the selected data sample for the analysis of \(\phi _{{s}}\) by taking the number of signal and background candidates into account [45].

The candidates are rejected if the \({K} ^+\) candidate can also be identified as a proton by a dedicated neural network [46] to suppress any possible contamination from decays. The remaining misidentified background contribution is estimated using simulated samples and amounts to \(1\%\) of the expected signal yield for decays and is negligible for decays.

Figure 2 shows the distribution of \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) for the selected candidates. In order to describe better the left tail of the \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) distribution, the sample is split into three categories by the number of electron candidates: zero, one or both electrons of the pair that received bremsstrahlung corrections. An extended maximum-likelihood fit is made to the unbinned \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) distribution.

Fig. 2

Distribution of \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) for selected candidates divided into three categories: (a) zero, (b) one and (c) both electrons with bremsstrahlung correction. The blue solid line shows the total fit, which is composed of the signal (red short-dashed line) and the background contributions. The combinatorial background is indicated by the green long-dashed line, while the partially reconstructed backgrounds from the \({{B} ^0_{s}} \!\rightarrow {\chi _{{c} 1}} (1P)( {\rightarrow {{J /\psi }} {\gamma })\phi }\) and \({{B} ^0_{s}} \!\rightarrow {\psi {(2S)}} ( {\rightarrow {{J /\psi }} ~X)\phi }\) decays are indicated by pink and purple dash-dotted lines, respectively

In the fit, the signal component is described by the sum of two Crystal Ball (CB) functions  [47] and the combinatorial background by an exponential function. The partially reconstructed background components from \({{B} ^0_{s}} \!\rightarrow {\chi _{{c} 1}} (1P)( {\rightarrow {{J /\psi }} {\gamma })\phi }\) and \({{B} ^0_{s}} \!\rightarrow {\psi {(2S)}} ( {\rightarrow {{J /\psi }} ~X)\phi }\) decays are modelled using a Gaussian function and the sum of two Gaussian functions, respectively. The parameters that describe the shape of the signal candidates and the partially reconstructed background are fixed to values obtained from simulation, except for the core widths and the common mean of the CB functions, which are left free in the fit. The fit to the three categories gives a yield of \((1.27\pm 0.05)\times 10^4\) signal candidates, where the uncertainty is statistical only.

The fit results are used to assign per-candidate weights via the sPlot technique with \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) as the discriminating variable [48]. These weights are used to subtract the background contribution in the maximum-likelihood fit described in Sect. 6. As the three categories are statistically independent, further steps of the analysis are performed on the combined sample.

4 Detector resolution and efficiency

The finite decay-time resolution is a diluting factor that affects the relative precision of \({\phi _{{s}}} \) and has to be accounted for. The way this is introduced into the analysis is described in Sect. 6. The assumed decay-time resolution model, \(\mathcal {R}\), consists of a sum of two Gaussian distributions with their widths depending on the per-candidate decay-time uncertainty determined by the vertex fit as detailed in Ref. [18]. The parameters of this model are loosely constrained in the fit of the decay to the values determined using an identical model from a sample of candidates produced at the PV. They are allowed to vary within a Gaussian constraint of twice the difference of their values between the electron and muon modes as extracted from simulation. The loose constraint was chosen to minimise the reliance of the analysis on simulation; increasing the allowed variation further does not change the results. The parameters are determined from the unbinned maximum-likelihood fit, as described in Sect. 6. Taking into account the \(\sigma _t\) distribution of the \({B} ^0_{s} \) signal, the resulting effective resolution is \(45.6\pm 0.5\text {\,fs}\).
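To give a feeling for the size of the dilution, the sketch below uses the standard single-Gaussian approximation, in which an oscillation of angular frequency \({\Delta m_{{s}}} \) is damped by \(\exp (-\sigma ^2 {\Delta m_{{s}}} ^2/2)\). This is an assumption made for illustration: the analysis itself uses a per-candidate double-Gaussian model. The inputs are the effective resolution quoted above and the \({\Delta m_{{s}}} \) value used as a constraint in the fit of Sect. 6; the function name is hypothetical.

```python
import math


def resolution_dilution(sigma_t_ps, delta_m_s_inv_ps):
    """Damping of a cos/sin(dm * t) term by a single Gaussian of width sigma_t."""
    return math.exp(-0.5 * (sigma_t_ps * delta_m_s_inv_ps) ** 2)


SIGMA_EFF_PS = 0.0456   # effective resolution, 45.6 fs
DELTA_M_S = 17.757      # ps^-1

dilution = resolution_dilution(SIGMA_EFF_PS, DELTA_M_S)
print(f"dilution ~ {dilution:.2f}")  # prints "dilution ~ 0.72"
```

In this approximation the observed oscillation amplitude is reduced to roughly 72% of its true value, which is why the resolution has to be modelled explicitly in the fit.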

Due to the displacement requirements made on signal tracks in the trigger and offline selections, the reconstruction efficiency depends on the decay time of the \({B} ^0_{s} \) candidate. The efficiency is determined with the same method as described in Ref. [8], by using the control channel \({{B} ^0} \!\rightarrow {{J /\psi }} {{K} ^*} (892)^0\), with \({{J /\psi }} \!\rightarrow {e ^+e ^-} \) and \({{K} ^*} (892)^0\!\rightarrow {{K} ^+} {{\pi } ^-} \) decays.

The decay-time dependence of the signal efficiency is determined as

$$\begin{aligned} {\varepsilon } ^{{{B} ^0_{s}}}_{\mathrm {data}}(t) = {\varepsilon } ^{{{B} ^0}}_{\mathrm {data}}(t)\times \frac{{\varepsilon } ^{{{B} ^0_{s}}}_{\mathrm {sim}}(t)}{{\varepsilon } ^{{{B} ^0}}_{\mathrm {sim}}(t)}, \end{aligned}$$
(5)

where \({\varepsilon } ^{{{B} ^0}}_{\mathrm {data}}(t)\) is the efficiency of the control channel, determined on data, and \({\varepsilon } ^{{{B} ^0_{s}}}_{\mathrm {sim}}(t)/{\varepsilon } ^{{{B} ^0}}_{\mathrm {sim}}(t)\) is the ratio of efficiencies of the simulated signal and control modes after the selection. The efficiencies are extracted by normalisation to the known lifetimes of \(\tau _{{{B} ^0_{s}}}=1.527\pm 0.011\text {\,ps} \) and \(\tau _{{{B} ^0}}=1.520\pm 0.004\text {\,ps} \) [15]. The second term accounts for the small differences in the decay time and kinematics between the signal and the control modes. The control channel efficiency is defined as \({\varepsilon } ^{{{B} ^0}}_{\mathrm {data}}(t)=N^{{{B} ^0}}_{\mathrm {data}}(t)/N^{{{B} ^0}}_{\mathrm {gen}}(t)\) where \(N^{{{B} ^0}}_{\mathrm {data}}(t)\) is the number of the \({{B} ^0} \!\rightarrow {{J /\psi }} {{K} ^*} (892)^0\) decays in a given time bin as determined using sPlot technique [48] with \(m({e ^+e ^-} {{K} ^+} {{\pi } ^-})\) as discriminating variable. The \(N^{{{B} ^0}}_{\mathrm {gen}}(t)\) is the number of events generated from an exponential distribution with lifetime \(\tau _{{{B} ^0}}\) [15]. The analysis is not sensitive to the absolute scale of the efficiency.
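The construction of Eq. (5) can be sketched in a binned form as follows. The binning, the function names and the toy inputs are assumptions made for illustration; only the structure (control-channel efficiency from data, multiplied by the signal-to-control ratio from simulation, with an arbitrary overall scale) follows the text.

```python
import numpy as np


def exponential_bin_fractions(bin_edges, lifetime_ps):
    """Fraction of an exponential decay-time distribution falling in each bin."""
    edges = np.asarray(bin_edges, dtype=float)
    cdf = 1.0 - np.exp(-edges / lifetime_ps)
    return np.diff(cdf)


def signal_time_efficiency(n_ctrl_data, n_ctrl_gen, eff_sig_sim, eff_ctrl_sim):
    """Binned version of Eq. (5), scaled by its average.

    n_ctrl_data : background-subtracted control-channel yields per time bin
    n_ctrl_gen  : yields expected from a pure exponential with the B0 lifetime
    eff_*_sim   : simulated efficiencies of the signal and control modes
    """
    n_ctrl_data = np.asarray(n_ctrl_data, dtype=float)
    n_ctrl_gen = np.asarray(n_ctrl_gen, dtype=float)
    eff_sig_sim = np.asarray(eff_sig_sim, dtype=float)
    eff_ctrl_sim = np.asarray(eff_ctrl_sim, dtype=float)
    eff_ctrl_data = n_ctrl_data / n_ctrl_gen
    eff_sig = eff_ctrl_data * eff_sig_sim / eff_ctrl_sim
    # The absolute scale is irrelevant for the fit, so normalise to the mean.
    return eff_sig / eff_sig.mean()


# Hypothetical binning within the selected range 0.3 < t < 14 ps.
edges = [0.3, 1.0, 2.0, 4.0, 14.0]
generated = 1.0e5 * exponential_bin_fractions(edges, lifetime_ps=1.520)
```

Because only the shape matters, any common factor in the data yields or in the simulated efficiencies cancels in the returned array.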

The \({{B} ^0} \!\rightarrow {{J /\psi }} {{K} ^*} (892)^0\) decay is selected using trigger, selection and BDT requirements similar to those used for the signal, adapted to the different final states. The background contribution to the control sample from the misidentification of final-state particles from the decay is estimated to be \(0.06\%\) of the expected signal yield, while the background contribution from decays is negligible.

The \(m({e ^+e ^-} {{K} ^+} {{\pi } ^-})\) invariant-mass distribution is shown in Fig. 3 divided into the three bremsstrahlung categories, as for the signal sample. The contribution from \({{B} ^0} \!\rightarrow {{J /\psi }} {{K} ^*} (892)^0\) decays is described by the sum of two CB functions while an exponential function is used to describe the combinatorial background. Similarly to the signal sample, partially reconstructed background arises from \({B} ^0\) decays where one or more particles are not reconstructed; background components stemming from \({{B} ^0} \!\rightarrow {\chi _{{c} 1}} (1P)( {\rightarrow {{J /\psi }} {\gamma }){{K} ^*} (892)^0}\), \({{B} ^0} \!\rightarrow {\psi {(2S)}} ( {\rightarrow {{J /\psi }} ~X)}{{K} ^*} (892)^0\) and \({{B} ^0} \!\rightarrow {{J /\psi }} K_1(1270)^0( {\rightarrow {{K} ^*} (892)^0{{\pi } ^0})}\) decays are described using a single Gaussian function, the sum of two Gaussian functions and the sum of two CB functions, respectively. The yield is found to be \((5.45\pm 0.05)\times 10^4\) signal candidates.

Fig. 3

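Distribution of \(m({e ^+e ^-} {{K} ^+} {{\pi } ^-})\) for selected candidates divided into three categories: (a) zero, (b) one and (c) both electrons with bremsstrahlung correction. The blue solid line shows the total fit, which is composed of the signal (red short-dashed line) and the background contributions. The combinatorial background is indicated by the green long-dashed line, while the partially reconstructed backgrounds from the \({{B} ^0} \!\rightarrow {\chi _{{c} 1}} (1P)( {\rightarrow {{J /\psi }} {\gamma }){{K} ^*} (892)^0}\), \({{B} ^0} \!\rightarrow {\psi {(2S)}} ( {\rightarrow {{J /\psi }} ~X)}{{K} ^*} (892)^0\) and \({{B} ^0} \!\rightarrow {{J /\psi }} K_1(1270)^0( {\rightarrow {{K} ^*} (892)^0{{\pi } ^0})}\) decays are indicated by pink, purple and yellow dash-dotted lines, respectively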

The decay-time efficiency for the signal is shown in Fig. 4. The efficiency is relatively uniform at high values of decay time but decreases at low decay times due to the selection criteria that require displaced tracks.

Fig. 4

Signal efficiency as a function of the decay time, \({\varepsilon } ^{{{B} ^0_{s}}}_{\mathrm {data}}(t)\), scaled by the average efficiency

The efficiency as a function of the helicity angles is not uniform due to the forward geometry of the LHCb detector and the requirements imposed on the final-state particle momenta. Projections of the three-dimensional efficiency, \({\varepsilon } (\Omega )\), onto the three helicity angles are shown in Fig. 5. The angular efficiency correction is introduced in the analysis through normalisation integrals in the probability density function describing the signal decays in the fit described in Sect. 6. The integrals given in Table 2 are calculated using simulated candidates that are subject to the same trigger and selection criteria as the data, following the same technique as in Ref. [22]. The relative efficiency is constant for the azimuthal angle \(\phi _h\). A dependence of up to \(15\%\) is observed for \(\cos \theta _e\) and \(\cos \theta _K\). The finite angular resolution has a small impact on the results of the analysis and is neglected in the fit. A systematic uncertainty is assigned to account for this effect.

Fig. 5

Efficiency projected onto (left) \(\cos \theta _K\), (middle) \(\cos \theta _e\) and (right) \(\phi _h\) obtained from a simulated \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} \phi \) sample, scaled by the average efficiency

5 Flavour tagging

The \({B} _s\) candidate flavour at production is determined by two independent categories of flavour tagging algorithms, the opposite-side (OS) taggers [49] and the same-side kaon (SSK) tagger [50], which exploit specific features of the production of \(b \) \(\overline{{b}}\) quark pairs in pp collisions, and their subsequent hadronisation. Each tagging algorithm assigns a tag decision and a mistag probability. The tag decision, \(\mathfrak {q}\), takes values \(+1\), \(-1\), or 0, if the signal candidate is tagged as \({B} ^0_{s} \), \(\overline{B}{}^0_{s}\), or is untagged, respectively. The fraction of events in the sample with a nonzero tagging decision gives the efficiency of the tagger, \(\varepsilon _{\mathrm {tag}}\). The mistag probability, \(\eta \), is estimated event-by-event, and represents the probability that the algorithm assigns a wrong tag decision. It is calibrated using data samples of two flavour-specific decays, for the OS taggers and for the SSK tagger, resulting in a corrected mistag probability, \(\omega \) \((\bar{\omega })\), for a candidate with initial flavour \({B} ^0_{s} \) (\(\overline{B}{}^0_{s}\)). In case of the SSK algorithm, the calibrated sample of decays is weighted to match the kinematics of the signal decays. A linear relationship between \(\eta \) and \(\omega \) is used for the calibration. The effective tagging power is given by \({\varepsilon _{\mathrm {tag}}} (1-2\omega )^2\) and for the combined taggers in the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) signal sample a value of \((5.07\pm 0.16)\%\) is obtained.
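The relation between tag decisions, calibrated mistag probabilities and the effective tagging power can be sketched as below. The linear form \(\omega = p_0 + p_1(\eta - \langle \eta \rangle )\) is a common parameterisation and is assumed here; the text only states that the relationship is linear. All names and the toy numbers are hypothetical.

```python
import numpy as np


def calibrated_mistag(eta, p0, p1, eta_mean):
    """Assumed linear calibration: omega = p0 + p1 * (eta - eta_mean)."""
    return p0 + p1 * (np.asarray(eta, dtype=float) - eta_mean)


def effective_tagging_power(decisions, omega):
    """Per-candidate average of (1 - 2*omega)^2, counting untagged events as 0.

    This equals eps_tag times the mean squared dilution of the tagged candidates.
    """
    decisions = np.asarray(decisions)
    omega = np.asarray(omega, dtype=float)
    tagged = decisions != 0
    dilution_sq = np.where(tagged, (1.0 - 2.0 * omega) ** 2, 0.0)
    return dilution_sq.mean()


# Toy sample: half of the candidates tagged, each with omega = 0.4.
toy_decisions = [+1, -1, 0, 0]
toy_omega = [0.4, 0.4, 0.5, 0.5]
toy_power = effective_tagging_power(toy_decisions, toy_omega)  # 0.5 * 0.2**2
```

The toy value of 2% illustrates why the effective tagging power is much smaller than the tagging efficiency: a mistag probability of 0.4 leaves a dilution of only 0.2.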

Table 2 Angular acceptance integrals for the simulated sample. The \(I_k\) integrals are normalised with respect to the \(I_0\) integral

6 Maximum-likelihood fit and results

The \(C\!P\) observables are determined by an unbinned maximum-likelihood fit to the background-subtracted candidates in four dimensions, namely the \({B} ^0_{s} \) decay time and the three helicity angles, with a probability density function (PDF) describing the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) signal decay. The negative log-likelihood function to be minimised is given by

$$\begin{aligned} -\ln \mathcal {L} = -\alpha \sum _{i=1}^{\mathrm {N}} w_i\ln \mathcal {P}, \end{aligned}$$
(6)

where N is the total number of candidates. The \(w_i\) coefficients are the sPlot weights [48] computed using \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) as discriminating variable, and the factor \(\alpha =\sum w_i/\sum w^2_i\) is used to account for the correct signal yield in the sample. The PDF, \(\mathcal {P}=\mathcal {S}/\int \mathcal {S}\mathrm {d}t\,\mathrm {d}\Omega \), is normalised over the four-dimensional space where

$$\begin{aligned}&\mathcal {S}(t,\Omega ,\mathfrak {q}^{\mathrm {OS}},\mathfrak {q}^{\mathrm {SSK}}|\eta ^{\mathrm {OS}}, \eta ^{\mathrm {SSK}})\nonumber \\&\quad = \mathcal {T}(t',\Omega ,\mathfrak {q}^{\mathrm {OS}},\mathfrak {q}^{\mathrm {SSK}}| \eta ^{\mathrm {OS}},\eta ^{\mathrm {SSK}})\otimes \mathcal {R}(t-t'|\sigma _t)\varepsilon ^{{{B} ^0_{s}}}_{\mathrm {data}}(t), \end{aligned}$$
(7)

with the decay-time resolution function, \(\mathcal {R}\), defined in Sect. 4 and

$$\begin{aligned}&\mathcal {T}(t',\Omega ,\mathfrak {q}^{\mathrm {OS}},\mathfrak {q}^{\mathrm {SSK}}|\eta ^{\mathrm {OS}},\eta ^{\mathrm {SSK}}) \nonumber \\&\quad = \left( 1+\mathfrak {q}^{\mathrm {OS}}(1-2\omega ^{\mathrm {OS}})\right) \left( 1+\mathfrak {q}^{\mathrm {SSK}}(1-2\omega ^{\mathrm {SSK}})\right) \nonumber \\&\qquad \times G(t,\Omega )+ \left( 1-\mathfrak {q}^{\mathrm {OS}}(1-2\bar{\omega }^{\mathrm {OS}})\right) \nonumber \\&\qquad \times \left( 1-\mathfrak {q}^{\mathrm {SSK}}(1-2\bar{\omega }^{\mathrm {SSK}})\right) \bar{G}(t,\Omega ), \end{aligned}$$
(8)

which allows for the inclusion of the information from both tagging algorithms in the computation of the decay rate. The function \(G(t,\Omega )\) is defined in Eq. (1) and \(\bar{G}(t,\Omega )\) is the corresponding function for \(\overline{B}{}^0_{s}\) decays. The angular efficiency is included in the normalisation of the PDF via the ten integrals, \(I_k=\int \mathrm {d}\Omega \,\varepsilon (\Omega )f_k(\Omega )\). The integrals are pre-calculated using simulation as described in Sect. 4.
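A minimal sketch of two ingredients of the fit, under simplifying assumptions: the tagging-weighted combination of Eq. (8) for given values of \(G\) and \(\bar{G}\), and the sPlot-weighted negative log-likelihood of Eq. (6) with the factor \(\alpha \). Resolution, efficiency and normalisation are deliberately left out, and the function names are hypothetical.

```python
import numpy as np


def tagged_rate(q_os, q_ssk, w_os, w_ssk, wbar_os, wbar_ssk, g, gbar):
    """Eq. (8): combine B0s and anti-B0s rates using both tag decisions."""
    b_weight = (1 + q_os * (1 - 2 * w_os)) * (1 + q_ssk * (1 - 2 * w_ssk))
    bbar_weight = (1 - q_os * (1 - 2 * wbar_os)) * (1 - q_ssk * (1 - 2 * wbar_ssk))
    return b_weight * g + bbar_weight * gbar


def weighted_nll(sweights, pdf_values):
    """Eq. (6): -alpha * sum_i w_i ln P_i with alpha = sum(w) / sum(w^2)."""
    w = np.asarray(sweights, dtype=float)
    p = np.asarray(pdf_values, dtype=float)
    alpha = w.sum() / (w ** 2).sum()
    return -alpha * np.sum(w * np.log(p))


# An untagged candidate (q = 0 for both taggers) sees the sum of both rates.
untagged = tagged_rate(0, 0, 0.4, 0.4, 0.4, 0.4, g=1.5, gbar=0.5)  # 2.0
```

For unit weights \(\alpha = 1\) and Eq. (6) reduces to the usual unweighted negative log-likelihood; for a sample with weights different from unity, \(\alpha \) rescales the likelihood so that its curvature reflects the effective sample size.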

When using weights from the sPlot method, the standard uncertainty estimate based on the Hessian matrix will generally not give asymptotically correct confidence intervals [51]. A bootstrap method [52] is used to obtain a correct estimate of the statistical uncertainty. The weights are recalculated for each bootstrap sample. In the fit, Gaussian constraints are included for certain nuisance parameters, namely the mixing frequency \({\Delta m_{{s}}} =17.757\pm 0.021\text {\,ps} ^{-1} \) [15], the tagging calibration parameters, and the time resolution parameters. The fitting procedure is validated using pseudoexperiments and simulated \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ( {{e ^+e ^-})\phi }\) decays.
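The bootstrap procedure can be sketched as follows: candidates are resampled with replacement, the full estimate is repeated on each replica, and the spread of the results is taken as the statistical uncertainty. In the analysis the estimator would include recomputing the sPlot weights and rerunning the fit on each replica; here a simple sample mean stands in for it, and the number of replicas and the seed are arbitrary assumptions.

```python
import numpy as np


def bootstrap_uncertainty(data, estimator, n_replicas=200, seed=1):
    """Standard deviation of an estimator over bootstrap replicas of the data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    estimates = np.empty(n_replicas)
    for r in range(n_replicas):
        # Draw n candidates with replacement from the original sample.
        idx = rng.integers(0, n, size=n)
        # In a real analysis: recompute weights and refit on data[idx].
        estimates[r] = estimator(data[idx])
    return estimates.std(ddof=1)


# Toy usage: uncertainty on the mean of 1000 unit-Gaussian values.
toy = np.random.default_rng(0).normal(size=1000)
toy_error = bootstrap_uncertainty(toy, np.mean)
```

For this toy the result should be close to \(1/\sqrt{1000}\approx 0.03\), the analytic uncertainty on the mean, which gives a quick sanity check of the procedure.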

The results of the fit to the data are shown in Table 3 while the projections of the fit results on the decay time and helicity-angle distributions are reported in Fig. 6. The correlation matrix of statistical uncertainties is reported in Table 5 of Appendix A. The results are consistent with previous measurements of these parameters [5, 10,11,12,13,14], and the SM predictions for \(\phi _{{s}}\)  [24,25,26]. They show no evidence of \(C\!P\) violation in the interference between \({B} ^0_{s} \) meson mixing and decay, nor of direct \(C\!P\) violation in \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) decays, as the parameter \(|\lambda |\) is consistent with unity within uncertainties.
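As a rough illustration of the consistency with the SM, the sketch below compares the value of \(\phi _{{s}}\) quoted in Sect. 8 with the CKMFitter prediction quoted in the Introduction. Adding the statistical and systematic uncertainties in quadrature and symmetrising the uncertainty of the prediction are simplifying assumptions made here and are not part of the analysis; the function name `pull` is hypothetical.

```python
import math


def pull(measured, predicted, *uncertainties):
    """Difference between two values in units of their combined uncertainty.

    Assumes Gaussian, uncorrelated uncertainties that can be added in
    quadrature (a simplification made for illustration only).
    """
    total = math.sqrt(sum(u * u for u in uncertainties))
    return (measured - predicted) / total


# phi_s from this analysis: 0.00 +/- 0.28 (stat) +/- 0.07 (syst) rad.
# CKMFitter prediction: -0.0365 rad; the larger side (0.0013) of its
# asymmetric uncertainty is used as a symmetric one.
phi_s_pull = pull(0.00, -0.0365, 0.28, 0.07, 0.0013)
print(f"pull = {phi_s_pull:.2f}")
```

Under these assumptions the measured value lies about 0.13 standard deviations from the prediction, with the total uncertainty dominated by the statistical component.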

Table 3 Results of the maximum-likelihood fit, described in Sect. 6, to the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) decays including all acceptance and resolution effects. The first uncertainty is statistical and the second is systematic

Fig. 6 Decay time and helicity-angle distributions for (data points) \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) decays with the one-dimensional projections of the PDF extracted in the maximum-likelihood fit. The solid blue line shows the total signal contribution, which is composed of (long-dashed red) \(C\!P\)-even, (short-dashed green) \(C\!P\)-odd and (dash-dotted purple) S-wave contributions

7 Systematic uncertainties

Systematic uncertainties for each of the measured parameters are reported in Table 4. They are evaluated by observing the change in the physics parameters after repeating the likelihood fit with a modified model assumption or, in the case of uncertainties originating from the limited size of calibration samples, through pseudoexperiments.

The decay-time and angular efficiencies obtained independently in the three bremsstrahlung categories are compatible within statistical uncertainties. While the effective decay-time resolution differs for the three categories, it was verified with simulations that the result of a weighted average of three independent maximum-likelihood fits is consistent with the default one.

Repeating the mass fit in bins of the decay time and helicity angles shows that the mass resolution depends on \(\cos \theta _e\) and \(\cos \theta _K\). As the sPlot technique assumes that the discriminating variable is independent of the observables of interest, the effect of this correlation is quantified. The data sample is divided into intervals of \(\cos \theta _e\) and \(\cos \theta _K\) and new weights are computed with fits to \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) in each interval. The four-dimensional likelihood fit is repeated with the modified weights, and the variation of each physics parameter is assigned as a systematic uncertainty. For the decay time and the azimuthal angle \(\phi _h\) the effect is negligible.

The mass model is tested in two ways. First, new sets of weights are computed using alternative PDF models: one set with the signal component of the \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) distribution described by a sum of two Ipatia functions [53], a second set with the combinatorial background described by a second-order Chebyshev polynomial, and a third set with the combinatorial background described by an exponential function with its slope fixed to an average value from samples with one and both electrons corrected for bremsstrahlung. Second, a set of pseudoexperiments is produced by fluctuating the default mass model parameters within their uncertainties (accounting for correlations), each providing a new set of weights. For each physics parameter, the larger of the width of its distribution obtained from the pseudoexperiments and the difference between the default and alternative PDF results is assigned as the systematic uncertainty.

Table 4 Statistical and systematic uncertainties. A dash corresponds to systematic uncertainties that are negligible. Systematic uncertainties from different sources are added in quadrature

The statistical uncertainty on the angular efficiency is propagated by repeating the fit using new sets of the ten integrals, \(I_k\), systematically varied according to their covariance matrix. The width of the obtained distributions for each physics parameter is taken as the systematic uncertainty. The angular resolution is neglected in the maximum-likelihood fit. The effect of this assumption is studied using pseudoexperiments, where the helicity angles are smeared according to the experimental resolution. There is a small effect on the polarisation amplitudes, strong phase and decay width difference while all other parameters are unaffected.
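The propagation of the uncertainty on the ten integrals can be sketched as below. This is a minimal illustration assuming Gaussian fluctuations of the integrals; `refit` is a hypothetical stand-in for repeating the full likelihood fit with a varied set of \(I_k\), and the central values and covariance matrix are invented toy inputs.

```python
import numpy as np


def propagate_covariance(central, covariance, refit, n_toys=200, seed=1):
    """Width of the fit results when the inputs vary within their covariance.

    Each toy draws a correlated set of inputs from a multivariate normal
    distribution and passes it to `refit`; the standard deviation of the
    results is returned as the systematic uncertainty.
    """
    rng = np.random.default_rng(seed)
    varied = rng.multivariate_normal(central, covariance, size=n_toys)
    results = np.array([refit(v) for v in varied])
    return results.std(axis=0, ddof=1)


# Toy inputs (not taken from the analysis): ten integrals with equal,
# uncorrelated uncertainties of 0.1.
toy_central = np.ones(10)
toy_covariance = 0.01 * np.eye(10)


def toy_refit(integrals):
    # Hypothetical placeholder: the "fitted parameter" is the sum of the integrals.
    return integrals.sum()


width = propagate_covariance(toy_central, toy_covariance, toy_refit, n_toys=2000)
print(f"systematic width: {width:.3f}")
```

For this linear toy the exact answer is \(\sqrt{10\times 0.01}\approx 0.316\), so the sampled width can be checked against it.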

A systematic contribution is evaluated to take into account the effect of the finite decay-time resolution by comparing pseudoexperiments with fixed and constrained decay-time resolution parameters. A sample of pseudoexperiments with the four-dimensional \({{B} ^0_{s}} \rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) PDF including time and angular efficiencies is used. The procedure is evaluated for two scenarios: the former with decay-time resolution parameters fixed to generated values, and the latter with parameters constrained to twice the difference between values obtained from signal simulation with \({{J /\psi }} \!\rightarrow {e ^+e ^-} \) and \({{J /\psi }} \!\rightarrow {\mu ^+\mu ^-} \) decays. The quadratic difference between the uncertainties of pseudoexperiments with fixed and constrained parameters is assigned as a systematic uncertainty. In addition, tests with decay-time resolution parameters fixed in the fit to the data sample are performed. The parameters are fixed either to values obtained from the time-angle fit with \({\phi _{{s}}} \) fixed to 0 or \(\pi /2\), or to values from a sample of \({{J /\psi }} \!\rightarrow {\mu ^+\mu ^-} \) candidates produced at the PV, corrected for the difference between \(e ^+e ^-\) and \(\mu ^+\mu ^-\) simulation samples. The test results are compatible with the default fit results within statistical uncertainties.
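The quadratic difference used for this contribution can be written explicitly. Denoting by \(\sigma _{\mathrm {fixed}}\) and \(\sigma _{\mathrm {constr}}\) the uncertainties on a given physics parameter obtained from the pseudoexperiments with fixed and constrained resolution parameters (symbols introduced here for illustration only), and assuming \(\sigma _{\mathrm {constr}}\ge \sigma _{\mathrm {fixed}}\), the assigned systematic uncertainty would be

$$\begin{aligned} \sigma _{\mathrm {syst}}=\sqrt{\sigma _{\mathrm {constr}}^{2}-\sigma _{\mathrm {fixed}}^{2}}. \end{aligned}$$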

The decay-time efficiency introduces a systematic uncertainty from three different sources. First, the contribution due to the statistical uncertainty on the determination of the decay-time efficiency from the control channel is obtained by evaluating the fit multiple times after randomly varying the parameters of the time efficiency within their statistical uncertainties. The statistical uncertainty is dominated by the size of the \({{B} ^0} \!\rightarrow {{J /\psi }} {{K} ^*} (892)^0\) control sample. Second, a sum of two Ipatia functions is used as an alternative mass model for the \(m({e ^+e ^-} {{K} ^+} {{\pi } ^-})\) distribution and a new decay-time efficiency function is produced. Finally, the efficiency function is computed with the \({B} ^0\) lifetime modified by \(\pm 1\sigma \). In all cases the difference in the fit results arising from the use of the new efficiency function is taken as a systematic uncertainty.

The sensitivity to the BDT selection is studied by adjusting the working point around the optimal position for the signal channel, such that the number of signal candidates differs by no more than \(10\%\) between the default and varied BDT criteria. The effect of applying the modified BDT requirement in the likelihood fit is studied using pseudoexperiments. The mass model parameters for each BDT requirement are varied within their uncertainties (accounting for correlations) and the weights are re-evaluated based on the alternative model. The fit is repeated using a new set of weights and a new efficiency function. The observed variations in the physics parameters are compatible with statistical fluctuations. This is verified by pseudoexperiments with \(10\%\) of candidates removed at random.

A systematic uncertainty is assigned to account for the differences in the final-state kinematics between data and simulated samples. The simulated signal events are weighted using a multidimensional BDT-based algorithm [54] in six dimensions corresponding to the kinematic variables with the largest observed discrepancies between data and simulation. The procedure is repeated for the control sample \({{B} ^0} \rightarrow {{J /\psi }} ({e ^+e ^-}){{K} ^*} (892)^0\). The reweighted simulation samples of both channels are used to obtain new angular and decay-time acceptances. The difference with respect to the default fit result is assigned as a systematic uncertainty.

The fraction of candidates contributing to the signal sample is estimated to be \(1\%\) using simulation. The impact of neglecting this contribution is evaluated for the data sample by fitting the \(m({e ^+e ^-} {{K} ^+} {{K} ^-})\) distribution with an additional component to account for, namely the sum of two CB functions, the shape of which is fixed to a fit to simulated candidates. In addition, the decay-time efficiency is redetermined including a component for background from decays. This component is modelled by the sum of two CB functions, the shape of which is fixed to a fit to simulated candidates. The fraction of the decays is estimated from the simulation to be at most \(0.06\%\) [55]. The difference in the physics parameters obtained from the fit with the modified weights and efficiency function is assigned as a systematic uncertainty.

A small fraction of decays comes from the decays of \({B} _{c} ^+\) mesons. The fraction is estimated as \(0.8\%\) in Ref. [56] and pseudoexperiments are used to assess the impact of ignoring such a contribution on the extraction of the physics parameters. Only \(\Gamma _{{s}}\) is observed to be affected, with a bias on its central value corresponding to \(20\%\) of the statistical uncertainty, which is assigned as a systematic uncertainty.

A possible bias in the fitting procedure is investigated through many pseudoexperiments of equivalent size to the data sample. For each pseudoexperiment the physics parameters are fluctuated in the underlying PDF and then compared to the obtained fit results. The resulting deviations are small and those that are not compatible with zero within three standard deviations are quoted as systematic uncertainties.
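The bias check described above can be sketched as follows. The sketch assumes that compatibility with zero is judged by comparing the mean deviation with its standard error over the pseudoexperiments; this interpretation, like the function name `fit_bias_systematic`, is an assumption made for illustration.

```python
import numpy as np


def fit_bias_systematic(generated, fitted, n_sigma=3.0):
    """Return the mean fit bias if it is significant, otherwise zero.

    generated, fitted : per-pseudoexperiment generated and fitted values
    n_sigma           : significance threshold in units of the standard
                        error of the mean deviation
    """
    deviations = np.asarray(fitted, dtype=float) - np.asarray(generated, dtype=float)
    mean = deviations.mean()
    error = deviations.std(ddof=1) / np.sqrt(len(deviations))
    return abs(mean) if abs(mean) > n_sigma * error else 0.0


# Toy pseudoexperiments (invented numbers): a clear shift of 0.5 is quoted
# as a systematic uncertainty, while a symmetric scatter is not.
biased = fit_bias_systematic([0.0, 0.0, 0.0, 0.0], [0.4, 0.6, 0.4, 0.6])
unbiased = fit_bias_systematic([0.0, 0.0, 0.0, 0.0], [-1.0, 1.0, -1.0, 1.0])
```

Returning zero for an insignificant deviation corresponds to the negligible entries in the table of systematic uncertainties.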

Including a result obtained with a constraint on \({\Delta m_{{s}}} \) in a global analysis complicates the treatment of the systematic effects introduced by the choice of the constraint. Therefore, a result with the mixing frequency fixed to the PDG value, \({\Delta m_{{s}}} =17.757\text {\,ps} ^{-1} \) [15], is also provided in Appendix B. No significant difference is observed with respect to the default result.

The systematic uncertainties associated with the mass model and mass factorisation can be treated as uncorrelated between this result and that of Ref. [18]. More details on the systematic effects for the studied channel are given in Ref. [57].

8 Conclusion

Using a data set corresponding to an integrated luminosity of \(3 \text {\,fb} ^{-1} \) collected by the LHCb experiment in pp collisions at centre-of-mass energies of 7 and \(8\text {\,TeV} \), a flavour-tagged decay-time-dependent angular analysis of \((1.27\pm 0.05)\times 10^4\) \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) decays is performed. A number of physics parameters including the \(C\!P\)-violating phase \(\phi _{{s}}\), average decay width \(\Gamma _{{s}}\) and decay width difference \(\Delta \Gamma _{{s}}\) as well as the polarisation amplitudes and strong phases of the decay are determined. The effective decay-time resolution and effective tagging power are \(45.6\pm 0.1\text {\,fs}\) and \((5.07\pm 0.16)\%\), respectively. The \(C\!P\) parameters are measured to be

$$\begin{aligned} \begin{aligned}&\displaystyle {\phi _{{s}}} = 0.00\pm 0.28\pm 0.07\text {\,rad},\\&\displaystyle {\Delta \Gamma _{{s}}} = 0.115\pm 0.045\pm 0.011\text {\,ps} ^{-1},\\&\displaystyle {\Gamma _{{s}}} = 0.608\pm 0.018\pm 0.012\text {\,ps} ^{-1}, \end{aligned} \end{aligned}$$

where the first uncertainty is statistical and the second is systematic. The dominant sources of the systematic uncertainty are the imperfect mass and decay-time resolution models. This is the first measurement of the \(C\!P\) content of the \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) decay and the first time that \(\phi _{{s}}\) has been measured in a final state containing electrons. These results constitute an important check for the results with muons in the final state because the systematic uncertainties of the measurements are independent, while the studied mechanism of the \(C\!P\) violation is the same. The results are consistent with previous measurements [5, 10,11,12,13,14] and the SM predictions [24,25,26], and show no evidence of \(C\!P\) violation in the interference between \({B} ^0_{s} \) meson mixing and decay. In addition, no evidence for direct \(C\!P\) violation in \({{B} ^0_{s}} \!\rightarrow {{J /\psi }} ({e ^+e ^-})\phi \) decays is observed.

Table 5 Correlation matrix of statistical uncertainties