Modeling the Multiwavelength Variability of Mrk 335 Using Gaussian Processes

Ryan-Rhys Griffiths; Jiachen Jiang; Douglas J. K. Buisson; Dan Wilkins; Luigi C. Gallo; Adam Ingram; Alpha A. Lee; Dirk Grupe; Erin Kara; Michael L. Parker; William Alston; Anthony Bourached; George Cann; Andrew Young; S. Komossa

doi:10.3847/1538-4357/abfa9f

1. Introduction

Active galactic nuclei (AGNs) show strong and variable emission across multiple wavelengths. The UV emission from an AGN is believed to be dominated by thermal emission from an accretion disk close to the central supermassive black hole (SMBH; e.g., Pringle 1981). The variability of optical and UV AGN¹⁶ emission is stochastic and described by random Gaussian fluctuations (e.g., Welsh et al. 2011; Gezari et al. 2013; Zhu et al. 2016; Sánchez-Sáez et al. 2018; Smith et al. 2018; Xin et al. 2020) with the autocorrelation functions of such fluctuations adhering to the "damped random walk" model. The X-ray emission from an AGN is often found to show faster variability relative to emission at longer wavelengths (e.g., Mushotzky et al. 1993; Gaskell & Klimek 2003) and originates from a more compact region (e.g., Morgan et al. 2008; Chartas et al. 2017).

The relationship between the UV and X-ray emission has been well studied. For instance, correlations between the variability in two energy bands have been seen in some individual sources (e.g., Shemmer et al. 2001; Buisson et al. 2017) while others do not show significant evidence for similar correlation (e.g., Smith & Vaughan 2007; Buisson et al. 2018). In sources where correlation is found, lags that are related to the light travel time between two emission regions are frequently observed. These lags are often found to be on timescales of days and are longer than those predicted by classical disk theories (Shakura & Sunyaev 1973). Such lag amplitudes indicate a disk of size a few times larger than expected (e.g., Edelson et al. 2000; Shappee et al. 2014; Troyer et al. 2016; Buisson et al. 2017). Alternatively, some modified models have been proposed for the underestimation of lags by the classical thin disk model, e.g., disk turbulence (e.g., Cai et al. 2020), additional varying FUV illumination (e.g., Gardner & Done 2017), a tilted or inhomogeneous inner disk (e.g., Dexter & Fragile 2011; Starkey et al. 2017), or an extended coronal region (e.g., Kammoun et al. 2021). Much shorter lags, e.g., hundreds of seconds, in agreement with the Shakura & Sunyaev (1973) model, have been rarely observed by comparison (e.g., in NGC 4395; McHardy et al. 2016).

The Neil Gehrels Swift Observatory has been monitoring the X-ray sky in the past decade in tandem with simultaneous pointings in the optical and UV band. In this work, we focus on the X-ray and UVW2 (λ = 212 nm) lightcurves of the narrow-line Seyfert 1 galaxy (NLS1; e.g., Gallo 2018) Mrk 335 obtained by XRT and UVOT, the soft X-ray and UV/optical telescopes on Swift. Mrk 335 was one of the brightest X-ray sources prior to 2007, before its flux diminished by 10×–50× its original brightness (Grupe et al. 2007). The X-ray brightness has not recovered since. During this low X-ray flux period, the UV brightness remains relatively unchanged rendering Mrk 335 X-ray weak (Tripathi et al. 2020). The behavior has been explained as a possible collapse of the X-ray corona, e.g., Gallo et al. (2013, 2015; Parker et al. 2014), and/or increased absorption in the X-ray-emitting region, e.g., Grupe et al. (2012; Longinotti et al. 2013, 2019; Parker et al. 2019).

Mrk 335 has been continuously monitored since 2007, making it one of the best-studied AGNs with Swift. Previous studies from the Swift monitoring program can be found in Grupe et al. (2007, 2012), Gallo et al. (2018), Tripathi et al. (2020), and Komossa et al. (2020). The X-rays are constantly fluctuating and regularly display large-amplitude flaring, e.g., Wilkins et al. (2015). The UV are significantly variable, but at a much smaller amplitude than the X-rays. Gallo et al. (2018) found tentative evidence for lags of ≈20 days based on cross-correlation analyses, suggesting a potential reprocessing mechanism of the more variable X-ray emission in the UV emitter of this source. One challenge faced by the Swift monitoring program is that the lightcurves are not continuously sampled and hence standard Fourier techniques cannot be applied. This uneven sampling of the lightcurves is imposed by limited telescope time.

In the context of cross-correlation analysis, methods have been developed to address the problem of unevenly sampled lightcurves. In Reynolds (2000), the method of Press et al. (1992) is extended to interpolate the lightcurve gaps using a model of the covariance function, or equivalently the power spectrum, of the lightcurve. In Bond et al. (1998), Miller et al. (2010), and Zoghbi et al. (2013) a maximum likelihood approach is taken to fit models of the lightcurve power spectra, which accounts for the correlation between the lightcurves. In this paper we focus on a relatively new approach to tackle unevenly sampled lightcurves.

Gaussian processes confer a Bayesian nonparametric framework to model general time-series data (Roberts et al. 2013; Tobar et al. 2015) and have proven effective in tasks such as periodicity detection (Durrande et al. 2016) and spectral density estimation (Tobar 2018). More broadly, Gaussian processes (GPs) have recently demonstrated modeling success across a wide range of spatial and temporal application domains including robotics (Deisenroth & Rasmussen 2011; Greeff & Schoellig 2020), Bayesian optimization (Shahriari et al. 2015; Grosnit et al. 2020; Cowen-Rivers et al. 2021; Grosnit et al. 2021), as well as areas of the natural sciences such as molecular machine learning (Nigam et al. 2021; Griffiths & Hernández-Lobato 2020; Moss & Griffiths 2020; Thawani et al. 2020; Griffiths et al. 2021; Hase et al. 2020, Bartók et al. 2010), genetics (Moss et al. 2020), and materials science (Cheng et al. 2020; Zhang et al. 2020). In the context of astrophysics there is a recent trend favoring nonparametric models such as GPs due to the flexibility afforded when specifying the underlying data modeling assumptions. Applications have arisen in lightcurve modeling (Luger et al. 2021a, 2021b), continuous-time autoregressive moving average (CARMA) processes Yu & Richards (2021), modeling stellar activity signals in radial velocity data (Rajpaul et al. 2015), lightcurve detrending (Aigrain et al. 2016), learning imbalances for variable star classification (Lyon et al. 2020), inferring stellar rotation periods (Angus et al. 2018), estimating the dayside temperatures of hot Jupiters (Pass et al. 2019), exoplanet detection (Czekala et al. 2017; Jones et al. 2017; Gordon et al. 2020; Langellier et al. 2021), spectral modeling (Gibson et al. 2012; Nikolov et al. 2018; Diamond-Lowe et al. 2020), as well as blazar variability studies (Karamanavis 2015, 2017; Covino et al. 2020; Yang et al. 2021).

It has recently been demonstrated in lightcurve simulations by Wilkins (2019) that a GP framework can compute time lags associated with X-ray reverberation from the accretion disk that are longer and observed at lower frequencies than can be measured by applying standard Fourier transform techniques to the longest available continuous segments. It is for this principal reason that we choose to employ GPs for our timing analysis. Further desirable facets of GPs include the fact that, unlike parametric models, they do not make strong assumptions about the shape of the underlying lightcurve (Wang et al. 2012). Additionally, we may perform Bayesian model selection at the level of the covariance function or kernel allowing us to quantitatively compare different models of the lightcurve power spectrum. Finally in the cross-correlation analysis, we may make a weaker modeling assumption than (Zoghbi et al. 2013) in treating the X-ray and UV lightcurves as being independent (Wilkins 2019).

The paper is outlined as follows: In Section 2, we provide the background on GPs including discussion of different kernels as well as Bayesian model selection, the criterion used to choose between kernels. In Section 3 we describe the procedures used to fit GPs to the X-ray and UVW2 bands including aspects such as identification of the flux distribution, consideration of measurement noise, as well as a simulation study to determine the appropriate kernels. In Section 4 we compare the structure functions of the gp-interpolated lightcurves with the observational structure functions from Gallo et al. (2018). In Section 5 we present a cross-correlation analysis of the X-ray and UVW2 bands using the gp-interpolated lightcurves. Finally, in Section 6 we provide concluding remarks about the discrepancy between the observational and gp-derived structure functions as well as the implications of the cross-correlation analysis, namely that the broad lag features suggest an extended emission region of the disk in Mrk 335 during the reverberation process. All code for reproducing the analysis is available at https://github.com/Ryan-Rhys/Mrk_335.

2. Gaussian Processes

We may define a GP as a collection of random variables, any finite number of which have a joint Gaussian distribution. When the gp is used as a prior over functions, the aforementioned random variables consist of function values f(t) at different points in time t. In our setting f represents flux or count rate. The gp is characterized by a mean function

$\begin{eqnarray}&&m(t)={\mathbb{E}}[f(t)]\end{eqnarray} \tag{ 1 }$

and a covariance function

$\begin{eqnarray}&&k(t,t^{\prime} )={\mathbb{E}}[(f(t)-m(t))(f(t^{\prime} )-m(t))].\end{eqnarray} \tag{ 2 }$

The process is written as follows:

$\begin{eqnarray}&&f(t)\sim { \mathcal G }{ \mathcal P }\left(m(t),k(t,t^{\prime} )\right).\end{eqnarray} \tag{ 3 }$

The mean function is set to the empirical mean of the standardized observational data in the cases we consider. Standardization, in this case, refers to the common practice of subtracting the mean and dividing by the standard deviation of the data when fitting the gp in order to facilitate the identification of appropriate hyperparameters (Murray 2008). The standardization is reversed once the fitting procedure is complete in order to obtain predictions on the original scale of the data. m(t) = 0 will be assumed henceforth for the sake of the current presentation. The covariance function computes the pairwise covariance between two random variables (function values). In the gp literature, the covariance function is commonly referred to as the kernel and is denoted as

$\begin{eqnarray}&&k(t,t^{\prime} )=\mathrm{cov}\left(f(t),f(t^{\prime} )\right).\end{eqnarray} \tag{ 4 }$

Informally, the kernel is responsible for determining the smoothness of the functions which the gp is capable of fitting. The inductive bias created by the choice of kernel is an important consideration in Gaussian process modeling.

2.1. Kernels

The most widely known kernel is the squared exponential (SE) or radial basis function kernel:

$\begin{eqnarray}&&{k}_{\mathrm{SQE}}(t,t^{\prime} )={\sigma }_{f}^{2}\exp \left(\displaystyle \frac{-{\left|t-t^{\prime} \right|}^{2}}{2{{\ell }}^{2}}\right),\end{eqnarray} \tag{ 5 }$

where ${\sigma }_{f}^{2}$ is the signal amplitude hyperparameter (vertical lengthscale) and ℓ is the (horizontal) lengthscale hyperparameter. For such hyperparameters, we will adopt the notation of θ to represent the set of kernel hyperparameters. It has been argued by Stein (2012) that the smoothness assumptions of the SE kernel are unrealistic for many physical processes. As such, kernels such as the Matern,

$\begin{eqnarray}&&\begin{array}{l}{k}_{\mathrm{Matern}}(t,t^{\prime} )=\displaystyle \frac{{2}^{1-\nu }}{{\rm{\Gamma }}(\nu )}{\left(\displaystyle \frac{\sqrt{2\nu }-| t-t^{\prime} | }{{\ell }}\right)}^{\nu }{K}_{\nu }\\ \quad \times \left(\displaystyle \frac{\sqrt{2\nu }-| t-t^{\prime} | }{{\ell }}\right),\end{array}\end{eqnarray} \tag{ 6 }$

are more commonly seen in machine learning literature. Here K_ν is a modified Bessel function of the second kind, Γ is the gamma function, and ν is a nonnegative parameter of the kernel, which is typically taken to be either $\tfrac{3}{2}$ or $\tfrac{5}{2}$ (Rasmussen & Williams 2006). The lengthscale hyperparameter ℓ can be thought of loosely as a decay coefficient for the covariance between inputs as they become increasingly far apart in the input space; the farther apart the inputs are, the less correlated they will be. The final kernel used in this work is the rational quadratic (RQ) kernel,

$\begin{eqnarray}&&{k}_{{RQ}}(t,t^{\prime} )={\left(1+\displaystyle \frac{| t-t^{\prime} {| }^{2}}{2\alpha {{\ell }}^{2}}\right)}^{-\alpha }.\end{eqnarray} \tag{ 7 }$

where α, ℓ > 0. The RQ kernel can be viewed as a scale mixture of SE kernels with different characteristic lengthscales. All kernels used in this work are stationary kernels, and as such, it should be stated that this reflects a modeling assumption that the underlying time series is stationary. The extension of the current work to include nonstationary kernels will be discussed in Section 6.

2.2. Prediction with GPs

To illustrate the homoscedastic (constant noise) gp predictive model we use X-ray timing as an example. We wish to model the count rate f(t). We place a GP prior over f,

$\begin{eqnarray}&&p({\boldsymbol{f}}({\boldsymbol{t}})| \theta )={ \mathcal G }{ \mathcal P }\left(0,K({\boldsymbol{t}},{\boldsymbol{t}})\right),\end{eqnarray} \tag{ 8 }$

where f denotes the vector of function values evaluated at the set of times ${\{{t}_{i}\}}_{i=1}^{N},N\in {\mathbb{N}}$ . K( t , t ) is a kernel matrix where entries are computed by the kernel function as ${\left[K\right]}_{{ij}}=k({t}_{i},{t}_{j})$ . θ represents the set of kernel hyperparameters. The GP prior is written as

$\begin{eqnarray}&&p({\boldsymbol{y}}({\boldsymbol{t}})| \theta )={ \mathcal G }{ \mathcal P }\left(0,K({\boldsymbol{t}},{\boldsymbol{t}})+I{\sigma }_{y}^{2}\right),\end{eqnarray} \tag{ 9 }$

where $I{\sigma }_{y}^{2}$ represents the variance of the Gaussian noise on the observations y . The applicability of such a noise model, also known as a Gaussian likelihood, will be discussed further in Section 3.2. Once we have observed some data y , the joint distribution over the observed data y and the predicted function values f * at test locations t * may be written as

$\begin{eqnarray}&&\left[\begin{array}{c}{\boldsymbol{y}}\\ {{\boldsymbol{f}}}_{* }\\ \end{array}\right]\sim { \mathcal N }\left(0,\left[\begin{array}{cc}K({\boldsymbol{t}},{\boldsymbol{t}})+{\sigma }_{y}^{2}I & K({\boldsymbol{t}},{{\boldsymbol{t}}}_{* })\ \\ K({{\boldsymbol{t}}}_{* },{\boldsymbol{t}})+\ \sigma {y}^{2} & K({{\boldsymbol{t}}}_{* },{{\boldsymbol{t}}}_{* })\end{array}\right]\right),\end{eqnarray} \tag{ 10 }$

where ${ \mathcal N }$ is the multivariate Gaussian probability density function. The joint prior may be conditioned on the observations through

$\begin{eqnarray}&&p({{\boldsymbol{f}}}_{* }| {\boldsymbol{y}})=\displaystyle \frac{p({{\boldsymbol{f}}}_{* },{\boldsymbol{y}})}{p({\boldsymbol{y}})},\end{eqnarray} \tag{ 11 }$

which enforces that the joint prior agrees with the observed target values y . The predictive distribution is thus given as

$\begin{eqnarray}&&p({{\boldsymbol{f}}}_{* }| {\boldsymbol{t}},{\boldsymbol{y}},{{\boldsymbol{t}}}_{* })={ \mathcal N }\left({\bar{{\boldsymbol{f}}}}_{* },\mathrm{cov}({{\boldsymbol{f}}}_{* })\right),\end{eqnarray} \tag{ 12 }$

with the predictive mean at test locations t _* being

$\begin{eqnarray}&&\bar{{{\boldsymbol{f}}}_{* }}=K({{\boldsymbol{t}}}_{* },{\boldsymbol{t}}){\left[K({\boldsymbol{t}},{\boldsymbol{t}})+{\sigma }_{y}^{2}I\right]}^{-1}{\boldsymbol{y}}\end{eqnarray} \tag{ 13 }$

and the predictive uncertainty being

$\begin{eqnarray}&&\begin{array}{l}\mathrm{cov}({{\boldsymbol{f}}}_{* })=K({{\boldsymbol{t}}}_{* },{{\boldsymbol{t}}}_{* })-K({{\boldsymbol{t}}}_{* },{\boldsymbol{t}})\\ \quad \times {\left[K({\boldsymbol{t}},{\boldsymbol{t}})+{\sigma }_{y}^{2}I\right]}^{-1}K({\boldsymbol{t}},{{\boldsymbol{t}}}_{* }).\end{array}\end{eqnarray} \tag{ 14 }$

Analyzing the form of this expression one may notice that the first term K( t _*, t _*) in the expression for the predictive uncertainty cov( f *) may be viewed as the prior uncertainty and the second term $K({{\boldsymbol{t}}}_{* },{\boldsymbol{t}}){\left[K({\boldsymbol{t}},{\boldsymbol{t}})+{\sigma }_{y}^{2}I\right]}^{-1}K({\boldsymbol{t}},{{\boldsymbol{t}}}_{* })$ can be thought of as a subtractive factor that accounts for the reduction in uncertainty when observing the data points y .

2.3. Bayesian Model Selection

One desirable property of GPs and Bayesian models in general is the ability to carry out hierarchical modeling (MacKay 1992; van der Wilk 2019). The three tiers of the modeling hierarchy are:

1.
Model Parameters,
2.
Model Hyperparameters,
3.
Model Structures.

In the case of the nonparametric GP framework, parameters do not have the same meaning as in parametric Bayesian models and are instead obtained from the posterior distribution over functions. Hyperparameters are typically parameters of the kernel function such as signal amplitudes and lengthscales. An important entity for hyperparameter optimization in GPs is the log marginal likelihood or evidence (MacKay 1991):

$\begin{eqnarray}&&\begin{array}{l}\mathrm{log}p({\boldsymbol{y}}| {\boldsymbol{t}},\theta )=\mathop{\underbrace{-\displaystyle \frac{1}{2}{{\boldsymbol{y}}}^{\top }{\left({K}_{\theta }({\boldsymbol{t}},{\boldsymbol{t}})+{\sigma }_{y}^{2}I\right)}^{-1}{\boldsymbol{y}}}}\limits_{\mathrm{encourages}\,\mathrm{fit}\,\mathrm{with}\,\mathrm{data}}\\ \quad \mathop{\underbrace{-\displaystyle \frac{1}{2}\mathrm{log}| {K}_{\theta }({\boldsymbol{t}},{\boldsymbol{t}})+{\sigma }_{y}^{2}I| }}\limits_{\mathrm{control}\,\mathrm{smodel}\,\mathrm{capacity}}-\displaystyle \frac{N}{2}\mathrm{log}(2\pi ),\end{array}\end{eqnarray} \tag{ 15 }$

where N is the number of observations and the subscript notation on the kernel matrix K_θ( t , t ) is chosen to indicate the dependence on the set of hyperparameters θ. The two terms in the expression for the marginal likelihood embody Occam's Razor (Rasmussen & Ghahramani 2001) in their preference for selecting models of intermediate capacity. The first term in Equation (15) acts as a term that penalizes functions that do not fit the data well whereas the second term acts as a regularizer, disfavoring overly complex models. In this work, kernel hyperparameters are chosen to optimize the marginal likelihood. At the level of model structures, the fit achieved by different kernels can be quantitatively assessed by comparing the values of the optimized log marginal likelihood objective.

3. GP Modeling of Mrk 335

In this paper we consider the Swift X-ray and UVW2 lightcurves in time bins of one day. We refer the reader to Gallo et al. (2018) for details of the data reduction processes. The observational measurements used in this work run from 54327 to 58626 modified Julian days and comprise 509 data points for the X-ray band and 498 data points for the UVW2 band. We consider the latest UVOT sensitivity calibration file ("swusenscorr20041120v006.fits") to account for the sensitivity loss with time in the UVW2 band.¹⁷

3.1. Identifying the Flux Distribution

In order to assess the applicability of GPs in modeling the flux distribution of the X-ray and UVW2 bands of Mrk 335, we perform a series of graphical distribution tests to determine the sample distribution. The histograms of the log count rates for the X-ray, and flux for the UV bands, of Mrk 335 are shown in Figure 1. The histograms show that the distribution of the UVW2 flux is approximately Gaussian-distributed whereas the X-ray count rate distribution appears to be log-Gaussian-distributed, in line with the general observation of Uttley & McHardy (2005) that fluxes from accreting black holes tend to follow log-Gaussian distributions. We provide further graphical distribution tests based on probability–probability (PP) plots and empirical cumulative distribution functions (ECDFs) in Appendix A.

**Figure 1.** Histograms of the observed Swift X-ray log count rate and UVW2 flux overlaid with Gaussian kernel density estimates. The raw UVW2 flux values have been scaled by 1e¹⁴.
Download figure:
Standard image High-resolution image

Furthermore, following Wilkins (2019) we perform a Kolmogorov–Smirnov test for goodness of fit where the null hypothesis is that the sample was drawn from a Gaussian distribution. For the UVW2 flux values we obtain a p-value of 0.164. We obtain a p-value of 1.017e⁻²⁰ for the raw X-ray count rates and a p-value of 0.028 for the log-transformed X-ray count rates. As such, we cannot reject the null hypothesis that either UVW2 flux or log-transformed X-ray count rates are drawn from a Gaussian distribution at the 1% level of significance. We may however reject the null hypothesis in the case of the raw X-ray count rates, providing evidence that the raw X-ray count rates should be log-transformed in order to be well modeled by a Gaussian distribution. As such, we log transform the raw X-ray count rates and leave the UVW2 flux values unchanged.

3.2. Noise

As noted by Wilkins (2019) fitting a GP to the logarithm of the count rate is appropriate only in the limit of a large signal-to-noise ratio. In the case of Mrk 335, the Poisson (shot) noise intrinsic to the photon detectors used to obtain the flux measurements is over an order of magnitude smaller than the flux measurement itself. As such the choice of the log-GP would appear to be justified.

3.3. Simulations

We undertake a simulation study in order to quantitatively assess the abilities of different kernels to interpolate gapped simulated lightcurves. Observational power spectral densities (PSDs) of AGNs are well described by (broken) power laws (Mchardy et al. 2004). As such, our simulations employ a power-law PSD with an index fit to the observational data. Our goals with the study are twofold: first, although we cannot be sure of the true PSD for the observational data, we hope that the simulations may afford a good proxy for identifying performant kernels based on the fact that AGNs typically exhibit power-law-like PSDs, and second, we wish to test whether a kernel's ability to reconstruct the full simulated lightcurve correlates with its marginal likelihood value for the gapped data on which it is trained. If there is a correlation, we may use the marginal likelihood as a metric for identifying the appropriate kernel on the observational data.

One thousand simulated lightcurves with gaps are generated for the Mrk 335 X-ray and UV bands using the method of Davies & Harte (1987), first applied in astrophysics by Timmer & König (1995). For each lightcurve we have access to the ground-truth functional form of the lightcurve before the introduction of gaps. Computationally, the ground-truth lightcurve is evaluated on a fine, discrete grid of 4390 time points whereas the gapped lightcurves are evaluated on a coarser, unevenly spaced grid of 498 time points for the UV simulations and 509 time points for the X-ray simulations in line with the number of observational data points. We then quantify how well each gp kernel performs in recovering the ground-truth lightcurve by measuring the normalized residual sum of squared errors,

$\begin{eqnarray}&&\mathrm{RSS}=\displaystyle \frac{1}{N}\sum _{i=1}^{N}{\left(f({t}_{i})-{y}_{i}\right)}^{2},\end{eqnarray} \tag{ 16 }$

where f(t_i) is the GP prediction at grid point t_i and y_i is the true simulated count rate value. The RSS values are averaged over the 1000 simulated lightcurves. We provide an illustration of the RSS metric in Figure 2. In addition, we compute the averaged negative log marginal likelihood (NLML) for each kernel:

$\begin{eqnarray}&&\begin{array}{l}\mathrm{NLML}=\displaystyle \frac{1}{2}{{\boldsymbol{y}}}^{\top }\left({K}_{\theta }({\boldsymbol{t}},{\boldsymbol{t}})\right.\\ \quad {\left.+{\sigma }_{y}^{2}I\right)}^{-1}+{\boldsymbol{y}}\displaystyle \frac{1}{2}\mathrm{log}| {K}_{\theta }({\boldsymbol{t}},{\boldsymbol{t}})\\ \qquad +{\sigma }_{y}^{2}I| +\displaystyle \frac{N}{2}\mathrm{log}(2\pi ).\end{array}\end{eqnarray} \tag{ 17 }$

**Figure 2.** Residual plot. The normalized RSS metric is the sum of squared residuals divided by the total number of discretized points (4390) comprising the simulated lightcurve. A residual in this case represents the difference between the Gaussian process predictive mean and the ground-truth value of the simulated lightcurve.
Download figure:
Standard image High-resolution image

The NLML in this case is the negative of the quantity given in Equation (15). Kernel hyperparameters were selected via optimization of the NLML using the scipy optimizer of GPflow (Matthews et al. 2017). The jitter level was fixed at 0.001, a small positive number to ensure numerical stability. The output values (flux or the logarithm of the count rate) were standardized according to their empirical mean and standard deviation. We use a constant mean function set to the empirical mean of the data following standardization as discussed in Section 2.

We report the results of this simulation study in Table 1. The NLML values show correlation with RSS, thus providing evidence that NLML is an appropriate metric for determining the GP kernel for the real observational data (for which the ground-truth lightcurve is of course not available). A paired t-test was conducted to determine whether the RSS results were significant in terms of identifying the best kernel. For the X-ray simulations, a t-statistic of 9 was obtained corresponding to a two-sided p-value of 5⁻²⁰. For the UVW2 simulations, a t-statistic of −22 was obtained corresponding to a two-sided p-value of 9⁻⁸⁵. As such, the null hypothesis that the performance discrepancy between kernels on the RSS metric is due to chance variation across 1000 simulations, may be rejected at the 1% level of significance. We offer further rationalization in Appendix B for why the top two performing kernels in the simulation study are the Matern $\tfrac{1}{2}$ and RQ kernels.

Table 1. Performance Comparison of Kernels Based on the NLML on the Simulated Gapped X-Ray and UV Lightcurves and Normalized Residual Sum of Squared Errors (RSS) on the Ground-truth Simulated Lightcurves

Kernel	NLML	RSS
X-Ray
Matern $\tfrac{1}{2}$	180.2 ± 3.8	0.121 ± 0.002
Matern $\tfrac{3}{2}$	420.7 ± 3.3	0.309 ± 0.003
Matern $\tfrac{5}{2}$	523.5 ± 2.9	0.374 ± 0.003
Rational Quadratic	184.2 ± 3.6	0.117 ± 0.002
Squared Exponential	632.1 ± 1.5	0.554 ± 0.004

UVW2
Matern $\tfrac{1}{2}$	−399.0 ± 5.2	2.9 ± 0.08
Matern $\tfrac{3}{2}$	−298.3 ± 6.0	7.9 ± 0.25
Matern $\tfrac{5}{2}$	−219.6 ± 6.5	17.0 ± 0.41
Rational Quadratic	−349.2 ± 5.4	3.4 ± 0.09
Squared Exponential	−65.0 ± 7.4	32.8 ± 0.55

Note. The mean NLML and RSS across 1000 simulations are reported with the standard error. UVW2 RSS values have an exponent of −30. Bold values indicate best performance.

Download table as: ASCII Typeset image

3.4. GP Fits

The fits to the observational data for the UVW2 and X-ray bands are shown in Figures 3 and 4, respectively. In an analogous fashion to the simulation experiments we evaluate five stationary kernels: Matern $\tfrac{1}{2}$ , Matern $\tfrac{3}{2}$ , Matern $\tfrac{5}{2}$ , RQ, and SE. We choose to display the two kernels, RQ and Matern $\tfrac{1}{2}$ , which performed best in the simulation study in their abilities to model power-law-like PSDs. These kernels also have the most favorable values under the NLML metric for the observational data. We again use a constant mean function set to the empirical mean of the data following standardization. We optimize all kernel hyperparameters under the marginal likelihood save for the noise level which we fix to a constant value in the standardized space. This constant noise value is computed by dividing the mean output value in the standardized space by the mean signal-to-noise ratio in the original space.

**Figure 3.** gp lightcurves for the UVW2 band. The shaded regions denote the gp 95% confidence interval. We show both the gp mean and a sample from the gp posterior in separate plots. The insets are included to highlight the variability of the fit.
Download figure:
Standard image High-resolution image

**Figure 4.** gp lightcurves for the X-ray band. The shaded regions denote the gp 95% confidence interval. We show both the gp mean and a sample from the gp posterior in separate plots. The insets are included to highlight the variability of the fit.
Download figure:
Standard image High-resolution image

4. Structure Function Analysis

Ideally we would like to examine the PSD of the GP fits to the observational data. The PSD characterizes the distribution of power over frequencies of a given emission band and properties of the PSD can be linked to underlying physical processes in the accretion disk. Computation of the PSD, while possible, can be complicated by the uneven sampling of the observational data, leading previous studies to instead perform a structure function analysis on the Mrk 335 data (Gallo et al. 2018). While it is possible to extract the PSD from the learned kernel (Wilkins 2019), in this work we choose to perform a structure function analysis of the gp lightcurves in order to compare directly against the results of Gallo et al. (2018). We follow the method described in Simonetti et al. (1985), Hughes et al. (1992), di Clemente et al. (1996), Collier & Peterson (2001), and Gallo et al. (2018). The binned structure function is defined as

$\begin{eqnarray}&&\mathrm{SF}(\tau )=\displaystyle \frac{1}{N(\tau )}\sum _{i}\ {\left[f({t}_{i})-f({t}_{i}+\tau )\right]}^{2},\end{eqnarray} \tag{ 18 }$

where τ = t_j − t_i is the distance between pairs of points i and j such that t_j > t_i. The structure function is binned according to τ where the centers of each bin are given by ${\tau }_{i}=(i-\tfrac{1}{2})\delta$ . δ in this instance is the structure function resolution. We use the same δ as in Gallo et al. (2018), namely 5.3 days for the structure function computation over both the X-ray and UVW2 bands. f(t_i) gives the count rate value at time point t_i and N(τ) is the number of structure function pairs in each bin i with center τ_i. Accounting for measurement noise by subtracting twice the mean noise variance from each structure function bin, as performed in Gallo et al. (2018), was found to have a negligible effect on the gp structure functions and so we ignore it. As in Gallo et al. (2018), we normalize the structure function values by the global lightcurve variance.

The gp structure functions for the interpolated lightcurves are shown in Figure 5. The 1σ gp error bars are obtained by computing the structure function over 50 samples from the GP posterior. Each sample gives rise to highly similar structure functions and so the errors are not visible on the plot. The structure functions computed from the observational data, 509 and 498 data points for the X-ray and UV bands of Mrk 335, respectively, are included for reference. In contrast to the gp structure function error bars, in the case of the observational data the error bars are computed as ${\sigma }_{i}/\sqrt{(}\tfrac{{N}_{i}}{2})$ , where σ_i is the noise standard deviation in bin i and N_i is the number of pairs in bin i.

The GP structure functions are compared against the observational structure functions in Figure 5. In addition, we plot broken power-law fits to the gp structure functions, the parameters of which are given in Table 2. In the UVW2 band, both GP kernels yield structure functions possessing a consistent break point at ca. 125 days. In the X-ray band the Matern $\tfrac{1}{2}$ kernel yields a break point at 66 days whereas the RQ kernel fit yields an unbroken power law. Given the discrepancy between gp kernels, we do not find definite evidence for a break in the X-ray power law.

Table 2. Parameters for the Broken Power-law Fits to the gp Structure Function

Wave Band	Kernel	α₁	α₂	τ_char
UVW2	Matern $\tfrac{1}{2}$	−0.72 ± 0.03	−0.26 ± 0.01	127 ± 8
UVW2	Rational Quadratic	−0.62 ± 0.01	−0.28 ± 0.01	125 ± 5
X-ray	Matern $\tfrac{1}{2}$	−0.29 ± 0.03	−0.07 ± 0.004	66 ± 8
X-ray	Rational Quadratic	−0.21 ± 0.002	N/A	N/A

Note. α₁ and α₂ are the indices for the power law before and after the break point τ_char. The break point τ_char is reported in days. Errors are computed using 200 bootstrap samples of the data points corresponding to the GP structure functions. The X-ray rational quadratic structure function is fit using a power law and as such only has a single index as a parameter. The astropy library (Astropy Collaboration et al. 2013, 2018) is used to compute the (broken) power-law fits using the simplex algorithm and least- squares statistic for optimization with the gp structure function uncertainties used as weights in the fitting.

Download table as: ASCII Typeset image

Of particular interest is whether the dip in the X-ray structure function is an expected feature of the latent lightcurve or a measurement artifact arising from uneven sampling. In order to assess the potential for the dip to be a sampling artifact, we perform simulations using the Timmer and Konig algorithm from Section 3.3. In this case we compute structure functions of gapped lightcurves and compare against structure functions derived from the ground-truth lightcurves with no gaps. One representative simulation is depicted in Figure 6. In this instance a similar dip to that found in the observational data is observed in the X-ray band simulation. This highlights the possibility that the dip seen in the observational X-ray structure function is a sampling artifact arising from gaps in the lightcurve.

**Figure 6.** Structure function simulations. Pseudo-observational lightcurves are produced by introducing gaps into the simulated lightcurves. The structure function for the gapped lightcurve is shown in red in (a) whereas the structure function of the gp interpolation is shown in red in (b). Both structure functions are compared against the ground-truth structure function obtained from the full simulated ground-truth lightcurve (no gaps). The dips at τ = 200 days and τ = 400 days in the structure function derived from the gapped observational simulation in 6(a) are artifacts of the uneven sampling.
Download figure:
Standard image High-resolution image

5. Lag and Coherence

In this section, we calculate the coherence between the UVW2 and X-ray emission from Mrk 335 in search of evidence of lag features in the Fourier frequency domain. The coherence and lag spectra are estimated from 1000 pairs of UVW2 and X-ray gp lightcurve samples drawn from the GP posterior for each kernel. The lags are defined as the phase lags divided by the corresponding Fourier frequency. A similar approach has been used in other disciplines (e.g., Fabian et al. 2009; Kara et al. 2013). We consider both Matern $\tfrac{1}{2}$ and RQ kernels. The results are shown in Figure 7. Positive lags imply that the X-ray variability leads the UVW2 variability. The error bars in the figure are the standard errors of the corresponding measurements for the 1000 samples.

**Figure 7.** The coherence and lag spectra for Mrk 335. They are calculated by using 1000 pairs of gp lightcurve samples fit to the observed lightcurves. The error bars are the standard errors of the corresponding measurements for the 1000 samples. Different panels are for different kernels. Positive lags imply that the X-ray band leads the UVW2 band.
Download figure:
Standard image High-resolution image

The coherence between the UVW2 and X-ray emission decreases with frequency, suggesting more coherent variability at lower frequency. Positive lag features are shown at the low frequencies in the range f = 0.005–0.025 day⁻¹. The absolute value of the lag at f = 0.0039 ± 0.0014 days⁻¹ is estimated to be 19 ± 22 days for the Matern $\tfrac{1}{2}$ kernel applied to both lightcurves and 29 ± 19 days for the RQ kernel; however, both measurements are consistent with zero lag in the 2σ uncertainty range.

Tentative evidence of a shorter time lag at a higher frequency of f = 0.018 ± 0.006 day⁻¹ is also found. The longer lag feature at a lower frequency would correspond to a more extended emission region while the shorter lag feature at a higher frequency would correspond to a more compact region. This could be explained by the presence of an extended UV emission region on the disk where reverberation happens.

Given that the lags are consistent with zero lag within 2σ uncertainty ranges, we conclude that only tentative evidence for a broad lag feature is found by applying GPs to the UVW2 and X-ray lightcurves of Mrk 335. Previous attempts to identify lags between two wavelengths of Mrk 335 based on cross-correlation analysis in the time domain suggest similar results (e.g., Gallo et al. 2018).

6. Conclusions

Following the interpolation of the unevenly sampled lightcurves of Mrk 335 using GPs, tentative evidence for broad lag features is found in the Fourier frequency domain. The magnitude of the lags is consistent with previous cross-correlation analyses. In addition, the broad lag features might suggest an extended emission region, e.g., of the disk in Mrk 335 during the reverberation processes. If the corona is compact within 5 R_g in Mrk 335 (Wilkins et al. 2015), our data suggest a possibly wide range of UVW2 emission radii.

The structure functions computed from the gp-interpolated lightcurves are consistent with those derived from the observational data and furthermore, illicit potential insights into the properties of the latent lightcurves. In particular, we show through a simulation study that it is possible that dips in the X-ray structure function may be produced by sampling artifacts arising from gaps in the lightcurve. In contrast, the gp structure functions show no dips. While this is not proof that the dip in the observational X-ray structure function is due to a sampling artifact, it does allude to the possibility. The UVW2 gp structure functions do not exhibit strong dependence on the choice of kernel with both Matern $\tfrac{1}{2}$ and RQ showing up as a broken power law with breaks at 139 and 155 days, respectively. The X-ray structure functions however do show up differences between kernels with the RQ kernel predicting a power law and the Matern 1/2 kernel predicting a broken power law.

From the GP modeling perspective, the ability to carry out Bayesian model selection affords a quantitative means of comparing analytic kernels under the marginal likelihood. It may be possible to incorporate further flexibility into the fitting procedure by making use of more sophisticated methods of kernel design (Duvenaud 2014) in order to allow the assessment of fits of sums and products of analytic kernels or by leveraging advances in transforming GP priors via deep GPs (Damianou & Lawrence 2013) or normalizing flows (Maroñas et al. 2020). Such approaches could be validated using simulation studies. Additionally, modeling the cross-correlation using multi-output GPs (de Wolff et al. 2021) may be an interesting avenue for comparison against the approach taken here. Lastly, Bayesian spectral density estimation (Tobar 2018) may afford further flexibility through nonparametric modeling of the power spectral density in addition to nonparametric modeling of the lightcurve in the time domain. These improvements in Bayesian modeling machinery may help to minimize model misspecification and as such, enable more robust inferences to be made about the functional forms of the latent lightcurves.

J.J. acknowledges support from the Tsinghua Shui'Mu Scholar Program and the Tsinghua Astrophysics Outstanding Fellowship.

Appendix A: Additional Graphical Tests for Identifying the Flux Distribution

In Figure 8 we show P-P plots and ECDFs as graphical distribution tests for Gaussianity. It may be observed qualitatively that both X-ray band log count rates and UVW2 flux are well modeled by a Gaussian distribution.

Appendix B: Kernels

In Figure 9, we show plots of analytic kernel autocorrelation functions as well as their associated PSDs.

**Figure 9.** Kernel autocorrelation functions and PSDs. The rational quadratic kernel is plotted for different values of the α parameter. The Matern kernel plots in the PSD figure are offset by a factor of 10 for clarity. A PSD of f⁻² will match the high-frequency part of the Matern $\tfrac{1}{2}$ kernel and the rational quadratic is endowed with additional flexibility to model PSDs by virtue of its α parameter. Such characteristics may explain why these kernels are preferred in the simulation study.
Download figure:
Standard image High-resolution image

Modeling the Multiwavelength Variability of Mrk 335 Using Gaussian Processes

Article metrics

Permissions

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction