Classifying High-cadence Microlensing Light Curves. I. Defining Features

Published 2021 February 22 © 2021. The American Astronomical Society. All rights reserved.
Citation: Somayeh Khakpash et al 2021 AJ 161 132. DOI: 10.3847/1538-3881/abd6cc

Abstract

Microlensing is a powerful tool for discovering cold exoplanets, and the Roman Space Telescope microlensing survey will discover over 1000 such planets. Rapid, automated classification of Roman's microlensing events can be used to prioritize follow-up observations of the most interesting events. Machine learning is now often used for classification problems in astronomy, but the success of such algorithms can rely on the definition of appropriate features that capture essential elements of the observations that can map to parameters of interest. In this paper, we introduce tools that we have developed to capture features in simulated Roman light curves of different types of microlensing events, and we evaluate their effectiveness in classifying microlensing light curves. These features are quantified as parameters that can be used to decide the likelihood that a given light curve is due to a specific type of microlensing event. This method leaves us with a list of parameters that describe features like the smoothness of the peak, symmetry, the number of peaks, and the width and height of small deviations from the main peak. This will allow us to quickly analyze a set of microlensing light curves and later use the resulting parameters as input to machine learning algorithms to classify the events.

1. Introduction

Microlensing is a phenomenon that happens when light emitted from a distant object (the source) is lensed by a closer, massive object (the lens), and as a result, multiple images of the source are formed. These images are typically not resolved because their angular separation is much smaller than the angular resolution of both ground- and space-based telescopes, and consequently, we observe a brightening of the source.

Microlensing is a powerful tool for detecting small and dim objects that are otherwise very hard to detect by their emitted light, and in particular, it is so far the only method capable of investigating planetary systems with terrestrial-mass planets orbiting beyond the snow line (Gaudi 2010). For several decades, ground-based surveys have searched the Galactic bulge for microlensing events. The Optical Gravitational Lensing Experiment (OGLE; Udalski et al. 2015), the Microlensing Observations in Astrophysics (MOA; Bond et al. 2001), and the Korea Microlensing Telescope Network (KMTNet; Kim et al. 2016) surveys have detected thousands of microlensing events. These observations have led to the successful discovery of over 100 exoplanets and some compact substellar objects (Griest et al. 1995; Bennett et al. 2002; Mao et al. 2002).

Identifying microlensing events from the huge numbers of light curves obtained in these surveys is a challenge. It is useful to be able to distinguish microlensing events either in real time during survey operations as the events start or after an observing campaign has finished and the data are being fully analyzed. Microlensing surveys have generally relied on circulating real-time alerts of potential microlensing events when there is an increase in the brightness of an observed source. This method can lead to alerts for any sudden rise in a light curve, which can include variability other than microlensing, such as cataclysmic variables (CVs). There has been recent work to increase the accuracy of these alerts, limiting them to genuine microlensing events, using machine learning (ML) algorithms that can distinguish microlensing light curves from other variability in real time (Godines et al. 2019; Kessler et al. 2019). Real-time classification can increase the accuracy of alerts. It can also identify high-value events for which only partial coverage will be obtained by the survey, and it allows observers to schedule supplemental follow-up observations to increase the coverage of the event.

There is also a need for classification of microlensing survey data after the conclusion of observations. At that point, it is necessary to first distinguish microlensing events from other types of variability and also to separate different types of microlensing events, such as those that include planetary signals. Surveys like the KMTNet survey have employed an automated approach to detect microlensing-like variability in complete light curves by fitting simple functions (Kim et al. 2018). Belokurov et al. (2003) also advocate the use of neural networks to distinguish well-sampled microlensing light curves from other types of variability. In recent years, more ML approaches have been applied to microlensing characterization and classification challenges. For instance, Mróz (2020) trained two neural network classifiers on OGLE light curves to classify single and binary microlensing events. Zhang et al. (2020) trained an automated inference method based on neural density estimation (NDE) on simulated Roman light curves to obtain precise light-curve model posteriors faster than traditional sampling methods. In this paper, we wish to explore the efficiency and efficacy of after-the-fact classification of microlensing signals.

ML has now become a popular method for classifying astronomical time series. Some ML classifiers, like the random forest (Liaw & Wiener 2002) and k-means (Lloyd 1982) classifiers, take light-curve features as input and try to find a connection between those features and the classification labels. In this scenario, "features" are quantitative statistical or morphological measurements of the time series. Other methods, like neural networks, use the light curves themselves as input and then find common patterns or higher-order correlated properties to classify them. As a specific example, random forest is a widely used algorithm that classifies time series by making decisions based on features in the time series (Pawlak 2019; Bluck et al. 2020). In order to use this algorithm efficiently, observable features in the light curves that are most closely related to the canonical model parameters must first be identified.

Here we focus on analyzing complete high-cadence microlensing light curves. These are simulated light curves for the Roman Galactic Bulge Exoplanet Survey. The Nancy Grace Roman Space Telescope (Roman), formerly known as the Wide Field Infrared Survey Telescope (WFIRST), is a future NASA space mission expected to launch in the mid-2020s. One of its primary goals is to detect exoplanets using the microlensing method (Spergel et al. 2015). It is estimated that Roman will find about 54,000 microlensing events and will detect 1400 planets (Penny et al. 2019). The 0.01 au distance of Roman from Earth enables measuring parallaxes for free-floating planets (FFPs) and Earth-mass bound planets to help constrain their masses (Zhu & Gould 2016; Street et al. 2018). To assess the value of ML techniques to these surveys, we will apply ML features and classifiers to simulated Roman microlensing data produced for the Roman microlensing data challenge (R. Street et al. 2021, in preparation).

Binary-lens light curves are often difficult to model because of the complicated features in their light curves and the large parameter space that needs to be fully searched (for a more thorough review refer to the introduction of Penny 2014 and Khakpash et al. 2019). Those challenges highlight the importance of developing fast automated algorithms to quickly analyze light curves and estimate the microlensing model parameters. Our primary goal in this work is to identify a list of features specifically defined for microlensing light curves that can be generated by fast and efficient algorithms and to show that these features can help either in differentiating microlensing light curves among other types of variability or in classifying microlensing light curves into different types. This would enable fast detection of planetary system lenses and other interesting lensing cases in the released data sets of large surveys like Roman.

We present a collection of algorithms including various functional fits that are applied to the light curves, and from these fits, we extract parameters that quantify features of the light curve like smoothness of the peak, symmetry, number of peaks, similarity to microlensing single-lens curves, number of deviations from the peak, and width and height of the deviations from the main peak. We then show how effective each of these functions is in distinguishing between different types of microlensing events, such as identifying single-lens versus multiple-lens systems, or planetary system lenses versus stellar binary-lens systems, and in some cases in detecting microlensing events among other types of variability. Similar work was first done by Mao & Di Stefano (1995), who used features such as an estimate of the asymmetry about the peak to detect binary-lens signatures in the light curves. In Section 2, we introduce the different models of microlensing events that need to be parameterized for the classification, and in Section 3, we discuss the properties of our test data set. In Section 4, we introduce our algorithm package, including the different functional fits and their respective output parameters. We also evaluate the effectiveness of each function in capturing the specific set of features in the light curve. In Section 6, we show preliminary tests of using these features as input to ML classifiers. Finally, in Section 7, we discuss our results and the applicability of our method to other data sets.

2. Microlensing Models

Microlensing occurs when the light coming from a distant source is lensed by a closer object along the line of sight. As a result, the source appears brighter as the angular separation of the two objects decreases. This phenomenon then results in a peak in the light curve of the source star at the time of closest angular approach. A simple single-lens microlensing light curve has a single symmetric peak as shown in panel (a) of Figure 1. The different panels in Figure 1 represent different shapes of microlensing events that might be present in a large data set of a Galactic bulge survey. Note that it is practically impossible to include all possible types of light-curve morphologies, and this plot is only a small subset of all possibilities.

Figure 1. Light curves containing different types of major features in microlensing light curves. The top row includes light curves created by single lenses. Panel (a) is a single stellar lens with no significant finite-source effect. Panel (b) shows the effects of the finite-source effect on a single-lens stellar event. Panel (c) shows an event due to an FFP. Panels (d) and (e) show two examples of events caused by binary star lens systems. Panels (g) and (h) are due to planetary systems and have no caustic crossings, whereas panel (f) is a planetary event that contains a caustic-crossing event.


The light curves in Figure 1 have characteristics representative of the types of features we are seeking to automatically detect in this work. When there are multiple lenses or sources, and when there are second-order effects like the finite-source effect and parallax, the shape of the single-lens peak will deviate from a symmetric peak as in panel (a). In this section, we will discuss the different physical phenomena that can cause these deviations.

Panels (b) and (c) in Figure 1 show two single-lens models affected by the finite-source effect (Nemiroff & Wickramasinghe 1994; Witt & Mao 1994; Gould & Gaucherel 1996). Panel (c) contains an event caused by an FFP, and therefore the light curve is more affected by the finite-source effect. Panels (d) and (e) include example light curves of binary-lens events caused by binary stars, and panel (f) has an example of a planetary binary-lens microlensing event with a caustic crossing. Panels (g) and (h) have examples of planetary binary-lens events with no caustic crossings. The example in panel (g) is an example of a major image perturbation, and the event in panel (h) is a minor image perturbation (for a discussion of major/minor images refer to Section 2.4.2). In the following subsections, we introduce different physical models that give rise to the light-curve features seen in Figure 1.

2.1. Point-source Point-lens Microlensing Light Curves

The point-source point-lens (PSPL) model describes a lensing event with a single lens and a single source, where both the source and the lens can be approximated as point-like objects with zero angular size. When there is no blending of the source with neighboring stars or with light from the lens or companions to the lens or source, this model can be described by a simple Paczyński curve as in Equations (1) and (2). A is the magnification of the source, t0 is the time of the maximum magnification, u0 is the impact parameter, and tE is the Einstein crossing time. Figure 2 shows five different PSPL models with the same Einstein timescale and different values of u0. In the presence of blending, another parameter is added to Equation (1), yielding Equation (3), where F(t) is the differential flux and fs is the blending parameter, which determines what fraction of the total flux in the aperture is due only to the source. When there are no higher-order effects, these curves remain symmetric, and the sharpness of the peak in high-magnification events remains intact. Detecting any deviation from this model can help identify the presence of higher-order effects:

Equation (1)

$$A(u) = \frac{u^{2} + 2}{u\sqrt{u^{2} + 4}}$$

Equation (2)

$$u(t) = \sqrt{u_{0}^{2} + \left(\frac{t - t_{0}}{t_{\rm E}}\right)^{2}}$$

Equation (3)

$$F(t) = f_{s}\,A(t) + (1 - f_{s})$$
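For concreteness, the PSPL model of Equations (1)-(3) can be written in a few lines of code. The following Python sketch (the function names are ours, chosen for illustration, not part of any released package) evaluates the Paczyński magnification and the blended flux:

```python
import numpy as np

def pspl_magnification(t, t0, u0, tE):
    """Paczynski magnification of a point-source point-lens event,
    following Equations (1) and (2)."""
    u = np.sqrt(u0**2 + ((t - t0) / tE)**2)       # Equation (2)
    return (u**2 + 2) / (u * np.sqrt(u**2 + 4))   # Equation (1)

def pspl_flux(t, t0, u0, tE, fs):
    """Blended flux as in Equation (3); fs is the fraction of the
    baseline flux contributed by the source alone."""
    return fs * pspl_magnification(t, t0, u0, tE) + (1.0 - fs)
```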

Figure 2. Five PSPL models with different values of u0. The figure shows how the shape of the curve changes for different values of the impact parameter.


2.2. Finite-source Effects

When the source is not point-like and has a finite size, the magnification function will be different, especially when the impact parameter is comparable to the source size (Witt & Mao 1994). In Figure 1, an example of the changes due to finite-source effects can be seen in panel (b). For small impact parameters, this effect becomes stronger, and it smooths out the peak of the event. This effect can be used to estimate θE, the angular size of the Einstein ring, and therefore provide constraints on the lens mass (Jiang et al. 2004; Yoo et al. 2004).

2.3. Free-floating Planets

Isolated, short-duration microlensing events can be due to FFPs or planets in very wide orbits. Compared to most other exoplanet detection methods, microlensing has the capability to detect these planets since the method only depends on the mass of the lens and not its luminosity. Microlensing events caused by FFPs have short timescales of tE < 2 days because the size of the Einstein ring depends on the lens mass, and they can be detected in high-cadence observations (Sumi et al. 2011; Mróz et al. 2017, 2018, 2020). Roman's unique combination of high cadence and small photometric noise makes it particularly favorable for detecting FFPs (Johnson et al. 2020).

Microlensing due to FFPs is very likely to be affected by finite-source effects. The finite-source effect (Witt & Mao 1994) parameter, ρ, is defined as θ*/θE, where θ* is the apparent size of the source and θE is the angular size of the Einstein ring. θE depends on the lens mass, and it becomes smaller for single planetary lenses, and therefore ρ becomes larger, and the finite-source effect is stronger. As a result, the top of the short-timescale peak becomes flattened, and the overall shape of the event begins to resemble a top-hat function. An example of such an event can be seen in panel (c) of Figure 1.

2.4. Binary-lens Microlensing Events

When there is more than one lens, the range of possible light-curve morphologies increases dramatically. In binary-lens systems, various shapes of the light curves depend on parameters such as q, the mass ratio, s, the projected separation, and α, the angle the path of the source makes with the line connecting the two lenses. Since α can be any value between zero and 360°, this quantity, along with q, s, and tE, allows the time, height, width, and number of the features in the light curve to vary significantly. Based on OGLE (Udalski 2003) and MOA (Bond et al. 2001) observations, around 10% of microlensing light curves show lens binarity signatures (Penny et al. 2011). Fully modeling each of these light curves is challenging because there are multiple minima in the χ2 surface as a function of their parameters, and therefore a wide range of parameter space must be searched. Furthermore, there is no direct mapping between the observable features and the canonical parameters (q, s, α, t0, tE, u0).

2.4.1. Stellar Binary-lens Events

Stellar binary lenses are likely as common as or more common than planetary system lenses, but the probability of seeing deviations from a PSPL curve is much higher in nearly equal-mass (q ≳ 0.1) binary-lens light curves than in light curves due to planetary mass ratio (q ≲ 0.01) binaries because the size of the caustics (the loci of infinite magnification for a point source) increases with increasing q. Therefore, they are typically easy to distinguish from planetary system lenses, since the deviations from the standard PSPL form last much longer and in particular can be comparable to tE. They usually have larger perturbations in their light curves but can still be misinterpreted as planetary binary-lens events (Han et al. 2016). Two examples of such light curves can be seen in panels (d) and (e) of Figure 1.

2.4.2. Planetary Binary-lens Events

A lens system containing a star and a single planet is a binary-lens system with a very small mass ratio (q). Note that there is no strict delineation (in terms of q) between a stellar binary-lens event and a planetary binary-lens event, and this is one of the reasons that make this a challenging classification problem. In these cases, the light curve is dominated by the host star, such that there is a main peak that can be described by the PSPL model with one or more small deviations caused by the planet (Gaudi 2012). As in a single-lens microlensing event, two images of the source will be formed as it passes close by the lens star: an image outside of the Einstein ring of the lens star called the major image, and an image inside the Einstein ring of the lens star called the minor image. Depending on whether the projected separation of the planet and the lens star (s) is larger or smaller than the Einstein ring radius, the planet will perturb one of these images to leave a signature on the star's light curve, and the perturbation will have different characteristics. The effect of the planet on the light curve can also be explained by the positions of caustics. In real cases, since the source star is not a point source, we observe sharp high-magnification peaks when there are caustic crossings. The numbers and sizes of these caustics depend on the mass ratio (q) and the projected separation (s). The structure of the caustics leads to the planetary features in the light curves of these events (Gaudi 2012).

Caustics have three main topologies. Depending on the values of the projected separation and the mass ratio, there are one (intermediate topology), two (wide topology), or three (close topology) caustics (for a thorough description of planetary caustics please refer to Gaudi 2012). These caustics are also classified into two classes, called the central caustic and the planetary caustic, and depending on which one is closer to the path of the source, the shapes of the light curve are different.

If the source crosses a caustic curve, two sharp deviations, corresponding to the entrance and exit of the source through the caustic, will be seen in the light curve (assuming sufficient sampling). The shape of these deviations depends on the path of the source, which is determined by the source path angle (α). Panel (f) of Figure 1 includes an example of a caustic-crossing planetary microlensing event. In this work, we analyze these perturbations based on whether or not the source has crossed the caustics, and we investigate the features created in each of these cases. In cases with no caustic crossing, the source only passes close to the caustics, and that results in features like short bumps or troughs, or both, in the light curve. These deviations are typically small in magnitude and short in duration, which makes them sometimes very difficult to detect, especially in low-cadence surveys, and they can be created by either the central or the planetary caustics. In panels (g) and (h) of Figure 1, we show examples of non-caustic-crossing events caused by the planetary caustic. These perturbations happen when the source passes closely by the planetary caustic, and finding the time of the perturbation can help determine the location of the caustic and therefore estimate the projected separation of the planetary system. The event in panel (g) is a perturbation in the major image, and the event in panel (h) is a perturbation in the minor image.

In cases with caustic crossing, we need to identify the times of the two sharp deviations to find an estimate of the time the source spends inside the caustic. This allows us to estimate the size and location of the caustics and therefore find an estimation of the mass ratio and the projected separation of the planetary system (e.g., Poleski et al. 2014).

3. Scope of the Analysis

As mentioned in Section 1, classifying astronomical time series using ML algorithms relies on the data sets used for training and testing. Features computed for a data set cannot always be used to analyze another data set; therefore, the connection between the data set and the defined features should be discussed. In this paper, we focus on computing features for simulated Roman light curves. These light curves are simulated based on the Roman Cycle 7 mission design and have six 72-day seasons with 15-minute cadence. These light curves are similar to the light curves generated by Penny et al. (2013, 2019), and the full details of the simulated data can be found in these papers. This data set was designed in particular to be used for the Roman microlensing data challenge and has four classes of variability types, consisting of different types of microlensing light curves, along with a particular type of stellar variability that can be misinterpreted as microlensing. These classes include CVs, single-lens systems, stellar binary lenses, and planetary system lenses. An example of each class of light curves used in this analysis can be seen in Figure 3.

Figure 3. Four classes of light curves in the data set used for this analysis. (a) Example of the light curve of a CV. (b) Single-lens model that contains a single symmetric peak. (c) Lens systems containing a stellar binary. (d) Example of an event with a planetary system as the lens. The blue data points are from the original simulated light curves, and the black data points are smoothed light curves using the low-pass filter.


The data set contains 4181 light curves, including 429 CVs, 1626 single-lens systems, 1386 stellar binary lenses, and 688 planetary system lenses. The CV class contains CV variabilities with zero, one, or multiple instances of outbursts in the baseline. The single-lens systems include both stars and planets as lenses; therefore, their shape varies in terms of the duration, amplitude, and morphology of the peak. As mentioned before, the stellar binaries and the planetary systems can result in very similar light curves; the transition between these two classes is smooth in terms of the observables, and distinguishing between them is one of our main concerns.

The light curves selected for this project have a wide range of signal-to-noise ratio. To enter our sample, events had to pass a Δχ2 > 500 cut of the true model relative to a flat line, and binary and planetary events had to pass a Δχ2 > 160 cut of the true model relative to a single-lens light-curve model (see Penny et al. 2019 for full details). While these Δχ2 values suggest quite high formal detection significance, the cuts can be passed by light curves with signal-to-noise ratio per data point less than unity; e.g., a tE = 25-day single-lens event will have 4800 data points within one tE of the event peak. Similarly, a Jupiter-mass planet with a planetary deviation lasting ∼1 day (i.e., ∼96 data points) can pass the detection cut with an average contribution of Δχ2 per data point of less than 2. That said, our selection cuts do not allow us to test our method's sensitivity to the lowest signal-to-noise ratio events that could potentially be formally detected.

4. Feature Detection and Parameterization

In Section 2, we described major types of microlensing variability with different morphologies and amplitudes. In this section, we introduce a package of statistical tests and model fits we have assembled into an algorithmic approach to detect the features of a set of light curves. These features individually can then be used to distinguish microlensing events from other variability and to classify the different types of microlensing systems. Ultimately, in a subsequent paper, we will use the combined power of all of the features as input to an ML classifier. The package includes the following tools:

1. Peak finder
2. Gould two-parameter point-source point-lens fit (G-PSPL)
3. Symmetry check
4. Trapezoidal function fit
5. Cauchy distribution fit
6. Planetary parameter finder
7. Chebyshev polynomial fit.

In the sections below we introduce each of these tools and explain how they are applied to the simulated data set and evaluated. Not all tools are used independently, and some are used in combination with other tools.

Before applying some of these tools, we apply a low-pass filter to the light curve, which passes low-frequency signals and thereby reduces the high-frequency noise in the data. This low-pass filter has a smoothing window of 10 subsequent data points. The width of the smoothing window was optimized for the simulated Roman data set analyzed in this work but should be separately optimized for each data set so as to remove nonastrophysical noise as much as possible.
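As an illustration, a minimal moving-average implementation of such a filter might look like the following sketch (the 10-point window matches the value quoted above; this is an illustrative stand-in, not necessarily the exact filter used in our pipeline):

```python
import numpy as np

def low_pass_filter(flux, window=10):
    """Smooth a light curve with a simple moving average."""
    kernel = np.ones(window) / window
    # mode="same" keeps the smoothed curve aligned with the time stamps
    return np.convolve(flux, kernel, mode="same")
```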

4.1. Peak Finder

The primary idea behind this tool is to identify individual peaks in a light curve, which are often characteristic of microlensing events. After employing the low-pass filter, we bin the light curve in time, and then, in each bin, we count the number of data points with positive deviations larger than some number of standard deviations in that bin, which we define as the peak threshold. If a bin has more than one data point beyond that threshold, that bin is defined as a peak.
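The following Python sketch illustrates this binning-and-thresholding logic (a simplified rendering of the procedure described above, with names of our own choosing):

```python
import numpy as np

def find_peaks(time, flux, bin_size=60.0, threshold=5.0):
    """Return the time bins flagged as peaks: a bin is a peak when more
    than one point lies more than `threshold` standard deviations above
    the bin mean (positive deviations only)."""
    peak_bins = []
    edges = np.arange(time.min(), time.max() + bin_size, bin_size)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = flux[(time >= lo) & (time < hi)]
        if in_bin.size < 2 or in_bin.std() == 0:
            continue
        n_high = np.sum((in_bin - in_bin.mean()) / in_bin.std() > threshold)
        if n_high > 1:
            peak_bins.append((lo, hi))
    return peak_bins
```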

When using this algorithm, two parameters are important in determining the success rate: the bin size and the peak threshold. Depending on the particular implementation, we can use this method to search for a single large peak, such as a single-lens microlensing event. Alternatively, after identifying a single-lens event, we can subtract a model and then search for additional peaks in the light-curve residuals that would indicate a binary-lens event. We can also search for multiple large peaks at once.

To examine the performance of this method in a particular case, we consider the case of searching for a single peak within a light curve. We test different values of the bin size and threshold parameters to determine what combination is most successful at detecting peaks. We define a successful detection as one where exactly one bin identifies a peak and the light curve is known to be due to a microlensing event. We define a failed detection as one where a light curve labeled as containing a microlensing event has either no peaks or multiple peaks identified. With this range of bin sizes, this algorithm will not detect narrow deviations in microlensing light curves and will only look for the main event. Note that the algorithm is looking for single-peaked variations, so it might also find single-peaked transients; we expect that later stages of analysis will distinguish between these cases. This method has a low false-negative rate, although it can fail to detect a microlensing light curve at low thresholds; this happens when the event has a low magnification and is difficult to distinguish from the noise in the light curve.

Figures 4-6 show the performance of the algorithm across a range of bin sizes and thresholds. They display the true positive rate (TPR), false positive rate (FPR), and false discovery rate (FDR) for different options of bin size and peak threshold. Figure 4 shows that bin sizes larger than about 50 days and thresholds of 3σ-5σ give the highest TPR of about 80%.

Figure 4. Results of the peak finder algorithm applied to all of the light curves in the data set for different values of bin size and threshold. The TPR, the fraction of microlensing events that are labeled as microlensing, is plotted vs. bin size, with various thresholds denoted by symbol and color. The TPR is roughly independent of the bin size, and thresholds of 3σ-5σ yield the highest TPR.


Figure 5. Same as Figure 4, but showing the FPR, the fraction of non-microlensing events that are labeled as microlensing. The value of FPR is only weakly dependent on the bin sizes for most thresholds.


Figure 6. Same as Figure 4, but showing the FDR, the fraction of events labeled as microlensing that are in fact non-microlensing. Overall, values of FDR are very low and also roughly independent of the bin sizes for most thresholds.


Figure 5 shows the FPR, which represents the fraction of non-microlensing events that are labeled as microlensing. Bin sizes do not consistently affect the FPR. Thresholds larger than 5σ give unreasonably low FPR because at these thresholds most light curves have zero peaks identified. Thresholds of 3σ-5σ give an FPR of 50%. Since some fraction of non-microlensing events have a single peak, we expect that a group of non-microlensing events will be classified as microlensing precisely because the algorithm has successfully identified one peak in them.

Figure 6 shows the FDR, which represents the fraction of events labeled as microlensing that are in fact non-microlensing. For the same reason as mentioned above, thresholds larger than 5σ are not favorable, and thresholds of 3σ-5σ have a reasonably low FDR of no larger than 7% for most bin sizes. Note that the extremely low values of the FDR are also an indication of the unbalanced data set (a data set in which there is an unequal number of objects in each category), and our purpose here is to compare how sensitive the FDR is to different thresholds and bin sizes.

Using this peak finder algorithm with a bin size of 60 days and a threshold of 5σ, we can identify which light curves show a single peak event. In Section 5, we use this feature in conjunction with the other feature detection tools.

4.2. Gould Two-parameter PSPL Fit

The PSPL function has been demonstrated to be a good fit to most microlensing light curves and to not match most other types of astrophysical variability (Di Stefano & Perna 1997). Therefore, the goodness of fit to the PSPL model can be used as a feature of the light curve. In this section, we demonstrate the use of a two-parameter PSPL model first introduced by Gould (1996) to detect curves similar to a Paczyński curve. The reason for not using the regular PSPL model is that brute-force searches over the three nonlinear parameters (tE, t0, u0) of a PSPL model are slow, while the two-parameter Gould approximate PSPL model (G-PSPL) captures the basic observables of a full PSPL model but requires many fewer computations. For more details refer to Gould (1996) and Woźniak & Paczyński (1997).

To employ the PSPL model as a light-curve feature, we use a version of the Gould (1996) two-parameter PSPL model (G-PSPL) that has been used by the KMTNet survey to detect microlensing events. For that survey, Kim et al. (2018) fit a combination of two two-parameter G-PSPL models: one with the assumption of u0 = 1, and one with the assumption of u0 being very close to zero. This approach reduces the number of fitted parameters and can be used to detect both high- and low-magnification events. This double two-parameter G-PSPL fit is defined here as a function F(Q):

Equation (4)

$$F(Q) = f_{1}\,A_{j}(Q) + f_{0}, \qquad j = 1, 2,$$

where

Equation (5)

$$A_{1}(Q) = Q^{-1/2}$$

and

Equation (6)

$$A_{2}(Q) = \frac{Q + 2}{\sqrt{Q(Q + 4)}}$$

are functions that represent good fits to high- and low-magnification events, respectively. The two parameters f0 and f1 do not have a physical meaning in this equation, but in the limit of u0 = 1 and u0 close to zero, they are related to the blending and source parameters, described in detail in Kim et al. (2018). A1 and A2 are both functions of Q,

Equation (7)

$$Q(t) = 1 + \left(\frac{t - t_{0}}{t_{\rm eff}}\right)^{2}.$$

Q is a function of time and depends on two parameters, t0 and teff. t0 is the time of the maximum magnification, and teff approximates u0 × tE in the limit of u0 ≪ 0.5 and helps characterize the amplitude and duration of the microlensing event. For fitting the function F(Q), we set the initial values of the parameters f0 and f1 to 0.5, and t0 is the time of the maximum magnification in the light curve. For teff, we choose a list of seven initial values of 0.01, 0.1, 0.5, 2, 5, 10, and 20. For high-magnification events, low values of teff usually provide a better fit, and for low-magnification events, larger values of teff are a better fit. The fit with the minimum χ2 value is used as the selected model. Note that for the remainder of the paper we specify whether we use a G-PSPL or a PSPL model fit.
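A minimal sketch of this fitting procedure, assuming the forms of Equations (4)-(7) as reconstructed above, is given below; the grid of teff starting values follows the list in the text, while the function names are ours:

```python
import numpy as np
from scipy.optimize import curve_fit

def Q(t, t0, teff):
    return 1.0 + ((t - t0) / teff)**2                 # Equation (7)

def F_high(t, f1, f0, t0, teff):                      # u0 ~ 0 limit
    return f1 * Q(t, t0, teff)**-0.5 + f0

def F_low(t, f1, f0, t0, teff):                       # u0 = 1 limit
    q = Q(t, t0, teff)
    return f1 * (q + 2) / np.sqrt(q * (q + 4)) + f0

def fit_gpspl(t, flux, err):
    """Fit both G-PSPL flavors over the grid of initial teff values and
    return the lowest-chi-square solution."""
    t0_init = t[np.argmax(flux)]
    best_chi2, best_params = np.inf, None
    for model in (F_high, F_low):
        for teff0 in (0.01, 0.1, 0.5, 2, 5, 10, 20):
            try:
                p, _ = curve_fit(model, t, flux, sigma=err,
                                 p0=[0.5, 0.5, t0_init, teff0])
            except RuntimeError:      # fit did not converge
                continue
            chi2 = np.sum(((flux - model(t, *p)) / err)**2)
            if chi2 < best_chi2:
                best_chi2, best_params = chi2, p
    return best_chi2, best_params
```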

4.3. Symmetry Check Algorithm

If we identify a peak in a light curve and fit a two-parameter G-PSPL model to the data, we then employ a check on the symmetry of the event. In Section 4.2, we showed that this method can be used to detect microlensing events. Here we calculate the reduced χ2 statistic separately for the right and left sides of the peak in the light curve. If the peak is symmetric, the ratio of ${({\chi }_{\mathrm{left}}^{2})}_{\mathrm{reduced}}/{({\chi }_{\mathrm{right}}^{2})}_{\mathrm{reduced}}$, which we define as β, is very close to unity. Since the number of data points in the right and left wings of the peak might not be exactly equal, we use the reduced χ2. Deviations of β from unity can be related to additional physical details in the light curves, such as the existence of planetary or binary star companion lensing signatures, parallax effects, or the asymmetry in non-microlensing peaks like CVs.
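In code, β reduces to a few lines; the sketch below (our own illustrative naming) assumes the best-fit G-PSPL model has already been evaluated at the observation times:

```python
import numpy as np

def beta(time, flux, err, model_flux, t_peak):
    """Ratio of the reduced chi-square of the left and right wings of
    the peak; values near unity indicate a symmetric event."""
    left = time < t_peak
    right = ~left
    chi2_left = np.sum(((flux[left] - model_flux[left]) / err[left])**2)
    chi2_right = np.sum(((flux[right] - model_flux[right]) / err[right])**2)
    # dividing by the point count per wing accounts for unequal wings
    return (chi2_left / left.sum()) / (chi2_right / right.sum())
```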

In Figure 7 we plot histograms showing the distributions of variability types for a set of ranges of β values. Each column represents events with a specific range of β. We find that for CV light curves the asymmetry is extremely large, consistent with the morphology of CV light curves, displaying steep rises and more gradual decreases. Single-lens microlensing events are symmetric, with β values close to unity. Binary lenses have a wide range of β values depending on whether they are caused by a planetary system or a stellar binary system. These ratios can be used as a feature to obtain information about the kinds of deviations from the PSPL model and classify the light curves.

Figure 7.  β is a measurement of asymmetry of the peaks in the light curves that is obtained from the symmetry check algorithm. This plot shows the distribution of β for different classes of light curves in our data set.


4.4. Trapezoidal Function Fit

As explained in Section 2.3, lensing events caused by FFPs have short durations and low amplitudes and are more often strongly affected by the finite-source effect. Therefore, their shapes approximately resemble a trapezoidal function.

The parameters of the trapezoidal function are the baseline magnitude (a), maximum magnitude (b), time of the first rise (τ1), the time when maximum magnitude is reached (τ2), the time when maximum magnitude ends (τ3), and the time when the magnitude returns to the baseline (τ4). Figure 8 shows a diagram of the function and its six parameters. We must first find initial guesses for the six parameters and then fit the function and obtain more accurate estimates of the parameters. We also calculate the duration of the trapezoidal portion, Δτfull, and the duration of the flat section of the trapezoid, Δτtop, as presented in Equations (8) and (9) and shown in Figure 8:

Equation (8)

$$\Delta\tau_{\rm full} = \tau_{4} - \tau_{1}$$

Equation (9)

$$\Delta\tau_{\rm top} = \tau_{3} - \tau_{2}$$

Figure 8. Trapezoidal function with its defined parameters.


In order to find initial guesses for the six parameters, we first subtract the median baseline magnitude from the light curve, setting a to zero. We then select 20, 2, 0.2, or 0.02 days as initial guesses for the total duration (Δτfull). We define t0 as the time of maximum brightness and assume Δτfull = 2Δτtop. We then define the remaining parameters τ1, τ2, τ3, and τ4 as the following quantities, respectively: ${t}_{0}-\tfrac{{\rm{\Delta }}{\tau }_{\mathrm{full}}}{2}$, ${t}_{0}-\tfrac{{\rm{\Delta }}{\tau }_{\mathrm{full}}}{4}$, ${t}_{0}+\tfrac{{\rm{\Delta }}{\tau }_{\mathrm{full}}}{4}$, and ${t}_{0}+\tfrac{{\rm{\Delta }}{\tau }_{\mathrm{full}}}{2}$. We fit the resulting models based on the initial guesses for Δτfull, using a least-squares minimization approach. We finally select the fit that minimizes the χ2 statistic.
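The sketch below illustrates one way to implement this fit (assuming a baseline-subtracted light curve in which brightening is positive; the parameterization follows Figure 8, but the code itself is ours):

```python
import numpy as np
from scipy.optimize import curve_fit

def trapezoid(t, a, b, tau1, tau2, tau3, tau4):
    """Piecewise-linear trapezoid with baseline a and plateau b."""
    rise = a + (b - a) * (t - tau1) / (tau2 - tau1)
    fall = b + (a - b) * (t - tau3) / (tau4 - tau3)
    return np.select([t < tau1, t < tau2, t < tau3, t < tau4],
                     [a, rise, b, fall], default=a)

def fit_trapezoid(t, flux, err):
    """Try the four initial total durations from the text and keep the
    fit with the smallest chi-square."""
    t0 = t[np.argmax(flux)]
    best_chi2, best_params = np.inf, None
    for dt_full in (20.0, 2.0, 0.2, 0.02):       # initial guesses (days)
        p0 = [0.0, flux.max(),
              t0 - dt_full / 2, t0 - dt_full / 4,
              t0 + dt_full / 4, t0 + dt_full / 2]
        try:
            p, _ = curve_fit(trapezoid, t, flux, sigma=err, p0=p0)
        except RuntimeError:
            continue
        chi2 = np.sum(((flux - trapezoid(t, *p)) / err)**2)
        if chi2 < best_chi2:
            best_chi2, best_params = chi2, p
    return best_chi2, best_params
```

The quantities Δτfull = τ4 − τ1 and Δτtop = τ3 − τ2 of Equations (8) and (9) then follow directly from the best-fit parameters.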

We have applied this algorithm to our test data set, with Figure 9 showing an example of a best-fit function for a simulated light curve. We find that half of the total duration (Δτfull) is a good representation of the Einstein timescale (tE) of the event, and we compare it with true values of tE of the events. Figure 10 shows the estimated versus true duration of single-lens microlensing events with tE < 2 days that are candidates for FFPs. We find that for events with tE < 2 days there is a median absolute deviation of 0.06 days, while for events with tE ≥ 2 days there is a median absolute deviation of 0.8 days.

Figure 9. Trapezoidal function fitted to the light curve of a simulated event caused by an FFP. The flat top of the trapezoid describes the duration of maximum brightness fairly well, and the duration of the event is well represented by the full trapezoidal duration.


Figure 10. True vs. estimated Einstein timescale (tE) of the events with tE < 1 day found by the trapezoidal function fit. These events are candidates for FFPs.


We define another parameter $\kappa =\tfrac{{\rm{\Delta }}{\tau }_{\mathrm{top}}}{{\rm{\Delta }}{\tau }_{\mathrm{full}}}$ as the trapezoidal timescale ratio, which is the ratio of the duration of the flat part of the trapezoid to the total duration of the trapezoid. This quantity was initially set to 0.5 for the initial guesses of the function fitting. This parameter can be thought of as a measure of how square or peaked the event is, analogous to the kurtosis of the event. We show that using κ along with Δτfull allows us to flag the events that are strongly affected by the finite-source effect. We can characterize the approximate significance of the finite-source effect using ∣u0∣/ρ, the ratio of the impact parameter to the finite-source ratio. When ∣u0∣/ρ is smaller than unity, it implies that the lens passes directly over the source, and thus that can be a sign of significant finite-source effects.

The left panel of Figure 11 shows the trapezoidal timescale ratio (κ) versus Δτfull for the subset of all light curves that meet the fitting criteria for single-lens models, which are described below in Section 5. Note that both of the quantities on the axes in the left panel are found experimentally using the trapezoidal function fit. In that panel, most of the green data points have larger values of κ, which implies a more square light-curve shape caused by strong finite-source effects. The right panel shows the distributions of the true values of ρ for all the data points inside and outside of the black box in the left panel. That panel indicates that light curves with small values of Δτfull and large values of κ tend to have larger values of ρ (gray), and light curves with larger values of Δτfull and smaller values of κ tend to have smaller values of ρ (hatched white). This is because events with short duration caused by FFPs are more affected by the finite-source effect and therefore more resemble a trapezoidal function, with Δτfull and κ helping to indicate these events.

Figure 11. Left panel: values of the trapezoidal timescale ratio (κ) are plotted vs. the total trapezoidal duration (Δτfull). The purple data points show events with ∣u0∣/ρ > 1, and the green data points represent events with ∣u0∣/ρ < 1, which indicates events with FFPs or other PSPL events affected by the finite-source effect. κ is larger for shorter events, and most of the events with ∣u0∣/ρ < 1 have large κ and small Δτfull. Therefore, the selected ranges of κ and Δτfull can be used to identify likely events caused by FFPs. Right panel: distribution of ∣u0∣/ρ for all data points inside (gray) and outside (hatched white) of the black box in the left panel. Light curves with small values of Δτfull and large values of κ tend to have smaller values of ∣u0∣/ρ, and light curves with larger values of Δτfull and smaller values of κ tend to have larger values of ∣u0∣/ρ.


4.5. Cauchy Distribution Fit

For microlensing events with a strong finite-source effect in the absence of limb-darkening effects, the top of the peak becomes flatter and will not look like a PSPL function. For detecting this phenomenon, we need to parameterize the flatness of the top of the peak. Effectively, we would like to use a functional fit that can apply to the range of morphologies between that of a single-lens PSPL model and a trapezoid that much more closely fits a strong finite-source effect. Our goal here is to find a correlation between the fitted parameters and the finite-source ratios. For that purpose, we use a Cauchy distribution function to fit the light curves. The difference in the minimum χ2 of the Cauchy fit and the PSPL fit (shown in Equations (3) and (1)), along with one of the parameters of the Cauchy distribution, can be used as features.

The Cauchy distribution shown in Equation (10) has four parameters, time of the peak (t0), the duration of the peak (σ), the flatness of the peak (b), and the amplitude of the peak (a),

Equation (10)

$$C(t) = \frac{a}{1 + \left|\frac{t - t_{0}}{\sigma}\right|^{2b}}$$

The functions C(t) and PSPL have two common parameters, time and duration of the peak. After fitting these two functions to a light curve, we have fitted values for tE and t0, which we refer to as tE,PSPL, tE,Cauchy, t0,PSPL, and t0,Cauchy. Note that tE,Cauchy is equivalent to σ in Equation (10). We define the difference in the χ2 of the PSPL fit and the Cauchy fit as a feature characterizing the flatness of the top of the curve represented in Equation (11). In order to calculate ψ, we select a section of the curve at the peak, between t0 − (tE,PSPL × u0,PSPL) and t0 + (tE,PSPL × u0,PSPL),

Equation (11)

$$\psi = \chi^{2}_{\rm PSPL} - \chi^{2}_{\rm Cauchy}$$

The more the event is affected by the finite-source effect, the more the top of its peak deviates from the PSPL model and resembles the Cauchy model. Thus, for single-lens events with a flatter peak, ψ is positive. For events that are not caused by FFPs but still experience strong finite-source effects, the Cauchy distribution will typically be a better fit than the trapezoidal function, even in cases where it also fits much better than the PSPL model. Figure 12 shows a single-lens event highly affected by the finite-source effect, along with the best-fit PSPL and Cauchy models. The two curves deviate most near the peak of the event.
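Assuming the generalized-Cauchy form reconstructed in Equation (10), the Cauchy model and the ψ feature can be sketched as follows (the restriction to the peak region, |t − t0| < tE,PSPL × u0,PSPL, is left to the caller):

```python
import numpy as np

def cauchy(t, t0, sigma, b, a):
    """Generalized Cauchy profile (our reading of Equation (10)):
    b = 1 recovers an ordinary Cauchy peak; larger b flattens the top."""
    return a / (1.0 + np.abs((t - t0) / sigma)**(2.0 * b))

def psi(flux, err, pspl_model, cauchy_model):
    """Equation (11): chi-square of the PSPL fit minus that of the
    Cauchy fit over the peak region; positive psi means a flatter peak."""
    chi2_pspl = np.sum(((flux - pspl_model) / err)**2)
    chi2_cauchy = np.sum(((flux - cauchy_model) / err)**2)
    return chi2_pspl - chi2_cauchy
```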

Figure 12. Cauchy and PSPL functions fitted to a Roman simulated single-lens microlensing light curve highly affected by the finite-source effect. The blue data points are from the simulated light curve, the red curve is the fitted PSPL model, and the orange curve is the fitted Cauchy model.


Figure 13 shows values of the Cauchy feature (ψ) versus ρ/u0 for all single-lens events that have at least four observations within a time of tE,PSPL × u0,PSPL from the maximum. The top panel shows positive values of ψ, and the bottom panel shows the negative values. The events with flatter peaks have a positive value of ψ and, on average, have a shorter duration than those with negative ψ. Events with large positive ψ (flatter peaks) appear in the upper right corner of this figure and have ρ/u0 > 1. Events with a negative ψ include both short- and long-duration events. The black lines show the one-to-one relation for events with positive and negative ψ and have median absolute deviations of 0.45 for positive ψ and 0.64 for negative ψ. Events in the lower region mostly have small negative values of ψ, indicating that neither the PSPL function nor the Cauchy function is a significantly better fit to the data. Those with large negative values are those with sharp peaks that are well described by the PSPL function, which also have low u0. This plot shows that the sign and magnitude of ψ can help identify the flattest and sharpest microlensing peaks and flag the ones that are more likely affected by the finite-source effect.

Figure 13. Cauchy feature (ψ) plotted vs. ρ/u0, restricted to cases with at least four observations near the peak maximum. The colors of the points represent the PSPL-estimated event duration tE,PSPL, while the sizes of the points represent the PSPL-estimated value of u0 (u0,PSPL). The black trend lines show a one-to-one relation between ψ and ρ/u0 (reversed in the bottom panel).


The other feature that we use for this purpose is parameter b in Equation (10). We expect that if the event exhibits strong finite-source effects, which means a flatter top, this feature would take a larger value, and if the finite-source effect is negligible, the feature would be very close to unity (in the absence of limb-darkening effects). In reality, b does not correlate with ρ for most events, but once ρ is greater than 0.1, we do see a positive correlation with b, as seen in Figure 14. We therefore retain b as a potentially useful feature.

Figure 14. The parameter b in Equation (10) vs. the true value of ρ. The b values are in general larger for events with ρ larger than 0.1.


4.6. Planetary Parameter Finder

Our goal in this section is to measure the two physical parameters associated with the planetary binary-lens events. These are s, the projected separation between the planet and the lens star, and q, the mass ratio of the planet and the lens star. The approach that we take here is to first fit the main event with a PSPL function and then find the deviations from the PSPL model by looking at the residual, to identify peaks or troughs. We then seek to identify cases where these residuals have one significant peak or trough, or if they have two significant peaks.

The reason we take this approach is that the single-deviation residuals can be characterized much more robustly than the double-peaked ones. For events of both groups, we fit the "busy" function introduced by Westmeier et al. (2014). This function is primarily used to describe double-peaked features in spectra, and with some changes in its parameters, we can also use it to find the single-peaked deviations. Khakpash et al. (2019) suggest fitting the single-peaked residuals with a Gaussian and show that this approach is useful mostly for low-mass ratio events. Here we take the same approach to find the initial parameters in the residual, but instead of a Gaussian, we fit all deviations with the busy function. After fitting the residual, in the next round of fittings we fit a PSPL plus a busy function, and we use parameters obtained from the first two fits as initial parameters of this fit. Next, we calculate s and q using the parameters found by the final fit.

The busy function is commonly used to describe the double-horn profile of galaxy spectra (Westmeier et al. 2014). The function is shown in Equation (12) and has nine parameters that are sketched in Figure 15. This function comprises two error functions and a polynomial of degree n. The parameters xe and xp determine the location of the middle of the two error functions and the middle of the polynomial, δ1 and δ2 are the steepness of the two error functions, n is the degree of the polynomial and determines the steepness of the middle trough, c determines the depth of the polynomial, w determines the distance of the error function zero points from xe , a determines the height of the error functions, and ε is the horizontal scaling of the function

Equation (12)

$$B(t) = \frac{a}{4}\left[\mathrm{erf}\left(\delta_{1}\,(w + t' - x_{e})\right) + 1\right]\left[\mathrm{erf}\left(\delta_{2}\,(w - t' + x_{e})\right) + 1\right]\left(c\,|t' - x_{p}|^{n} + 1\right), \qquad t' = t/\varepsilon.$$
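A direct transcription of this form into code (under our assumption about where the scaling ε enters) might look like the following:

```python
import numpy as np
from scipy.special import erf

def busy(t, a, delta1, delta2, w, xe, xp, c, n, eps):
    """Busy function after Westmeier et al. (2014), with the parameter
    names used in the text. Setting c = 0 removes the polynomial trough
    and leaves a single asymmetric peak (right panel of Figure 16)."""
    ts = t / eps                     # horizontal scaling (assumed form)
    return (a / 4.0) \
        * (erf(delta1 * (w + ts - xe)) + 1.0) \
        * (erf(delta2 * (w - ts + xe)) + 1.0) \
        * (c * np.abs(ts - xp)**n + 1.0)
```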

Figure 15. Schematic plot of the busy function, with features associated with eight function parameters. The parameter ε (not shown in the figure) represents a horizontal scaling of the function without disrupting its shape.


Depending on the values of the nine parameters, the shape of the function can differ significantly. We take advantage of this fact and use different shapes of the function to fit single-peaked and double-peaked deviations from the PSPL model. The two forms of the busy function used to fit the residuals are shown in Figure 16. The left panel shows a double-horn shape of the busy function, and the values of the parameters can be found in the plot. The three curves represent the shapes of the function when only the steepness of the two error functions is altered. This set of parameters results in a form that can describe the caustic-crossing features of planetary microlensing light curves. In the right panel, the value of c in Equation (12) is set to zero, and therefore the equation only comprises two error functions. The shape of the function resembles the Gaussian function used by Khakpash et al. (2019) to fit single-peaked deviations, with the additional ability to describe asymmetric deviations. The three curves show how the shape of the function changes as the steepness of the two error functions is altered.

Figure 16. Two different shapes of the busy function that are fitted to the single- and double-peaked residuals. Left panel: double-horn shape of the busy function, and the values of the parameters. The three curves represent the shapes of the function when only the steepness of the two error functions is altered. Right panel: value of c in Equation (12) is set to zero, and the equation only comprises the two error functions and no polynomial. Only one peak is present, with the two error functions controlling the steepness of the two sides of the peak. The three curves show how the shape of the function changes as the steepness of the two error functions is altered.


We apply this approach to the set of simulated Roman planetary microlensing light curves. We follow the procedure in Khakpash et al. (2019) and first fit a PSPL function as shown in Equations (1) and (3) to the light curves. We find an initial value for t0 by finding the time of the maximum flux in the light curve, and we set the initial value of fs to be 0.5. For an initial guess for the duration of the event tE, we first interpolate between the data points using a cubic interpolation to estimate a continuous version of the data, and then, for events with u0,initial < 0.5, we assume that tE is the interval between times when the magnification is 1.34. For events with u0,initial > 0.5, we set tE equal to the interval between times when the magnification is 1.06. We use Equations (1) and (3) to calculate an initial guess for u0 based on the other values assumed above, evaluated at time t = t0. Then, we fit the PSPL function using the initial guesses and apply the peak finder algorithm to the residual.
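A sketch of the initial tE guess is shown below (the helper name is hypothetical; the cubic interpolation and the 1.34/1.06 magnification thresholds follow the text):

```python
import numpy as np
from scipy.interpolate import interp1d

def initial_tE(t, magnification, u0_init):
    """Estimate tE as the interval during which the interpolated
    magnification exceeds 1.34 (for u0 < 0.5) or 1.06 (for u0 > 0.5)."""
    level = 1.34 if u0_init < 0.5 else 1.06
    smooth = interp1d(t, magnification, kind="cubic")
    tt = np.linspace(t.min(), t.max(), 10 * t.size)
    above = tt[smooth(tt) >= level]
    return above.max() - above.min() if above.size else np.nan
```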

Next, we use the peak finder (Section 4.1) to look for deviations in the residual, and we force it to find one or two peaks. We start with a large bin size of about half the light-curve baseline and a threshold of 3σ. With these values, the algorithm finds the most significant deviation. Then, we decrease the bin size by 50% until either it finds two peaks or the bin size reaches 0.2 days. At any of these steps, if it finds two peaks that are separated by less than 10 days, we end the search. This condition aims to prevent the algorithm from selecting multiple peaks in a caustic-crossing microlensing event. If the search concludes with the detection of one deviation, we accept that and move on to the next step. If it finds zero deviations, we simply select the maximum or minimum data point in the residual and identify that as a single deviation.
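The bin-halving search can be sketched as follows, reusing the find_peaks() helper from Section 4.1 (again an illustrative rendering of the procedure described above, not the exact implementation):

```python
def search_residual_peaks(time, residual, baseline_length):
    """Look for one or two deviations in the PSPL-fit residual by
    repeatedly halving the bin size, stopping at 0.2 days or when two
    nearby peaks (separated by < 10 days) are found."""
    bin_size = baseline_length / 2.0      # start with ~half the baseline
    while True:
        peaks = find_peaks(time, residual, bin_size=bin_size, threshold=3.0)
        if len(peaks) >= 2:
            centers = [0.5 * (lo + hi) for lo, hi in peaks[:2]]
            if abs(centers[1] - centers[0]) < 10.0:   # days
                return peaks[:2]          # likely a caustic-crossing pair
        if bin_size <= 0.2:               # days; stop refining
            return peaks                  # zero or one deviation found
        bin_size *= 0.5                   # decrease the bin size by 50%
```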

At this point, we fit the modified busy function to peaks identified in the residual. For this purpose, we identify the time of the single deviation or the middle point of the double-peaked deviation to be initial guesses for xe and xp . We set these two values equal to zero, the values of δ1 and δ2 are set equal to 1, and the value of ε is set to 5. Parameters a, w, n, and c are set up differently depending on the number of deviations found in the residual. If there is one deviation, n and c are set equal to zero, a is equal to the largest peak or trough in the residual, and w is set equal to unity. If there are two deviations, initial values of n and c are set to 10 and 0.02, the value of a is set equal to the median of the residual values between the two peaks, and w is set equal to half of the distance between the two peaks.

Parameters describing the deviations in the residual are obtained by the busy function fit. Using these parameters along with the PSPL parameters as initial values, we then fit a PSPL plus a busy function to the light curve, and we find final values of 13 parameters, four of them describing the main event and nine of them describing the deviation. Using these values, we calculate the two physical parameters s and q using two different approaches.

When only one deviation is found, the busy function will turn into two error functions with six parameters. After fitting the function to the deviation, we determine the duration of the deviation by looking at where the fitted model is not zero, and we assume half of that to be the duration tEp. tp is set to xe , and then we find the two values of s by solving Equation (13). If the deviation is positive, a secondary check is done to find out whether there is a large trough close to this peak. If there is a large demagnification close to a single caustic crossing, the event is very likely to have s < 1, which could also be due to a resonant caustic. In the secondary check, the ratio of the sum of negative data points in the residual of the busy function fit to the sum of positive data points is calculated, and if the ratio is larger than unity, the value of s < 1 is chosen; otherwise, s > 1 is selected. If the deviation is negative, the value of s < 1 is chosen. The estimated value of q is then obtained by Equation (14):

Equation (13)

$$s - \frac{1}{s} = u(t_{p}) = \sqrt{u_{0}^{2} + \left(\frac{t_{p} - t_{0}}{t_{\rm E}}\right)^{2}}$$

Equation (14)

$$q = \left(\frac{t_{{\rm E},p}}{t_{\rm E}}\right)^{2}$$

When there are two deviations, we assume that the deviations are caused by crossing the planetary caustic and that these epochs correspond to when the source approaches two cusps of the caustic. Using these epochs, we calculate the distance of the source from the lens at these times using ${u}_{\mathrm{1,2}}=\sqrt{{{u}_{0}}^{2}+{\left(\tfrac{{t}_{0}-{t}_{\mathrm{1,2}}}{{t}_{{\rm{E}}}}\right)}^{2}}$. Han (2006) calculates the size of the planetary caustics in terms of s and q, and we find the size of the caustic using simple geometry, and then we use their formulation to calculate s and q.

Figure 17 shows an example of the geometry involved in these lensing events. In this figure, planetary caustics of two systems with projected separations of 1.6 and 0.8 Einstein radii and a mass ratio of 0.03 are shown. The shapes and sizes of the caustics depend on s and q. An example path of the source through the system is shown in both panels. The distance between the lens and the center of the caustics, LX, gives us the value of $| s-\tfrac{1}{s}| $. At this point, we do another secondary check as described earlier, and we determine whether we have a system with s > 1 or s < 1. Next, for systems with s > 1, we find the vertical dimension of the caustic (left panel of Figure 17), and for systems with s < 1, we find the distance between the two caustics by geometry (right panel of Figure 17). We refer to both of these values as the height of the caustics. Han (2006) finds that these values are roughly equal to $\tfrac{2\sqrt{q}}{s\sqrt{{s}^{2}-1}}$ and $\tfrac{{s}^{3}\sqrt{q}}{4}\sqrt{{\left(\tfrac{2}{s}\right)}^{8}\,+\,27}$, respectively. We then use these equations to find q. Table 1 summarizes the process we adopt to extract the system parameters from the light-curve residuals in the cases of one and two deviations (Poleski et al. 2014; Bozza 2000).

Figure 17. Left panel: planetary caustic of a system with a projected separation of 1.6 Einstein radii and a mass ratio of 0.03 (red). The size of the caustic varies with s and q. Right panel: planetary caustics of a system with a projected separation of 0.8 Einstein radii and a mass ratio of 0.03 (red); at this range of s, there are two smaller planetary caustics. The distance between the lens and the center of the planetary caustic is equal to ∣s − 1/s∣. The solid black line with the arrow in each panel represents the path of the source through the caustics. The left panel is a reproduction of a similar figure in Poleski et al. (2014).


Table 1. Process Taken to Calculate s and q

One deviation:
 1. tp is determined.
 2. $u=\sqrt{{{u}_{0}}^{2}+{\left(\tfrac{{t}_{p}-{t}_{0}}{{t}_{{\rm{E}}}}\right)}^{2}}$.
 3. $\left|s-\tfrac{1}{s}\right|=u$ is solved for s.
 4. The secondary check determines whether s > 1 or s < 1.
 5. $q={\left(\tfrac{{t}_{{\rm{E}},p}}{{t}_{{\rm{E}}}}\right)}^{2}$.
Two deviations:
 1. t1 and t2 are determined.
 2. ${u}_{\mathrm{1,2}}=\sqrt{{{u}_{0}}^{2}+{\left(\tfrac{{t}_{\mathrm{1,2}}-{t}_{0}}{{t}_{{\rm{E}}}}\right)}^{2}}$.
 3. LX is found by geometry.
 4. $\left|s-\tfrac{1}{s}\right|={LX}$ is solved for s.
 5. The secondary check determines whether s > 1 or s < 1.
 6. For s > 1: height of the caustic $\sim \tfrac{2\sqrt{q}}{s\sqrt{{s}^{2}-1}}\Longrightarrow q$ is found.
 7. For s < 1: height of the caustic $\sim \tfrac{{s}^{3}\sqrt{q}}{4}\sqrt{{\left(\tfrac{2}{s}\right)}^{8}+27}\Longrightarrow q$ is found.

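The two-deviation branch of Table 1 can be sketched in the same spirit. In the sketch below, the caustic center is approximated as the midpoint of the two cusp approaches, and the cusp-to-cusp separation stands in for the caustic height; both are simplifying assumptions for illustration, as is every name in the snippet.

```python
import numpy as np

def planetary_s_q_two_peaks(t_1, t_2, t_0, t_E, u_0, minor_image=False):
    """Sketch: estimate s and q from two planetary-caustic deviations."""
    # Crossing points on the source trajectory, with the lens at the
    # origin and the trajectory horizontal at height u_0.
    p1 = np.array([(t_1 - t_0) / t_E, u_0])
    p2 = np.array([(t_2 - t_0) / t_E, u_0])

    # LX: distance from the lens to the (approximate) caustic center.
    LX = np.linalg.norm((p1 + p2) / 2.0)
    height = np.linalg.norm(p2 - p1)    # stand-in for the caustic height

    # |s - 1/s| = LX: wide root, or its reciprocal for the close case.
    s_wide = (LX + np.sqrt(LX**2 + 4.0)) / 2.0
    s = 1.0 / s_wide if minor_image else s_wide

    # Invert the Han (2006) height approximations for q (Table 1).
    if s > 1:
        q = (height * s * np.sqrt(s**2 - 1.0) / 2.0)**2
    else:
        q = (4.0 * height / (s**3 * np.sqrt((2.0 / s)**8 + 27.0)))**2
    return s, q
```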

Figures 18–20 show three examples of the PSPL model plus the busy function fitted to planetary light curves. Figure 18 shows the fit for a system with one deviation in its PSPL residual, where the true projected separation of the system is smaller than unity. The algorithm first detects sharp peaks or troughs, and after fitting the busy function to the light-curve residuals from the PSPL fit, we obtain the residual of the busy function fit. The presence of a single peak alone does not allow us to conclude that the event includes a major image perturbation. Instead, if the sum of the flux of the negative points in the busy function fit residual is larger than the sum of the flux of the positive points, the system is considered to be affected by a minor image perturbation, and s is chosen to be less than 1; otherwise, s is taken to be larger than 1. The example in Figure 18 shows an event with a sharp positive deviation that appears to be a major image perturbation; however, the preceding trough indicates that this is a minor image perturbation, and the algorithm correctly decides that s is smaller than 1. The true parameters for this event are s = 0.63 and q = 0.0003, and the fitted values are s = 0.64 and q = 0.0002.

Figure 18. Example of the PSPL model plus the busy function fitted to the light curve of an event with s = 0.63 and q = 0.0003. The algorithm detects a sharp peak and, without a secondary check, would simply conclude that the deviation is caused by the major image. The secondary check detects the negative deviations from the model, and the algorithm decides that s should be less than unity. The estimated parameters are s = 0.64 and q = 0.0002.

Figure 19. Example of the PSPL model plus the busy function fitted to the light curve of a double-peaked event with s = 0.62 and q = 0.0001. The busy function gives a good estimate of the location and duration of the deviation, and the secondary check detects the large negative deviation from the model, so the algorithm decides that s should be less than unity. The estimated parameters are s = 0.59 and q = 0.0003.

Figure 20. Example of the PSPL model plus the busy function fitted to the light curve of a double-peaked event with s = 1.11 and q = 0.0008. The busy function provides a good estimate of the location and duration of the deviation; the estimated system parameters are s = 1.02 and q = 0.007.


Figure 19 shows the PSPL model plus the busy function fitted to a double-peaked deviation with s < 1. Although the busy function may appear to fit the deviations poorly, this is not a failure: the residual it leaves is precisely what the secondary check uses to determine whether the perturbation is of the major or the minor image. After the secondary check, the algorithm correctly chooses s smaller than 1. The true parameters for this event are s = 0.62 and q = 0.0001, and the fitted values are s = 0.59 and q = 0.0003. Figure 20 shows another double-peaked event, this one with s > 1. The true parameters for this event are s = 1.11 and q = 0.0008, and the fitted values are s = 1.02 and q = 0.007.

In order to evaluate the performance of the algorithm, we compare the calculated values of s and q with their true values, as was done in Khakpash et al. (2019). Figure 21 shows the estimated values of s and q plotted versus their true values. Most values of s are well estimated by the method; the fitted values close to 1 arise from events caused by the central caustics. While the estimated values of q show considerable scatter, they are generally within about one order of magnitude of the true values. Note that this algorithm is slower than the one presented in Khakpash et al. (2019), but it fits a broader range of events.

Figure 21. Left panel: plot of fitted values of q vs. the true values. Fitted values of q are mostly within one order of magnitude of the true values for the PSPL plus busy function fits. Right panel: fitted s vs. the true values. Most values of s are well estimated by the method. The fitted values close to 1 occur for events caused by the central caustics for which our formalism breaks down. Blue circles are events without caustic crossings, and black circles are events with caustic crossings.


4.7. Chebyshev Polynomial Fit

Di Stefano & Perna (1997) point out that, unlike binary-lens microlensing light curves with significant perturbations, smooth binary-lens events can easily be misclassified, and it is therefore useful to develop methods that distinguish this type of microlensing event from other, similar kinds of variability. Such a method should be fast and effective at flagging light curves with smooth features. One way to approximate a function is to expand it as a series of polynomials, and for this purpose, Di Stefano & Perna (1997) suggest applying the Chebyshev approximation to microlensing light curves. The idea is that the properties of Chebyshev polynomials are well suited to capturing the smooth features of binary-lens microlensing light curves without caustic crossings.

We have implemented the method of Di Stefano & Perna (1997) and applied it to our simulated data set. We first detect the peaks in the light curve, and then for each peak we choose the interval around the peak that includes the wings of the event down to a magnification of 1.06, which corresponds to an impact parameter of 2RE (Di Stefano & Perna 1997).

We then find the coefficients of the Chebyshev polynomials using the formulation described by Press et al. (1992) and expand the selected parts of the light curves by the expression in Equation (15). The polynomial Tk has k + 1 extrema in the interval [−1, 1], where the Chebyshev polynomials are defined. To use this method, the time coordinates of the selected portion of our light curves must lie within this interval, so we rescale the time axis to [−1, 1] before fitting the Chebyshev polynomials,

$f(t)={\sum }_{k=0}^{m}{c}_{k}{T}_{k}(t)$ (15)

In this work, we choose the first 50 polynomials of this series (m = 50) to fit Equation (15) to the light curves, and then, using the coefficients ck, we calculate ${\rm{\Lambda }}=-{\mathrm{log}}_{10}\left[{\sum }_{k=0}^{m}{\left(\tfrac{{c}_{k}}{{c}_{0}}\right)}^{2}-1\right]$ as a feature parameter that can be used to distinguish between the different light-curve types in our simulated data set.
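A minimal sketch of this feature computation is given below; it uses numpy's least-squares Chebyshev fit as a stand-in for the Press et al. (1992) prescription, and the function name is illustrative.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def chebyshev_features(t, flux, m=50):
    """Sketch: fit Chebyshev polynomials of degree m to a light-curve
    segment and return the coefficients c_k and the feature Lambda."""
    t = np.asarray(t, dtype=float)
    # Rescale the time axis to [-1, 1], where the T_k are defined.
    x = 2.0 * (t - t.min()) / (t.max() - t.min()) - 1.0

    c = cheb.chebfit(x, flux, deg=m)    # coefficients c_0 ... c_m

    # Lambda = -log10( sum_k (c_k / c_0)^2 - 1 ); the k = 0 term
    # contributes exactly 1, which the subtraction removes.
    lam = -np.log10(np.sum((c / c[0])**2) - 1.0)
    return c, lam
```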

Figure 22 shows an example of a binary-lens event approximated by Chebyshev polynomials of degree 50. It is important to note that the whole light curve is not approximated in this method; only the segment containing the event, excluding gaps, is selected. This is because the Chebyshev approximation is good at approximating functions that have a finite number of extrema in the interval [−1, 1], and a nonvarying light curve is usually not well fitted by this approximation.

Figure 22. A stellar binary-lens light curve approximated with Chebyshev polynomials of degree 50.


With this method, we now need to define Chebyshev parameters that differ between the various types of light curves. Here we use Λ and the first six even coefficients as the parameters for distinguishing between light-curve types. We use only the lowest-order coefficients because the coefficients decrease rapidly toward higher-order terms, so the first coefficients dominate the expansion. Also, since microlensing events are closer to an even function than an odd one, the expansion is dominated by the even terms of the Chebyshev polynomials.

We then check how well different values of Λ map to the different light-curve types. Figure 23 shows that the value of Λ correlates fairly well with the four light-curve types. Note that the fractions are calculated for a given range of Λ across each light-curve type, i.e., horizontally in the figure. Although the fractions of the different variability types in the simulated Roman data set are not equal, there are large numbers of each type, so the distinctions seen in Figure 23 are quite robust. We therefore expect the continuous parameter Λ to be a useful input for an ML algorithm (such as a random forest) that classifies event types.

Figure 23. Breakdown by variability type for each range of values of the Chebyshev feature Λ, where ${\rm{\Lambda }}=-{\mathrm{log}}_{10}\left[{\sum }_{k=0}^{m}{\left(\tfrac{{c}_{k}}{{c}_{0}}\right)}^{2}-1\right]$ and all coefficients are normalized by the first coefficient c0.


In addition to the feature Λ, we also follow the suggestion of Di Stefano & Perna (1997) to include the individual even-numbered coefficients of the Chebyshev expansion as classification features.

5. An Algorithmic Approach

In this section, we suggest an algorithmic approach for using the features produced above as input to ML classifiers. The goal is to provide an efficient procedure for using these features to train classifiers on a high-cadence data set and to detect the different types of microlensing light curves. Table 2 summarizes the tools in our package and their associated features. Each of these tools can be applied to either the whole data set or a subset of it, and there may be more than one viable algorithmic approach to using these features.

Table 2. Summary of the Introduced Algorithms and Their Resulting Features

Algorithm | Produced Features
Peak finder | Number of peaks
G-PSPL fit | χ2: goodness of the G-PSPL fit
Symmetry check | β: a measure of the asymmetry of the peak
Trapezoidal function fit | κ: ratio of the duration of the flat top of the trapezoidal function to the total duration
 | tE,trap: duration of the trapezoidal function
Cauchy distribution fit | ψ: the difference between the goodness of the PSPL and Cauchy fits
 | b: a measure of the flatness of the top
Planetary parameter finder | s: projected separation in units of the Einstein radius
 | q: mass ratio
Chebyshev polynomial fit | Λ: log of the sum of the squared Chebyshev coefficients, normalized by c0
 | a2, a4, a6, a8, a10: first five even coefficients of the Chebyshev polynomials


In a paper under development (S. Khakpash et al. 2021, in preparation), we are implementing an algorithmic approach that employs the features discussed in this paper to comprehensively search for microlensing events, classify them by type, and derive preliminary system parameters. For any set of light-curve features, there are numerous ways to conduct classification, and our approach is by no means guaranteed to be optimal. We encourage other astronomers to make use of the features described here to develop independent classification methods.

The particular structure or sequence of a classification approach can vary: one might use all of the features to detect all of the categories at once or, alternatively, find particular classes through a step-by-step classification. In an ideal case, with thousands of examples of each class spanning the full range of empirical properties of each feature, it is likely better to use all of the features to find all of the classes in a one-step classification. Since we have very limited examples of some of the classes, we instead recommend a step-by-step approach.

The first step of the classification is to distinguish microlensing light curves from other types of variability (e.g., the CVs in our data set); we refer to this as classification step I. Although this task is not the focus of this work, we believe that some of the features identified here can improve existing classifiers built for it. We have tested this type of classification using features such as the number of peaks found by the peak finder; Λ, a2, a4, and a6 produced by the Chebyshev fit; and ${\chi }_{\mathrm{PSPL}}^{2}$, tE,PSPL, and β generated by the G-PSPL fit and the symmetry check.

Assuming that all of the microlensing light curves have been identified, the next task is to classify them into single-lens versus binary-lens events, which we refer to as step II. Note that a real data set will also contain multi-lens events, which can either be added as a separate category or grouped with the binary-lens events. We use the same features as in step I, excluding the number of peaks.

Once we have separated single-lens and binary-lens events, we classify the binary-lens events into stellar binary-lens and planetary binary-lens systems (classification step III) using ${\chi }_{\mathrm{PSPL}}^{2}$ and tE,PSPL produced by the PSPL fit, and s and q produced by the PSPL plus busy function fit.

Furthermore, the single-lens events can be classified into isolated short-timescale events, likely caused by FFPs or planets on very wide orbits, stellar PSPL events, and stellar FSPL events (classification step IV). We suggest using κ, Δτfull, ψ, and b produced by the trapezoidal fit and the Cauchy/PSPL fit for this step, which will be investigated in future work.

Figure 24 displays a diagram of our suggested algorithmic approach to using the different features produced by our package. We have selected this approach to optimize the classification results given our small data set. We have tested classification steps I, II, and III with four ML classifiers: a k-nearest neighbor classifier, a decision tree classifier, a random forest classifier, and a neural network classifier; preliminary results are presented in the next section, and a sketch of the cascade itself follows below.
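As a concrete sketch of this step-by-step scheme, the snippet below cascades three random forest classifiers through steps I–III; the function, array names, and label strings are illustrative, and the hyperparameters loosely follow the RF entries of Table 3 in the next section.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def cascade_classify(X1, y1, X2, y2, X3, y3, X1_new, X2_new, X3_new):
    """Sketch of the Figure 24 cascade. X1..X3 / y1..y3 hold training
    features and binary labels for steps I-III; X*_new hold the same
    feature sets (row-aligned) for the light curves to classify."""
    clf1 = RandomForestClassifier(n_estimators=100, criterion="entropy",
                                  bootstrap=False).fit(X1, y1)
    clf2 = RandomForestClassifier(n_estimators=100, max_depth=5).fit(X2, y2)
    clf3 = RandomForestClassifier(n_estimators=300, max_depth=8).fit(X3, y3)

    n = X1_new.shape[0]
    labels = np.full(n, "other variability", dtype=object)

    micro = clf1.predict(X1_new).astype(bool)                   # step I
    labels[micro] = "single-lens"

    binary = np.zeros(n, dtype=bool)
    binary[micro] = clf2.predict(X2_new[micro]).astype(bool)    # step II
    labels[binary] = "stellar binary"

    planet = np.zeros(n, dtype=bool)
    planet[binary] = clf3.predict(X3_new[binary]).astype(bool)  # step III
    labels[planet] = "planetary binary"
    return labels
```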

Figure 24. Diagram showing an algorithmic approach for using the features described in this paper as input to ML classifiers.


6. Preliminary Testing of ML Algorithms

As stated above, there are many ways to implement identification and characterization of microlensing events with classification based on light-curve features. We are developing a thorough investigation of that question (S. Khakpash et al. 2021, in preparation), but here we present a preliminary analysis using a few simple steps based on the classification approach described in Section 5. We present the results of training four ML classifiers, a k-nearest neighbor classifier (KNN), a decision tree classifier (DT), a random forest classifier (RF), and a neural network classifier (NN), using the features introduced in this paper. To test these algorithms, we follow the step-by-step scheme of Figure 24; note again that step IV in Figure 24 is not tested here and will be pursued in the future. At each step, we use 20% of our data set as the test set and the rest as the training set.

To compare the results of these classifiers, we show confusion matrices made with the test set at each step, along with the corresponding receiver operating characteristic (ROC) curves. As mentioned in Section 5, a one-step classification would be preferable in principle, and we tested this approach with our current data set; however, the limited number of instances in each class was insufficient to achieve an acceptable overall accuracy. Nevertheless, the confusion matrices of the individual step-by-step classifications are valuable for evaluating the utility of the features presented in this paper. We plan to investigate the one-step classification thoroughly in a future paper.

At each step, we use the same data set and features for all four classifiers, but we optimize their hyperparameters separately. The RF, KNN, and DT classifiers are implemented using the scikit-learn package in Python (Pedregosa et al. 2011), and the NN classifier is implemented using the TensorFlow and Keras packages (Abadi et al. 2015; Chollet 2015). A summary of the features and hyperparameters of each classifier at each step is given in Table 3.

Table 3. Classifier Hyperparameters and Features

Step I
 Features: Λ, a2, a4, a6, β, number of peaks (bin size = 60, threshold = 4 & 6), ${\chi }_{\mathrm{PSPL}}^{2}$, tE,PSPL
 KNN: metric = "manhattan", n_neighbors = 4
 DT: criterion = "gini", max_depth = 5, splitter = "best"
 RF: max_features = 7, min_samples_leaf = 1, n_estimators = 100, criterion = "entropy", bootstrap = False
 NN: 4 dense layers, 3 dropout layers, activation = "relu", batch = 32, epochs = 300

Step II
 Features: Λ, a2, a4, a6, β, ${\chi }_{\mathrm{PSPL}}^{2}$, tE,PSPL
 KNN: metric = "euclidean", n_neighbors = 20
 DT: criterion = "entropy", max_depth = 5, splitter = "best"
 RF: max_features = 7, min_samples_leaf = 1, n_estimators = 100, max_depth = 5
 NN: 3 dense layers, 2 dropout layers, activation = "relu", batch = 16, epochs = 300

Step III
 Features: Λ, a2, a4, a6, ${\chi }_{\mathrm{PSPL}}^{2}$, tE,PSPL, s, q
 KNN: metric = "minkowski", n_neighbors = 3, p = 15
 DT: criterion = "gini", max_depth = 7, splitter = "best"
 RF: max_features = 8, min_samples_leaf = 1, n_estimators = 300, max_depth = 8
 NN: 4 dense layers, 2 dropout layers, activation = "tanh", batch = 32, epochs = 100

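To make the table concrete, the sketch below instantiates the four step I classifiers with these hyperparameters; the layer widths, dropout rates, optimizer, and the nine-feature input shape are our assumptions, since Table 3 does not specify them.

```python
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Step I classifiers with the Table 3 hyperparameters.
knn = KNeighborsClassifier(metric="manhattan", n_neighbors=4)
dt = DecisionTreeClassifier(criterion="gini", max_depth=5, splitter="best")
rf = RandomForestClassifier(max_features=7, min_samples_leaf=1,
                            n_estimators=100, criterion="entropy",
                            bootstrap=False)

# A small dense network with 4 dense and 3 dropout layers, per Table 3;
# the widths and dropout rates shown here are placeholders.
nn = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(9,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
nn.compile(optimizer="adam", loss="binary_crossentropy",
           metrics=["accuracy"])
# Training would then use nn.fit(X, y, batch_size=32, epochs=300).
```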

6.1. Classification Step I

As shown in Figure 24, the first step of the algorithmic approach is to detect microlensing light curves among other stellar variability. We should first note that our current light-curve data set is not fully representative of what we expect from the Roman mission, since the only non-microlensing variability it includes is CVs. Most (but not all) forms of non-microlensing variability are expected to be periodic or quasi-periodic, so including CVs is a reasonable starting point. Our data set for this step contains 4181 light curves: 3752 microlensing light curves (labeled 1) and 429 non-microlensing light curves (labeled 0).

We find that the test-set and training-set accuracies of all four classifiers in this step are very close. A common way to evaluate a classification model is to plot its confusion matrix, a table containing the percentages of correctly and incorrectly classified objects for each class in the data set. According to the confusion matrices shown in Figure 25, most of the ML tools find the microlensing light curves with very small classification error, whereas about 40% of the CV light curves are misclassified. This could be a result of the small training set or incomplete hyperparameter tuning, or it might indicate the need to include more features.
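For reference, a row-normalized confusion matrix like those in Figure 25 can be computed in one line with scikit-learn; the function wrapper here is illustrative.

```python
from sklearn.metrics import confusion_matrix

def normalized_confusion(clf, X_test, y_test):
    """Each row gives the fraction of a true class assigned to each
    predicted class, as in the matrices of Figure 25."""
    return confusion_matrix(y_test, clf.predict(X_test), normalize="true")
```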

Figure 25. Confusion matrices of the four trained classifiers in classification step I, in which we classify all of the light curves into two classes, CV and microlensing.


A more robust way of comparing ML classifiers is to plot their ROC curves and calculate the area under the curve (AUC). An ROC curve plots the true-positive rate (TPR) against the false-positive rate (FPR) for a range of decision thresholds, and the closer the AUC is to unity, the better the classifier performs. Figure 26 shows the ROC curves of the four classifiers trained in step I; the diagonal dashed line represents the ROC curve of a random classifier. The ROC curves show that the NN and RF classifiers perform better, achieving a higher TPR without increasing the FPR.
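The corresponding ROC quantities can be obtained from the classifier's class-1 scores with scikit-learn, as in this minimal sketch:

```python
from sklearn.metrics import auc, roc_curve

def roc_and_auc(clf, X_test, y_test):
    """ROC curve (FPR, TPR over decision thresholds) and its AUC."""
    scores = clf.predict_proba(X_test)[:, 1]   # probability of class 1
    fpr, tpr, _ = roc_curve(y_test, scores)
    return fpr, tpr, auc(fpr, tpr)
```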

Figure 26. ROC curves of the four trained classifiers in classification step I, along with their AUC values. In this step, we classify all of the light curves into two classes, CV and microlensing. RF and NN have the largest AUC, implying that they achieve a higher TPR while keeping the FPR low.


6.2. Classification Step II

The second classification step distinguishes between single-lens and binary-lens microlensing light curves, as shown in Figure 24. Our data set for this step contains 2143 binary-lens microlensing light curves (labeled 1) and 1626 single-lens microlensing light curves (labeled 0). For this step, we find that DT and RF have similar training and test accuracies and appear to work better than NN and KNN, although NN works much better than KNN.

Figure 27 shows the confusion matrices of the four classifiers trained for this step. They suggest that RF works better at predicting both labels, whereas NN finds the largest fraction of the binary-lens light curves. Figure 28 shows the ROC curves and AUC values for this step. The classifiers perform better overall in this step, mainly because the data set is more balanced.

Figure 27. Confusion matrices of the four trained classifiers in classification step II. At this stage, we assume that the microlensing light curves have already been detected, and our goal is to classify them into single-lens and binary-lens groups.

Figure 28. ROC curves of the four trained classifiers in classification step II, along with their AUC values. In this step, we classify the microlensing light curves into single-lens and binary-lens groups. The data set is more balanced in this step, and all of the classifiers work well; RF and NN still show the best performance.


6.3. Classification Step III

After finding the binary-lens light curves, the next classification step distinguishes between stellar binary-lens and planetary binary-lens microlensing light curves, as shown in Figure 24. Our data set for this step contains 2074 light curves: 688 planetary binary-lens microlensing light curves (labeled 1) and 1386 stellar binary-lens microlensing light curves (labeled 0).

This classification step is particularly important, since the planetary binary-lens systems are the ones that astronomers most want to distinguish from the rest of the data set. In our test, the number of light curves is smaller than in the previous classification steps, which decreases the accuracy of the classifiers, so the overall accuracy values in this step are lower than before. Additionally, stellar and planetary binary-lens systems are much less distinguishable from each other than the classes in the previous tasks, and for this reason the simpler DT and KNN algorithms have lower accuracy. RF and NN have higher overall accuracy, but NN has a higher test accuracy, meaning that a larger fraction of the test set is correctly labeled. An algorithm like an NN appears more capable of distinguishing between these two categories, which is expected, as it is more complex and designed to find complicated patterns in a data set.

Figure 29 shows the confusion matrices of the four classifiers. The confusion matrix of the NN shows a larger TPR than all of the others, which is favorable given that our ultimate goal is to detect planetary microlensing light curves. Figure 30 shows the ROC curves and AUC values of the four classifiers in step III. The data set in this step is smaller than in the other two steps and is not well balanced; as a result, the performance of the different classifiers is not as well differentiated. However, the NN shows significantly better performance than the other classifiers.

Figure 29. Confusion matrices of the four trained classifiers in classification step III. Once the binary-lens microlensing light curves are found, this step classifies them into stellar and planetary binary-lens light curves.

Figure 30. ROC curves of the four trained classifiers in classification step III, along with their AUC values. In this step, we classify the binary-lens light curves into stellar and planetary binary-lens classes. Since the data set is smaller and not well balanced, the overall performance of the classifiers is lower than in the other two steps; however, the NN performs significantly better than the other classifiers.


7. Conclusion and Future Work

Classifying light curves that exhibit different types of stellar variability remains a major challenge. Although ML methods are a powerful tool for classifying astronomical time series, it is important to understand which method to choose and what steps to take in order to use these methods effectively.

In this paper, we introduce a new approach to classifying microlensing light curves based on their morphologies. We present a package of tools, including several functional fits, that can be applied to a light curve quickly and efficiently to extract information about its different morphological features. This information is quantified as features that can be used to make decisions about the light-curve type.

This approach will be useful for analyzing large microlensing data sets from future high-cadence surveys like Roman (Spergel et al. 2015) and from ongoing surveys like KMTNet (Kim et al. 2010). Our preliminary results in Section 6 show that the features produced by our tools can be used as input to ML classifiers, such as random forests, to distinguish between different types of microlensing light curves and to help prioritize the events most likely to be caused by planetary systems.

The simulated data used in this paper included CV-like events as the only non-microlensing variability. An ideal data set would include a wide variety of stellar variability, and our package should be extended to handle all other variability types in its analysis. We believe that the same approach will then succeed in recovering microlensing events from a wide variety of non-microlensing variability.

There are currently a number of methods that attempt to categorize photometric variability in large data sets, including parametric statistical methods and ML, and ML methods for detecting different types of variability are becoming more common (e.g., Richards et al. 2011; Pichara 2013; Pashchenko 2017; Valenzuela 2017). For example, as mentioned before, Godines et al. (2019) trained a random forest classifier to detect microlensing light curves in real time in low-cadence survey data. Our features can complement the feature sets of such algorithms, not only improving their results but also giving them the ability to detect possible planetary binary lenses.

In ML classifiers, increasing the number of object instances can significantly improve the results. Our largest training set included 4181 light curves, which is not a large number for most ML applications; increasing this to ∼10,000 light curves would yield more robust and reliable results. Some of the tools presented in this paper produce additional parameters as well, and including different subsets of those parameters could also improve the results. This must be done carefully, though, since adding unimportant features might result in overfitting. Another promising avenue is to test other classifiers, such as support vector classifiers and naive Bayes, and to investigate deeper neural networks. The improvement of the data set and of the ML algorithms will be presented in the second paper of this series.

S.K. thanks the LSSTC Data Science Fellowship Program, which is funded by LSSTC, NSF Cybertraining grant No. 1829740, the Brinson Foundation, and the Moore Foundation; her participation in the program has benefited this work. She also thanks the International Microlensing Conference held in 2018 in New York City, NY, where the discussions that initiated this project took place. R.S. gratefully acknowledges support from NASA grant 80NSSC19K0291. B.S.G. was supported by a Thomas Jefferson Chair for Space Exploration endowment at the Ohio State University.
