The QUEST-La Silla AGN Variability Survey: Selection of AGN Candidates through Optical Variability


Published 2019 May 21 © 2019. The American Astronomical Society. All rights reserved.
Citation: P. Sánchez-Sáez et al. 2019, ApJS, 242, 10. DOI: 10.3847/1538-4365/ab174f


0067-0049/242/1/10

Abstract

We used data from the QUEST-La Silla Active Galactic Nucleus (AGN) variability survey to construct light curves for 208,583 sources over ∼70 deg², with a limiting magnitude r ∼ 21. Each light curve has at least 40 epochs and a length of ≥200 days. We implemented a random forest algorithm to classify our objects as either AGN or non-AGN according to their variability features and optical colors, without applying morphology cuts. We tested three classifiers: one that only includes variability features (RF1), one that includes variability features and also r − i and i − z colors (RF2), and one that includes variability features and also g − r, r − i, and i − z colors (RF3). We obtained a sample of high-probability candidates (hp-AGN) for each classifier, with 5941 candidates for RF1, 5252 candidates for RF2, and 4482 candidates for RF3. We divided each sample according to its g − r colors, defining blue (g − r ≤ 0.6) and red subsamples (g − r > 0.6). We find that most of the candidates known from the literature belong to the blue subsample, which is not necessarily surprising given that, unlike many literature studies, we do not cut our sample to point-like objects. This means that we can select AGNs that have a significant contribution from redshifted starlight in their host galaxies. In order to test the efficiency of our technique, we performed spectroscopic follow-up, confirming the AGN nature of 44 of the 54 observed sources (81.5% efficiency). From the campaign, we concluded that RF2 provides the purest sample of AGN candidates.


1. Introduction

Active galactic nuclei (AGNs) are one of the most energetic phenomena in the universe and are characterized by time-variable emission in every waveband in which they have been studied. Variability studies are fundamental to understanding the extreme physical conditions of accretion disks near supermassive black holes (SMBHs). Recent studies indicate that AGN variability can be well described as a stochastic process (e.g., damped harmonic oscillator or random walk; Kelly et al. 2009, 2014), with characteristic timescales ranging from days to years.
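To make "a stochastic process with a characteristic timescale" concrete, the following is a minimal sketch of a damped random walk (an Ornstein–Uhlenbeck process), the model of Kelly et al. (2009). It is illustrative only and is not code from this work. The function name and its parameters (`tau` for the damping timescale in days, `sigma_inf` for the long-term rms in magnitudes, `mean_mag`) are our own hypothetical choices.

```python
import numpy as np


def simulate_drw(t, tau, sigma_inf, mean_mag, seed=None):
    """Simulate a damped random walk (DRW) light curve at the times t (days).

    Illustrative sketch: tau is the damping timescale (days), sigma_inf the
    asymptotic rms variability (mag), and mean_mag the mean magnitude.
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)
    mag = np.empty_like(t)
    # The first point is drawn from the stationary (long-term) distribution.
    mag[0] = mean_mag + sigma_inf * rng.standard_normal()
    for i in range(1, t.size):
        # Exact Ornstein-Uhlenbeck update: memory of the previous value decays
        # as exp(-dt / tau), so the variability is correlated on timescales ~tau.
        decay = np.exp(-(t[i] - t[i - 1]) / tau)
        expected = mean_mag + decay * (mag[i - 1] - mean_mag)
        scatter = sigma_inf * np.sqrt(1.0 - decay**2)
        mag[i] = expected + scatter * rng.standard_normal()
    return mag
```

For example, `simulate_drw(np.sort(np.random.default_rng(1).uniform(0, 1800, 300)), tau=200.0, sigma_inf=0.2, mean_mag=19.5)` produces an irregularly sampled light curve. Its variations are smooth on timescales much shorter than `tau` and behave like uncorrelated scatter of rms `sigma_inf` on much longer timescales.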

The Large Synoptic Survey Telescope (LSST; Ivezić et al. 2019) will revolutionize time-domain astronomy, providing for the first time the opportunity to study variable objects over a long period of time (∼10 yr), at very faint magnitudes (r ∼ 24.5 for single images), and over a large total area (>18,000 deg²). Simulations performed by the LSST AGN Science Collaboration predict the detection of over 10⁷ AGNs to beyond r ∼ 24 (LSST Science Collaboration et al. 2009). This is a huge increase in the number of sources available for variability analysis, as current studies typically only probe to limiting magnitudes of r ∼ 21, with total numbers of sources between 10 and 10⁵ (e.g., Cristiani et al. 1997; Vanden Berk et al. 2004; MacLeod et al. 2010; Peters et al. 2015; Simm et al. 2016; Caplar et al. 2017; Li et al. 2018). Given the ubiquity of variability and the large number of variable sources to be found with LSST, it is critical to characterize AGN variability and define reliable selection criteria before LSST begins operations.

Traditionally, AGN selection follows the philosophy of finding regions in UV/optical/mid-IR color–color space in which AGNs can be cleanly separated from stars and galaxies (e.g., Schmidt & Green 1983; Fan 1999; Richards et al. 2002, 2004, 2009; Lacy et al. 2004; Smith et al. 2005; Stern et al. 2005; Ross et al. 2012). However, some AGN populations have observed colors that fall outside the region typically occupied by bright, "blue" AGNs, often mimicking those of stars. These populations include type 2 or obscured AGNs, broad absorption-line quasars (BAL QSOs), high-redshift quasars (high-z QSOs; Butler & Bloom 2011; Palanque-Delabrouille et al. 2011, 2016), and low-luminosity AGNs (LLAGNs), whose colors can be heavily contaminated by emission from the host galaxy. Therefore, alternative methods are required to identify the AGN candidates missed by traditional selection techniques and thus obtain complete AGN samples. One promising selection method involves the use of variability techniques.

Butler & Bloom (2011) implemented a variability-based selection algorithm to classify high-redshift quasars in the Sloan Digital Sky Survey (SDSS; York et al. 2000) Stripe 82 field. They used damped random walk modeling (Kelly et al. 2009) to separate sources showing quasar-like variability from those with temporally uncorrelated variability. In particular, they targeted unresolved sources with redshifts in the range 2.5 ≤ z ≤ 3, where color-based selection of AGNs is less efficient because of stellar contamination. Palanque-Delabrouille et al. (2011) used the variability structure function (SF; e.g., Schmidt et al. 2010) to separate quasars, variable stars, and nonvariable stars in SDSS Stripe 82 data. They implemented a neural network algorithm that separates point-like objects by their SF parameters. A similar technique has been used by the SDSS-IV extended Baryon Oscillation Spectroscopic Survey (eBOSS) team to select quasar candidates with z > 2.1 by variability (Myers et al. 2015; Palanque-Delabrouille et al. 2016). Peters et al. (2015) used color, variability, and astrometric data from SDSS to select point-like AGN candidates. They implemented a nonparametric Bayesian Classification Kernel Density Estimation (NBC KDE) to classify 35,820 type 1 quasar candidates in the Stripe 82 field. They tested various combinations of color and variability parameters, finding that a combination of optical colors and variability parameters improves quasar classification efficiency and completeness over the use of colors alone. More recently, Tie et al. (2017) used data from the supernova fields of the Dark Energy Survey (DES; Abbott et al. 2018) to select quasars by combining color and variability selection methods.

All these previous studies have shown the potential of selecting AGN candidates through variability analyses, demonstrating that variability-based techniques can considerably increase the number of AGN candidates in the redshift range where the colors of stars are similar to those of AGNs. Less clear is how deep into the low-luminosity and obscured AGN populations these techniques can probe. This is of substantial importance given the expected mismatch in depth between X-ray and mid-infrared (MIR) surveys and LSST.

In this paper, we present our variability-based technique to select AGN candidates using data from the QUEST-La Silla AGN variability survey (Cartier et al. 2015). In this work, we aim to detect a wider range of AGN populations. In particular, we expect to detect sources that show clear signatures of a nonstellar continuum-emitting process in their centers, with emission lines broader than ∼1800 km s−1, regardless of their luminosity or shape in the QUEST images. We do not expect to detect many type 2 or obscured AGN candidates, because our technique requires the detection of a variable continuum component. We characterize the variability of our sources with features such as the SF, which have previously been used to characterize variable sources (e.g., Cartier et al. 2015; Sánchez et al. 2017). We then used a random forest algorithm to classify our objects as either AGN or non-AGN. We tested three classifiers: one that includes only variability features, and two that include optical colors and variability features. Our selection technique differs from previous variability-based AGN selection methods mainly in its use of light curves with higher cadence and its exclusion of any morphology indicator. Consequently, we expect to detect more low-redshift AGN and LLAGN candidates than previous analyses. For some of our candidates, we have obtained optical spectra to confirm their nature. Four of the fields observed by the QUEST-La Silla AGN variability survey correspond to LSST Deep Drilling Fields (DDFs), whose expected cadence will be similar to the nightly cadence used by the QUEST-La Silla AGN variability survey (but with the time baseline extended to 10 yr). The QUEST-La Silla AGN variability survey is therefore an important testbed for studying AGN selection in time-domain surveys such as LSST or the Zwicky Transient Facility (ZTF; Bellm 2014), which has a depth similar to that of the QUEST-La Silla AGN variability survey.

The paper is organized as follows. In Section 2, we describe the QUEST-La Silla AGN variability survey and the light-curve construction procedure. In Section 3, we describe the random forest algorithm, the variability features, and the labeled set used for the selection. We also discuss the performance of our random forest classifiers, and we comment on the selected candidates. In Section 4, we provide the results confirming the nature of some of our candidates by using public data and spectroscopic follow-up. In Section 5, we provide a comparison of our results with previous works. Finally, in Section 6, we summarize the main results.

2. Data

2.1. The QUEST-La Silla AGN Variability Survey

Between 2010 and 2015, we carried out "The QUEST-La Silla AGN variability survey" (hereafter QUEST-La Silla), using the wide-field QUEST camera mounted on the 1 m ESO-Schmidt telescope at La Silla Observatory (Cartier et al. 2015, 2016). The survey used a broadband filter, the Q band, similar to the union of the SDSS g and r filters. Our survey includes the COSMOS, ECDF-S, ELAIS-S1, XMM-LSS, and Stripe 82 fields, which are some of the most intensively observed regions in the southern sky. Although our QUEST fields are much larger than the nominal fields, we adopt the same names. Each field covers ∼14 deg², except for XMM-LSS, which covers ∼38 deg². One of the advantages of our survey over other surveys is its intensive monitoring: we observed the fields every possible night (but see the binning strategy described below). Individual images reached limiting magnitudes between r ∼ 20.5 and r ∼ 21.5 for exposure times of 60 s and 180 s, respectively.

The aims of our survey are (1) to test and improve variability selection methods of AGNs, and find AGN populations missed by other optical selection techniques (Schmidt et al. 2010; Butler & Bloom 2011; Palanque-Delabrouille et al. 2011), (2) to obtain a large number of well-sampled light curves, covering timescales ranging from days to years, and (3) to study the link between the variability properties (e.g., characteristic timescales and amplitudes of variation) and the physical parameters of the system (e.g., black hole mass, luminosity, and Eddington ratio).

Cartier et al. (2015) presented the technical description of the survey, the full characterization of the QUEST camera, and a study of the relation between variability and the multiwavelength properties of X-ray-selected AGNs in the COSMOS field. In Sánchez-Sáez et al. (2018), we performed a statistical analysis of the connection between AGN variability and the physical properties of the central SMBH. We found that the amplitude of variability at one-year timescales (A) depends primarily on the rest-frame emission wavelength (λrest) and the Eddington ratio (L/LEdd), and that A is anticorrelated with both.

2.2. Light-curve Construction

We reduced the data from QUEST-La Silla using our own customized pipeline, following the procedure described by Cartier et al. (2015), which includes dark subtraction, flat-fielding, and astrometric and photometric calibration. To calibrate the photometry, we used public SDSS photometric catalogs (Gunn et al. 1998; Doi et al. 2010) for the COSMOS, Stripe 82, and XMM-LSS fields, and public catalogs from the first year of DES (Abbott et al. 2018) for the ELAIS-S1 and ECDF-S fields. We performed aperture photometry using SExtractor (Bertin & Arnouts 1996), with the optimal aperture found by Cartier et al. (2015) for the QUEST camera (∼6.18″). We then constructed light curves for all sources from the SDSS and DES catalogs with detections in the QUEST-La Silla data, using the same methodology as in Cartier et al. (2015). In summary, we generated a QUEST-La Silla catalog for each observation, with its associated Julian date, and cross-matched the SDSS and DES catalogs with every one of these catalogs using a radius of 1″. We then constructed a light curve for each source, keeping only those epochs where the SExtractor FLAG parameter was equal to zero, to prevent false detections of variability due to bad photometry. Finally, we kept only those light curves with more than three epochs. From the SDSS catalog, we obtained single-epoch photometry in the u, g, r, i, and z bands for every source in the COSMOS, XMM-LSS, and Stripe 82 fields, and from the DES catalog, we obtained single-epoch photometry in the g, r, i, and z bands for the ELAIS-S1 and ECDF-S fields.
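The per-epoch cross-match and FLAG cut described above can be sketched as follows. This is a minimal illustration under stated assumptions. Positions are in degrees, the function names are hypothetical, and the brute-force nearest-neighbor search stands in for whatever matching code the actual pipeline used. A detection is matched first and then dropped if its FLAG is nonzero, rather than searching for a farther unflagged detection.

```python
import numpy as np

MATCH_RADIUS_ARCSEC = 1.0  # matching radius quoted in the text


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = (np.radians(np.asarray(x, dtype=float))
                            for x in (ra1, dec1, ra2, dec2))
    hav = (np.sin((dec2 - dec1) / 2.0) ** 2
           + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.0) ** 2)
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(hav, 0.0, 1.0)))) * 3600.0


def match_epoch(ref_ra, ref_dec, det_ra, det_dec, det_flags):
    """Match reference (SDSS/DES) sources to one epoch's QUEST detections.

    Returns, for each reference source, the index of the nearest detection
    within MATCH_RADIUS_ARCSEC, or -1 if there is none or if that detection
    has a nonzero SExtractor FLAG (the epoch is then dropped for that source).
    """
    det_ra = np.asarray(det_ra, dtype=float)
    det_dec = np.asarray(det_dec, dtype=float)
    det_flags = np.asarray(det_flags)
    matches = np.full(len(ref_ra), -1, dtype=int)
    if det_ra.size == 0:
        return matches
    for k, (ra, dec) in enumerate(zip(ref_ra, ref_dec)):
        sep = angular_sep_arcsec(ra, dec, det_ra, det_dec)
        j = int(np.argmin(sep))
        if sep[j] <= MATCH_RADIUS_ARCSEC and det_flags[j] == 0:
            matches[k] = j
    return matches
```

Repeating `match_epoch` over every epoch and appending each matched detection's Julian date, magnitude, and error builds the light curves. For catalogs of this size, a tree-based search such as astropy's `SkyCoord.match_to_catalog_sky` would be the practical choice over this O(N×M) loop.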

We decided to bin our light curves in three-day bins in order to reduce the noise produced by several factors, including changes in atmospheric conditions and the relatively low cosmetic quality of the QUEST CCD chips. This binning might affect the detection of variability in sources with short-timescale variations, like some variable stars (e.g., RR Lyrae or Cepheid stars). However, we do not expect to detect many variable stars in the QUEST-La Silla fields (e.g., Medina et al. 2018). Moreover, in this work we focus on the detection of sources with long-timescale variations (months or years), and thus the three-day binning does not affect our detection of AGNs.
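A minimal sketch of the three-day binning follows. The text does not say how epochs within a bin are combined, so three choices here are assumptions. Bins start at the first epoch, magnitudes are averaged with inverse-variance weights, and the bin time is the plain mean of the Julian dates. The function name is hypothetical.

```python
import numpy as np


def bin_light_curve(jd, mag, err, bin_days=3.0):
    """Average a light curve in consecutive bins of width bin_days.

    Assumptions (not specified in the text): bins start at the first epoch,
    magnitudes are combined with inverse-variance weights, and the bin time
    is the plain mean of the Julian dates in the bin.
    """
    jd, mag, err = (np.asarray(a, dtype=float) for a in (jd, mag, err))
    order = np.argsort(jd)
    jd, mag, err = jd[order], mag[order], err[order]
    bin_id = np.floor((jd - jd[0]) / bin_days).astype(int)

    t_out, m_out, e_out = [], [], []
    for b in np.unique(bin_id):
        sel = bin_id == b
        w = 1.0 / err[sel] ** 2
        t_out.append(jd[sel].mean())
        m_out.append(np.sum(w * mag[sel]) / np.sum(w))
        e_out.append(1.0 / np.sqrt(np.sum(w)))  # error of the weighted mean
    return np.array(t_out), np.array(m_out), np.array(e_out)
```

With this choice, combining n epochs of equal error σ gives a binned error of σ/√n, which is the noise reduction the binning is meant to achieve. The cost is that any signal varying faster than a few days is averaged out.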

In this work, we excluded the Stripe 82 field, because it is a crowded field and requires point-spread function (PSF) photometry. We generated a total of 277,629 light curves for sources located in the COSMOS, ECDF-S, ELAIS-S1, and XMM-LSS fields. In order to have statistically significant variability features, we included in our analysis only those light curves with at least 40 epochs and a length greater than or equal to 200 days after the three-day binning was applied (hereafter "well-sampled" light curves). There are 208,583 well-sampled light curves in the four fields. The median, mean, and standard deviation of the number of epochs per light curve are 118, 119.3, and 47.2, respectively, and the median, mean, and standard deviation of the total length of each light curve are 1283.7, 1306.4, and 254.3 days, respectively. In Table 1, we summarize the total number of light curves and the number of well-sampled light curves in each field.
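The "well-sampled" criteria amount to a simple test on the binned epochs. A minimal sketch, with hypothetical names, follows.

```python
import numpy as np

MIN_EPOCHS = 40          # at least 40 epochs after the three-day binning
MIN_LENGTH_DAYS = 200.0  # total time span of at least 200 days


def is_well_sampled(binned_jd):
    """Return True if a binned light curve meets both "well-sampled" criteria."""
    t = np.asarray(binned_jd, dtype=float)
    if t.size < MIN_EPOCHS:
        return False
    return bool(t.max() - t.min() >= MIN_LENGTH_DAYS)
```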

Table 1.  Number of Light Curves per Field

Field | Total light curves | Well-sampled light curves
COSMOS | 68,514 | 45,323
XMM-LSS | 104,962 | 82,697
ELAIS-S1 | 49,504 | 38,106
ECDF-S | 54,649 | 42,457
Total | 277,629 | 208,583


3. Selection of AGN Candidates

We implemented a supervised automatic classification using a random forest algorithm (RF; Breiman 2001) to classify our 208,583 objects with well-sampled light curves as either AGN or non-AGN according to their variability features and optical colors. We did not include a morphological parameter (e.g., the SExtractor CLASS_STAR parameter) in the classification, so that extended sources with AGN-like variability can also be detected. We tested three classifiers: one that includes only variability features, and two that include optical colors and variability features. In the following subsections, we describe the selection methodology, the features used in our analysis, and the results of the classification for sources from the QUEST-La Silla survey.

3.1. Random Forests

A decision tree is a hierarchical structure that performs successive partitions of the data, each according to a certain criterion, such as a cutoff value in one of the descriptors or features. In this way, the data are divided into smaller and smaller subsets as the tree grows deeper, until the leaves of the tree are reached. Each leaf is associated with a single class, although a given class may be associated with several leaves. Thus, elements that fall on any of the leaves corresponding to a particular class are classified as belonging to that class.

An RF algorithm consists of a collection of decision trees, where each tree is trained using a random subset of sources, sampled with replacement, from a training set (a set of objects with known classification, selected from a labeled set), and a random selection of features. The final classification function of the algorithm weighs each of these results according to the size of the subset used by each tree and generates an average score, which can then be interpreted as the probability that the input element belongs to a certain class (predicted class probability, PRF). The classifier is then validated using a subset of the labeled set that was not used for training (the test set). Finally, a prediction is made on the unlabeled data. RF has several advantages: it can handle thousands of features, it provides a ranking of feature importance during the classification, it does not require the feature values to be scaled to the same "units," it handles large numbers of objects, and it is easily parallelizable.
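The bootstrap-and-average idea can be sketched by hand with scikit-learn's single `DecisionTreeClassifier`. This is an illustration only, not the classifier used in this work. The helper names are hypothetical, and the label encoding (1 = AGN, 0 = non-AGN) is an assumption. Note that scikit-learn draws the random feature subset at each split (`max_features`) rather than once per tree. Every bootstrap sample here has the same size as the training set, so weighting trees by subset size reduces to a plain mean, which is also what scikit-learn's `RandomForestClassifier.predict_proba` computes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=50, seed=0):
    """Train decision trees on bootstrap samples (drawn with replacement).

    max_features="sqrt" makes each split consider a random subset of the
    features, which is how scikit-learn randomizes feature selection.
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        boot = rng.integers(0, len(y), size=len(y))  # bootstrap indices
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(2**31 - 1)))
        trees.append(tree.fit(X[boot], y[boot]))
    return trees


def forest_probability(trees, X, positive_label=1):
    """Average the per-tree class probabilities for positive_label (P_RF)."""
    X = np.asarray(X)
    per_tree = []
    for tree in trees:
        classes = list(tree.classes_)
        if positive_label in classes:
            per_tree.append(tree.predict_proba(X)[:, classes.index(positive_label)])
        else:  # this bootstrap sample happened to contain no positives
            per_tree.append(np.zeros(len(X)))
    return np.mean(per_tree, axis=0)
```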

For the selection of AGN candidates, we used the scikit-learn Python package implementation of RF. We performed a hyperparameter selection procedure to obtain the optimal values for the RF classifier by means of K-fold cross-validation (with k = 5 folds), using the "accuracy" (see its definition in Section 3.4) as the target score to optimize. This hyperparameter selection procedure was executed as part of the model training phase (i.e., on the training set). In this procedure, the training set is divided into k folds: k − 1 of them are used to fit the RF model, which is then tested on the remaining fold (the validation set). This is done k times, each time using a different fold as the validation set. The parameters considered in this cross-validated search include the number of trees in the forest and the number of features to consider when looking for the best split in a tree. To account for the class imbalance in the classification process, we set the class weight hyperparameter to "balanced_subsample."
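The cross-validated search above maps naturally onto scikit-learn's `GridSearchCV`. The sketch below is not the authors' script. The candidate values in `DEFAULT_GRID` are hypothetical, because the text names the hyperparameters but not the values tried. We pass an explicit `KFold` to match the plain K-fold procedure described; for classifiers, scikit-learn would otherwise use stratified folds by default.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

# Hypothetical grid: the text names these hyperparameters, not their values.
DEFAULT_GRID = {
    "n_estimators": [100, 300, 500],
    "max_features": ["sqrt", "log2", None],
}


def tune_random_forest(X_train, y_train, param_grid=None, seed=0):
    """5-fold cross-validated search that maximizes accuracy on the training set."""
    X_train = np.asarray(X_train, dtype=float)
    base = RandomForestClassifier(class_weight="balanced_subsample",
                                  random_state=seed, n_jobs=-1)
    search = GridSearchCV(
        base,
        param_grid or DEFAULT_GRID,
        scoring="accuracy",
        cv=KFold(n_splits=5, shuffle=True, random_state=seed),
    )
    search.fit(X_train, y_train)  # refits the best model on the full training set
    return search.best_estimator_, search.best_params_
```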

The variability features used by the RF classifier are described in Section 3.2, and are listed in Table 2. We trained the RF classifier using a labeled set of type 1 AGNs and stars with spectroscopic classification from SDSS and with well-sampled light curves from the QUEST-La Silla survey (see Section 3.3). During the RF classifier training, we used 70% of the labeled set as a training set, and then we tested the performance of the classifier using the remaining 30% of the labeled set (the test set), as normally done during supervised learning procedures. We then applied the trained RF classifier to our unlabeled set, composed of our 208,583 sources with well-sampled QUEST-La Silla light curves, to classify them as either AGN or non-AGN. As a result, we obtain a predicted class and the predicted class probability (PRF) associated with each source of the unlabeled set.
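The 70/30 split, training, and classification of the unlabeled set can be sketched as follows. This is a hedged illustration. The label encoding (1 = AGN, 0 = non-AGN), the function name, and the default of 300 trees are assumptions, and in practice the tuned hyperparameters from the cross-validated search would be used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

AGN = 1  # hypothetical label encoding; non-AGN sources are labeled 0


def train_and_classify(X_labeled, y_labeled, X_unlabeled, n_trees=300, seed=0):
    """70/30 train/test split, fit, evaluate, then classify the unlabeled set.

    Returns the predicted classes, P_RF (probability of the AGN class) for
    each unlabeled source, and the accuracy measured on the 30% test set.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        np.asarray(X_labeled, dtype=float), np.asarray(y_labeled),
        test_size=0.3, random_state=seed)
    clf = RandomForestClassifier(n_estimators=n_trees,
                                 class_weight="balanced_subsample",
                                 random_state=seed)
    clf.fit(X_tr, y_tr)
    test_accuracy = clf.score(X_te, y_te)
    agn_column = list(clf.classes_).index(AGN)  # column of P(AGN)
    X_unlabeled = np.asarray(X_unlabeled, dtype=float)
    p_rf = clf.predict_proba(X_unlabeled)[:, agn_column]
    predicted = clf.predict(X_unlabeled)
    return predicted, p_rf, test_accuracy
```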

Table 2.  List of Features

Feature | Description | Reference
Pvar | Probability that the source is intrinsically variable | McLaughlin et al. (1996)
σrms | Measure of the intrinsic variability amplitude | Allevato et al. (2013)
ASF | Amplitude of the variability at 1 yr, derived from the SF | Schmidt et al. (2010)
γSF | Logarithmic gradient of the change in magnitude, derived from the SF | Schmidt et al. (2010)
Stdᵃ | Standard deviation of the light curve (σLC) | Nun et al. (2015)
Meanvarianceᵃ | Ratio of the standard deviation to the mean magnitude ($\sigma_{\mathrm{LC}}/\overline{m}$) | Nun et al. (2015)
MedianBRPᵃ | Fraction of photometric points within amplitude/10 of the median magnitude | Richards et al. (2011)
Autocor-lengthᵃ | Lag value where the autocorrelation becomes smaller than e⁻¹ | Kim et al. (2011)
StetsonKᵃ | A robust kurtosis measure | Kim et al. (2011)
ηeᵃ | Ratio of the mean of the square of successive differences to the variance of data points | Kim et al. (2014)
PercentAmpᵃ | Largest percentage difference between either the maximum or minimum magnitude and the median | Richards et al. (2011)
Conᵃ | Number of three consecutive data points that are brighter or fainter than 2σLC | Kim et al. (2011)
LinearTrendᵃ | Slope of a linear fit to the light curve | Richards et al. (2011)
Beyond1Stdᵃ | Percentage of points beyond one σLC from the mean | Richards et al. (2011)
Q31ᵃ | Difference between the third quartile and the first quartile of a light curve | Kim et al. (2014)
PeriodLS | Period from the Lomb–Scargle periodogram | VanderPlas (2018)

Note.

ᵃ Features from FATS (Nun et al. 2015).


3.2. Variability Features

In order to obtain a comprehensive description of the variability of our sources, we used several variability features. Following the approach of Sánchez et al. (2017) and Sánchez-Sáez et al. (2018), we used two parameters related to the amplitude of the variability, Pvar and the excess variance (σrms), and one parameter that describes the shape of the variability between two observations separated by a given time, the SF.

In particular, Pvar (see Sánchez et al. 2017 and references therein) corresponds to the probability that the source is intrinsically variable. It uses the χ² of the light curve and calculates the probability Pvar = P(χ²) that a χ² lower than or equal to the observed value could occur by chance for an intrinsically nonvariable source.
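A minimal sketch of Pvar with SciPy's χ² distribution follows. Two details are assumptions, not taken from the text: χ² is computed about the unweighted mean magnitude, and the number of degrees of freedom is N − 1. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def p_var(mag, err):
    """P_var: chi-square CDF of the light curve about its mean magnitude.

    Values close to 1 mean that the observed scatter is very unlikely to arise
    from photometric noise alone in an intrinsically nonvariable source.
    """
    mag = np.asarray(mag, dtype=float)
    err = np.asarray(err, dtype=float)
    chi2_obs = np.sum((mag - mag.mean()) ** 2 / err**2)
    dof = mag.size - 1  # assumption: one parameter (the mean) is estimated
    return float(chi2.cdf(chi2_obs, dof))
```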

σrms is a measure of the intrinsic variability amplitude (see Sánchez et al. 2017 and references therein), and it is calculated as ${\sigma }_{\mathrm{rms}}^{2}=({\sigma }_{\mathrm{LC}}^{2}-{\overline{\sigma }}_{m}^{2})/{\overline{m}}^{2}$, where σLC is the standard deviation of the light curve, ${\overline{\sigma }}_{m}$ is the mean photometric error, and $\overline{m}$ is the mean magnitude.
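The excess-variance formula translates directly into NumPy. In this sketch, σLC is taken as the sample standard deviation (ddof = 1), which is an assumption, and the function name is hypothetical. The returned quantity is σ²rms. It becomes negative when the scatter of the light curve is smaller than the mean photometric error, so σrms itself is only defined for positive values.

```python
import numpy as np


def excess_variance(mag, err):
    """sigma_rms^2 = (sigma_LC^2 - mean(err)^2) / mean(mag)^2, as in the text.

    sigma_LC is the sample standard deviation (ddof=1), an assumption. The
    result can be negative when the scatter is below the photometric noise.
    """
    mag = np.asarray(mag, dtype=float)
    err = np.asarray(err, dtype=float)
    sigma_lc2 = mag.var(ddof=1)
    return (sigma_lc2 - err.mean() ** 2) / mag.mean() ** 2
```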

The SF (e.g., Schmidt et al. 2010) is the average variability amplitude between two observations separated by a given time (τ), and it can be modeled as a power law: $\mathrm{SF}(\tau )\,={A}_{\mathrm{SF}}{\left(\tfrac{\tau }{1\mathrm{yr}}\right)}^{{\gamma }_{\mathrm{SF}}}$, where ASF corresponds to the amplitude of the variability at 1 yr timescales, and γSF is the logarithmic gradient of this change in magnitude.
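As a rough illustration of the power-law model, the sketch below estimates a binned, noise-corrected SF and fits log SF linearly against log(τ/1 yr) to obtain ASF and γSF. It uses a simplified estimator that is our assumption. It is not the Bayesian fitting procedure of Schmidt et al. (2010), and the function names, the eight logarithmic lag bins, and the minimum of 10 pairs per bin are all hypothetical.

```python
import numpy as np


def binned_structure_function(jd, mag, err, n_bins=8, min_pairs=10):
    """Noise-corrected rms magnitude difference as a function of time lag.

    Simplified estimator (an assumption, not the Bayesian fit of Schmidt
    et al. 2010): pairs are grouped in logarithmic lag bins, and
    SF = sqrt(<dm^2> - <err_i^2 + err_j^2>) in each bin with enough pairs.
    """
    jd, mag, err = (np.asarray(a, dtype=float) for a in (jd, mag, err))
    i, j = np.triu_indices(jd.size, k=1)  # all pairs of epochs
    lag_yr = np.abs(jd[j] - jd[i]) / 365.25
    keep = lag_yr > 0
    lag_yr = lag_yr[keep]
    dm2 = ((mag[j] - mag[i]) ** 2)[keep]
    noise2 = (err[i] ** 2 + err[j] ** 2)[keep]

    edges = np.geomspace(lag_yr.min(), lag_yr.max(), n_bins + 1)
    bin_of = np.clip(np.digitize(lag_yr, edges) - 1, 0, n_bins - 1)
    taus, sfs = [], []
    for b in range(n_bins):
        sel = bin_of == b
        if sel.sum() < min_pairs:
            continue
        excess = dm2[sel].mean() - noise2[sel].mean()
        if excess > 0:  # skip bins dominated by photometric noise
            taus.append(np.sqrt(edges[b] * edges[b + 1]))  # geometric bin center
            sfs.append(np.sqrt(excess))
    return np.array(taus), np.array(sfs)


def fit_sf_power_law(tau_yr, sf):
    """Least-squares fit of log10 SF = log10 A_SF + gamma_SF * log10(tau / 1 yr)."""
    gamma_sf, log_a = np.polyfit(np.log10(tau_yr), np.log10(sf), 1)
    return 10.0**log_a, gamma_sf
```

Because τ is expressed in years, the fitted intercept is directly the amplitude at τ = 1 yr, matching the definition of ASF.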

We also used some variability features from the Feature Analysis for Time Series (FATS; Nun et al. 2015) Python package, related to the amplitude of the variability (e.g., the mean variance and the percent amplitude) and to the structure of the light curve (e.g., the linear trend and the autocorrelation function length), as well as the period of the Lomb–Scargle periodogram (VanderPlas 2018), computed with the AstroML module for Python (VanderPlas et al. 2012). Table 2 lists all the variability features used in this work, together with a brief description of each feature and its reference.
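For illustration, the sketch below reimplements two of the simpler Table 2 features (Beyond1Std and Q31) directly from their descriptions and estimates the Lomb–Scargle period. These are not the FATS or AstroML implementations, which may differ in detail (for example, in weighting by photometric errors). SciPy's `scipy.signal.lombscargle` is swapped in for AstroML. The period range of 6 to 1000 days is an assumption; the lower limit is chosen as twice the three-day bin width.

```python
import numpy as np
from scipy.signal import lombscargle


def beyond_1_std(mag):
    """Fraction of points more than one sigma_LC from the mean magnitude."""
    mag = np.asarray(mag, dtype=float)
    return float(np.mean(np.abs(mag - mag.mean()) > mag.std()))


def q31(mag):
    """Difference between the third and first quartiles of the magnitudes."""
    q1, q3 = np.percentile(mag, [25, 75])
    return float(q3 - q1)


def lomb_scargle_period(t, mag, min_period=6.0, max_period=1000.0, n_freq=5000):
    """Period (days) of the highest Lomb-Scargle peak on a linear frequency grid."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(mag, dtype=float)
    y = y - y.mean()  # scipy's lombscargle does not subtract the mean by default
    freqs = np.linspace(1.0 / max_period, 1.0 / min_period, n_freq)  # cycles/day
    power = lombscargle(t, y, 2.0 * np.pi * freqs)  # scipy expects angular frequencies
    return float(1.0 / freqs[np.argmax(power)])
```

For aperiodic sources such as AGNs, the recovered period mostly reflects the longest timescales in the light curve, so it acts as a descriptive feature rather than a physical period.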

3.3. Labeled Set

To train our RF classifier, we need a labeled set that is representative of the populations we want to classify. Because this analysis includes only extragalactic fields, we do not expect to detect a high fraction of variable stars (e.g., RR Lyrae or Cepheid stars), given their low density at high Galactic latitudes (see Medina et al. 2018 and references therein). Moreover, because we binned our light curves in three-day bins, variable signals with short timescales cannot be detected. Therefore, the light curve of a short-period variable star will be hard to distinguish from that of a nonvariable star, and only variable stars with long periods would be detectable in our QUEST-La Silla light curves. We cross-matched the positions of the 208,583 sources with well-sampled light curves with the General Catalog of Variable Stars (GCVS version 5.1; Samus' et al. 2017), which provides a detailed compilation of catalogs of variable stars in the Galaxy. We found only three known variable stars in our data: one RR Lyrae and two cataclysmic variables. Therefore, we did not include variable stars in our RF classifiers.

In this analysis, galaxies are not included in the labeled set, because in general their variability and color properties will be similar to those of stars, unless they host an AGN (which might not have been previously detected). Therefore, we constructed a labeled set composed of stars and type 1 AGNs (i.e., AGNs with broad permitted emission lines). We decided to include only type 1 AGNs because we want to properly characterize the variability of the optical continuum emission, which cannot be detected in most type 2 or obscured AGNs.

Three of our fields (COSMOS, Stripe 82, and XMM-LSS) have spectroscopic information from SDSS. We constructed light curves for sources with spectral classification from the SDSS-DR14 database (Abolfathi et al. 2018). There are 3313 type 1 AGNs and 3332 stars with at least three epochs in the QUEST-La Silla light curves, and 2405 type 1 AGNs and 2608 stars with well-sampled light curves. We considered the sources with well-sampled light curves to define a labeled set for the RF classifier training. As mentioned in Section 3.1, 30% of the labeled set was used as a test set and 70% as a training set for the RF modeling. Figure 1 provides examples of QUEST-La Silla light curves for four sources of the labeled set.


Figure 1. Example of four light curves from the QUEST-La Silla labeled set: two stars (blue dots, top panels) and two AGNs (red dots, bottom panels).


It is well known that a fraction of the sources classified as AGNs in the SDSS databases are misclassified. Therefore, we cross-matched our labeled sample with the Million Quasars Catalog (MILLIQUAS v5.7 update, 2019 January 7; Flesch 2015; see Section 4.1 for further details) in order to estimate the fraction of AGNs in the labeled set that are not, in fact, AGNs. There are 57 AGNs with well-sampled light curves in the labeled set that are not present in MILLIQUAS, corresponding to 2.4% of the AGNs in the labeled set. Of these 57 sources, 21 are classified as variable according to their variability features. The remaining 36 sources correspond to 1.5% of the labeled AGNs, and thus we estimate that less than 2% of the AGNs in the labeled set are misclassified as AGN.

In Figure 2, we show three color–color diagrams of the labeled set: u − g versus g − r, g − r versus r − i, and r − i versus i − z. As a reference, we mark the regions of the u − g versus g − r diagram dominated by particular types of sources, from Sesar et al. (2007). Most of the AGNs in the labeled set (78.8%) are located in the region of the u − g versus g − r diagram where low-redshift, luminous AGNs are the dominant population (II). This corresponds to the classical color–color selection of "blue" AGNs. Moreover, several high-redshift (zspec > 2.5) AGNs in the labeled sample are located, as expected, in the region where high-redshift luminous AGNs are the dominant component (VI). However, a non-negligible fraction is located in other regions, where binary stars or cool dwarf stars (III), RR Lyrae stars, and main-sequence stars or the "stellar locus" (V) are the dominant population.


Figure 2. Color–color diagrams of the labeled set. In the left panel, we show u − g vs. g − r, in the middle panel g − r vs. r − i, and in the right panel r − i vs. i − z. The stars are represented by blue triangles, and the AGNs are represented by circles whose colors depend on the redshift of each source. The contour plots show the distribution of AGNs. In the left panel, we show with yellow dashed lines the division used in Sesar et al. (2007) to identify regions of the u − g vs. g − r diagram dominated by a particular type of source. In the middle panel, the black dashed line shows the position where g − r = 0.6.


On the other hand, the right panel of Figure 2 shows that most of the AGNs in the labeled set are located in a region where r − i ≲ 0.7 and i − z ≲ 0.8, clearly separated from a subpopulation of cool dwarf stars. This can be understood considering that stellar colors become monotonically redder as the effective temperature decreases (Covey et al. 2007). Thus, we normally observe a high concentration of cool dwarf stars in a region around g − r ∼ 1.5, with r − i ≳ 0.8. In addition, extragalactic sources (i.e., galaxies and AGNs) are normally located in regions of color–color space with r − i ≲ 1.0 (e.g., Rahman et al. 2016), because cool dwarf stars typically contribute little to their integrated emission. Therefore, we can use the r − i and i − z colors to separate AGNs from cool dwarf stars. Separating AGNs from stars in the general "stellar locus" is more complicated if we only use optical colors, because these two populations overlap strongly in the different color–color diagrams, particularly in the u − g versus g − r diagram, as already discussed. Thus, including variability information in the selection of AGN candidates should substantially improve AGN selection.

3.4. Performance of the Random Forest Classifier

We tested three different RF classifiers. The first includes only variability features (hereafter RF1), the second includes variability features and the r − i and i − z colors (hereafter RF2), and the third includes variability features and the g − r, r − i, and i − z optical colors (hereafter RF3). We excluded u − g for two reasons. First, we do not have u-band photometry for the ELAIS-S1 and ECDF-S fields. Second, our labeled set does not properly cover the u − g space compared to the unlabeled set, so including it might produce poor results. For the same reason, we did not test a pure color selection, because the u band is highly discriminating for selecting AGNs (e.g., Richards et al. 2002, 2009; Ross et al. 2012).
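The three classifiers differ only in which columns they receive, which can be made explicit as a small configuration. In this sketch, the column names are hypothetical labels for the 16 features of Table 2 and the three colors, and the catalog is assumed to be any mapping from column name to a 1-D array, such as a dict or a pandas DataFrame. As described above, u − g is deliberately absent.

```python
import numpy as np

# Hypothetical column names for the 16 variability features of Table 2.
VARIABILITY_FEATURES = [
    "Pvar", "sigma_rms", "A_SF", "gamma_SF", "Std", "Meanvariance",
    "MedianBRP", "Autocor_length", "StetsonK", "Eta_e", "PercentAmp",
    "Con", "LinearTrend", "Beyond1Std", "Q31", "PeriodLS",
]

FEATURE_SETS = {
    "RF1": VARIABILITY_FEATURES,                          # variability only
    "RF2": VARIABILITY_FEATURES + ["r-i", "i-z"],         # + two colors
    "RF3": VARIABILITY_FEATURES + ["g-r", "r-i", "i-z"],  # + three colors
}


def feature_matrix(catalog, classifier):
    """Stack the columns used by one classifier into an (n_sources, n_features) array."""
    return np.column_stack([np.asarray(catalog[name], dtype=float)
                            for name in FEATURE_SETS[classifier]])
```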

As can be seen, the difference between RF2 and RF3 is the exclusion of the color g − r in RF2. As mentioned in the previous section, the r − i and i − z colors can easily separate cool dwarf stars and AGNs. However, separating AGNs and stars from the stellar locus is difficult when we use optical colors, particularly for the case of u − g and g − r. Thus, with RF2, we can test whether avoiding the use of g − r can improve the detection of redder AGN populations. On the other hand, RF1 excludes optical colors, and thus the amount of information used by this classifier is lower compared to RF2 and RF3. Optical colors have been exhaustively used in the literature for the selection of AGN candidates (e.g., Fan 1999; Richards et al. 2002, 2004, 2009; Smith et al. 2005; Bovy et al. 2011; Kirkpatrick et al. 2011; Ross et al. 2012), and thus with RF1, we can test whether single-band variability-based techniques can provide results as competitive as the ones obtained using optical colors.

As mentioned in Section 3.1, we trained each classifier using 70% of the labeled set as a training set and the remaining 30% of the labeled set as a test set. The selection of the training and test sets is done randomly, using the "train_test_split" procedure of scikit-learn. The labeled set, by definition, has the same limiting magnitude as the QUEST-La Silla images (r ∼ 21); therefore, the training and test sets have limiting magnitudes of r ∼ 21.

As we are interested in selecting only AGN candidates, for the rest of the analysis, we will refer to stars, and any source that is not an AGN, as non-AGN.

3.4.1. RF1: Selection of AGNs Based Solely on Variability

Our first RF classifier (RF1) includes only variability features. We show the results from this classifier using a confusion matrix (Figure 3, RF1 panel). AGNs (true positives) are in general well classified, and the fraction of non-AGNs classified as AGNs (false positives) is very low.

Figure 3.

Figure 3. Confusion matrix from testing the RF1, RF2, and RF3 in the test set. "True Label" represents the classification done from SDSS spectra, and "Predicted Label" is the outcome of each classifier.


We also computed the following scores to assess our classifiers: accuracy (A), precision (P), recall (R), and F1. These scores are defined by means of the True Positives (TPs; known AGNs classified as AGNs by the RF classifier), the False Positives (FPs; known non-AGNs classified as AGNs), the True Negatives (TNs; known non-AGNs classified as non-AGNs), and the False Negatives (FNs; known AGNs classified as non-AGNs):

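$$A = \frac{TP + TN}{TP + FP + TN + FN}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2\,P\,R}{P + R}. \tag{1}$$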
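As a worked check of these definitions, the short function below (a sketch; `classification_scores` is a hypothetical helper) computes the four scores from confusion-matrix counts. The counts in the usage lines are invented purely to illustrate the arithmetic; the last lines recompute F1 from the RF1 precision and recall reported in Table 3, which recovers the tabulated value of 0.921.

```python
def classification_scores(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)  # fraction of predicted AGNs that are real AGNs
    recall = tp / (tp + fn)     # fraction of real AGNs that are recovered
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
acc, prec, rec, f1 = classification_scores(tp=90, fp=10, tn=80, fn=20)

# Consistency check against Table 3: F1 recomputed from RF1's P and R.
p_rf1, r_rf1 = 0.909, 0.933
f1_rf1 = 2 * p_rf1 * r_rf1 / (p_rf1 + r_rf1)
print(round(f1_rf1, 3))
```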

Table 3 shows the computed scores for the RF1 classifier. From these scores and from the confusion matrix, we can say that RF1 presents a low fraction of FPs; thus, the sample of predicted AGNs has low contamination from non-AGNs. However, we tend to miss a fraction of real AGNs (∼10%). This results from the difficulty of detecting a variable signal from AGNs with low-amplitude variability: because we only consider variability properties in this classification, such AGNs can be classified as non-AGNs.

Table 3.  Scores Measured in the Test Set for Each Classifier

Score RF1 RF2 RF3
Accuracy 0.916 0.923 0.931
Precision 0.909 0.909 0.921
Recall 0.933 0.950 0.951
F1 0.921 0.930 0.936


It is important to note that we are testing the RF1 classifier on a sample of AGNs selected mostly by means of their optical colors. Because RF1 uses only variability features, the confusion matrix and the scores obtained from our labeled sample are not necessarily an accurate prediction of the performance of our method on the unlabeled sample.

One of the advantages of RF classification is that the feature importance is easy to obtain: the classifier assigns each feature a ranking score that measures how well it separates the two classes. In the first columns of Table 4, we list the features ordered by importance (rank value) for the RF1 classifier. The four most important features are the amplitude of the SF (ASF), the excess variance (σrms), Meanvariance, and Q31. In Figure 4, we show the distribution of the ASF and Q31 features for the labeled set. We highlight with black dots those AGNs classified as variable according to the definition proposed by Sánchez et al. (2017), where a source is classified as variable when its light curve satisfies Pvar ≥ 0.95 and (${\sigma }_{\mathrm{rms}}^{2}$ − err(${\sigma }_{\mathrm{rms}}^{2}$)) > 0. The figure shows that AGNs and non-AGNs are separated by these two features, with ASF providing a much stronger division than Q31, for which the two classes overlap substantially. It also shows that the majority of AGNs with low variability amplitude are classified as nonvariable.
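The variability criterion quoted above can be written as a vectorized mask. The sketch below assumes that Pvar, the excess variance ${\sigma }_{\mathrm{rms}}^{2}$, and its error have already been computed for each light curve (their computation is not shown); `variable_mask` and the input values are hypothetical.

```python
import numpy as np


def variable_mask(p_var, sigma2_rms, err_sigma2_rms, p_min=0.95):
    """Flag sources as variable when P_var >= 0.95 and the excess variance
    minus its error is positive, as in the criterion quoted in the text."""
    p_var = np.asarray(p_var, dtype=float)
    excess = np.asarray(sigma2_rms, dtype=float) - np.asarray(err_sigma2_rms, dtype=float)
    return (p_var >= p_min) & (excess > 0)


# Three hypothetical light curves: variable, excess variance within its
# error, and P_var below the threshold.
flags = variable_mask([0.99, 0.99, 0.90], [0.020, 0.004, 0.020], [0.005, 0.005, 0.005])
print(flags.tolist())
```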

Figure 4.

Figure 4. Distribution of the ASF and Q31 features for the labeled set. Blue triangles correspond to non-AGNs, and red circles correspond to AGNs. We mark with black dots those AGNs classified as variable, according to the definition used in Sánchez et al. (2017).


Table 4.  Feature Importance for Each Classifier

RF1                     RF2                     RF3
Feature         Rank    Feature         Rank    Feature         Rank
ASF             0.197   ASF             0.209   ASF             0.189
σrms            0.139   σrms            0.149   σrms            0.142
Meanvariance    0.127   Q31             0.102   Q31             0.113
Q31             0.111   Pvar            0.093   Meanvariance    0.099
Pvar            0.095   Std             0.088   Pvar            0.095
Std             0.090   Meanvariance    0.086   Std             0.074
PercentAmp      0.040   PercentAmp      0.045   Autocor-length  0.042
γSF             0.036   Autocor-length  0.035   PercentAmp      0.039
Autocor-length  0.033   γSF             0.031   g − r           0.031
MedianBRP       0.025   r − i           0.028   γSF             0.029
LinearTrend     0.023   MedianBRP       0.021   r − i           0.023
PeriodLS        0.023   PeriodLS        0.020   MedianBRP       0.020
ηe              0.023   ηe              0.019   ηe              0.020
Beyond1Std      0.019   Beyond1Std      0.019   i − z           0.019
StetsonK        0.018   LinearTrend     0.019   LinearTrend     0.018
Con             0.002   i − z           0.019   PeriodLS        0.017
                        StetsonK        0.014   Beyond1Std      0.017
                        Con             0.002   StetsonK        0.012
                                                Con             0.002


3.4.2. RF2: Selection of AGNs Based on Variability and r − i, and i − z Optical Colors

Our second RF classifier (RF2) includes variability features and the r − i and i − z colors. Figure 3 shows the confusion matrix for RF2 (RF2 panel). It is similar to the confusion matrix of RF1; however, RF2 yields a slightly cleaner population of AGN candidates (i.e., the fraction of FPs is lower). The accuracy, precision, recall, and F1 scores are given in Table 3; there are no significant differences between the scores of RF1 and RF2.

In Table 4, we list the ranking of features for the RF2 classifier. The ranking does not differ significantly from that of RF1; notably, the variability features are more relevant for AGN selection than the r − i and i − z colors. In this case, the most important features are ASF, σrms, Q31, and Pvar. We also find that the r − i color appears to be more relevant than the i − z color for classifying our sources.
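Rankings like those in Table 4 can be read directly from a fitted forest. The sketch below assumes scikit-learn's impurity-based `feature_importances_`, which sum to one, as the Table 4 columns approximately do; whether this exact importance measure was used here is an assumption. `rank_features` is a hypothetical helper, and the toy data, in which only the column labeled "ASF" carries the class signal, are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features(clf, feature_names):
    """Return (name, importance) pairs sorted from most to least important,
    using the impurity-based importances of a fitted scikit-learn forest."""
    importances = clf.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


# Toy data: only the first column (named "ASF") determines the class.
rng = np.random.default_rng(1)
names = ["ASF", "r - i", "i - z"]
X = rng.normal(size=(600, 3))
y = (X[:, 0] > 0).astype(int)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = rank_features(forest, names)
print(ranking)
```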

3.4.3. RF3: Selection of AGNs Based on Variability and g − r, r − i, and i − z Optical Colors

Our third RF classifier (RF3) includes variability features and the g − r, r − i, and i − z colors. Figure 3 shows the confusion matrix for RF3 (RF3 panel). In this case, the fraction of TPs (true AGNs classified as AGNs) is slightly higher than that for RF1 and RF2. However, we must consider that most of the AGNs in the labeled set have been selected by means of their optical colors, which might explain the improvement of the results over the test set compared with the RF2 classifier.

The scores for RF3 are listed in Table 3. In comparison to RF1 and RF2, the scores are slightly higher, particularly the precision. In Table 4, we list the ranking of features for the RF3 classifier. In this case, the most important features are ASF, σrms, Q31, and Meanvariance. The most important color is g − r, which is expected given the distribution of non-AGNs and AGNs in Figure 2. It is well known that much of the discriminating power for selecting unresolved AGNs lies in u − g (e.g., Braccesi et al. 1970), which is consistent with our finding that the redder r − i and i − z colors are less relevant for AGN selection than g − r.

3.5. AGN Candidates from QUEST-La Silla

We applied the trained RF1, RF2, and RF3 classifiers to our unlabeled well-sampled set of 208,583 light curves. In order to improve the purity of our selection, we used the predicted class probability PRF (computed as the mean of the class probabilities predicted by the individual trees in the forest) to select the final set of AGN candidates. We defined two samples of AGN candidates: (a) the full-AGN sample, consisting of all sources classified as AGNs by the RF classifier (PRF ≥ 0.5), and (b) the hp-AGN sample, consisting of sources that have a high probability (PRF ≥ 0.8) of being an AGN based on the RF classifier. In Table 5, we summarize the number of sources classified as AGNs in both samples for each classifier. For the RF1 classifier, there are 17,120 sources in the full-AGN sample and 5941 sources in the hp-AGN sample. For the RF2 classifier, there are 15,100 sources in the full-AGN sample and 5252 sources in the hp-AGN sample. Finally, for RF3, there are 13,810 sources in the full-AGN sample and 4482 sources in the hp-AGN sample. There are 4054 candidates in common among the RF1, RF2, and RF3 hp-AGN samples. For the rest of the analysis, we will only consider the hp-AGN samples of each classifier.
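The two probability cuts can be sketched as follows. In scikit-learn, `RandomForestClassifier.predict_proba` returns the mean of the class probabilities predicted by the individual trees, which matches the definition of PRF given above. The helper name `select_candidates`, the 1/0 label encoding, and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_candidates(clf, X_unlabeled, agn_label=1, full_cut=0.5, hp_cut=0.8):
    """Return P_RF and the indices of the full-AGN (P_RF >= 0.5) and
    hp-AGN (P_RF >= 0.8) samples."""
    agn_col = list(clf.classes_).index(agn_label)
    # Mean of the per-tree class probabilities for the AGN class
    p_rf = clf.predict_proba(X_unlabeled)[:, agn_col]
    full_agn = np.flatnonzero(p_rf >= full_cut)
    hp_agn = np.flatnonzero(p_rf >= hp_cut)
    return p_rf, full_agn, hp_agn


# Toy usage: fit on synthetic "labeled" data, score synthetic "unlabeled" data.
rng = np.random.default_rng(7)
X_lab = rng.normal(size=(500, 4))
y_lab = (X_lab[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_lab, y_lab)
p_rf, full_agn, hp_agn = select_candidates(forest, rng.normal(size=(2000, 4)))
print(len(full_agn), len(hp_agn))
```

By construction, every hp-AGN source is also in the full-AGN sample.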

Table 5.  Number of AGN Candidates per Field, for Each Classifier

  RF1 RF2 RF3
Field full-AGN hp-AGN full-AGN hp-AGN full-AGN hp-AGN
COSMOS 3968 1503 3562 1201 3424 1018
XMM-LSS 6441 2374 5774 2106 5516 1879
Elais-S1 3374 988 2936 942 2441 777
ECDF-S 3337 1076 2828 1003 2429 808
Total 17,120 5941 15,100 5252 13,810 4482


Figure 5 shows the g − r versus r − i color–color diagram of the unlabeled set and of the hp-AGN samples for RF1, RF2, and RF3. Compared with Figure 2, several of our AGN candidates are located in regions of color–color space where AGNs are not normally found, particularly for RF1. The main difference between the candidates of RF1 and those of the other classifiers is that the latter exclude sources in the color–color region where cool stars are normally found. For example, there are 4890 candidates in common between RF1 and RF2, and 1051 RF1 candidates that are not candidates for RF2; of the latter, 54.1% have r − i > 0.7, where we expect to find mostly cool stars. The main difference between the candidates of RF2 and RF3 is that RF3 excludes redder candidates (g − r ≳ 1.0): there are 4178 candidates in common between RF2 and RF3, and 1074 RF2 candidates that are not candidates for RF3, 70.5% of which have g − r > 1.0.
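The overlap bookkeeping between two classifiers reduces to set operations. In this sketch, `overlap_summary` is a hypothetical helper that counts the common candidates and those selected only by the first classifier, and returns the fraction of the latter lying above a color cut (r − i > 0.7 here); the source IDs and colors are invented.

```python
def overlap_summary(ids_a, ids_b, r_minus_i, red_cut=0.7):
    """Compare the candidate lists of two classifiers.

    Returns the number of common candidates, the number selected only by
    classifier A, and the fraction of those A-only candidates with
    r - i > red_cut (the region dominated by cool stars)."""
    a, b = set(ids_a), set(ids_b)
    common = a & b
    only_a = a - b
    n_red = sum(1 for s in only_a if r_minus_i[s] > red_cut)
    return len(common), len(only_a), (n_red / len(only_a) if only_a else 0.0)


# Hypothetical source IDs and r - i colors, for illustration only.
colors = {"s1": 0.10, "s2": 0.20, "s3": 0.95, "s4": 0.30}
summary = overlap_summary(["s1", "s2", "s3", "s4"], ["s1", "s2"], colors)
print(summary)
```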

Figure 5.

Figure 5. g − r vs. r − i color–color diagrams of the unlabeled set (blue circles), and the hp-AGN sample (red stars) for the RF1 (top panel), RF2 (middle panel), and RF3 (bottom panel). The contour plots show the distribution of the hp-AGN samples for each classifier. In the top panel, we show the position of candidates observed during spectroscopic follow-up, differentiating between type 1 AGNs (AGN1; yellow squares), BAL–QSOs (BAL; red triangles), stars (cyan circles), and galaxies (Gal; black triangles). In the middle and bottom panels, we show with letters and black squares the position of a selection of observed candidates located in the stellar locus.


In the top panel of Figure 5, we also show a selection of AGN candidates observed during spectroscopic follow-up (see Section 4.2). In the middle and bottom panels, we mark with letters (A, B, C, and D) the positions in the g − r versus r − i diagram of four candidates located in different parts of the stellar locus; their light curves, shown in Figure 6, are clearly variable. All four sources are selected as candidates by RF1, two of them (A and B) are selected by RF2, and none is selected by RF3.

Figure 6.

Figure 6. Light curves of some RF1 candidates located in the stellar locus, observed during the spectroscopic follow-up campaign and marked in the middle and bottom panels of Figure 5. A and B are classified as type 1 AGNs, and C and D are classified as M-type stars.


4. Confirmation of AGN Candidates

In the following subsections, we aim to confirm the nature of our candidates. In Section 4.1, we use ancillary data to confirm the nature of our AGN candidates. In Section 4.2, we show the results of our spectroscopic follow-up campaign, conducted between 2016 December and 2018 September, to test the efficiency of our selection method.

We are particularly interested in identifying the nature of sources located in positions of the color–color space dominated by stars (e.g., in the stellar locus). We divided our high-probability candidates according to their g − r colors; we avoid u − g because the u band is not available for all fields. We define the blue subsample as the one composed of sources with g − r ≤ 0.6 and the red subsample as that composed of sources with g − r > 0.6. As can be seen in Figure 2, most of the AGNs in the labeled set have g − r ≤ 0.6.

4.1. Confirmation by Ancillary Data

MILLIQUAS (v5.7 update, 2019 January 7; Flesch 2015) provides a very complete compendium of known AGNs (both type 1 and type 2) from the literature, including the latest data release of SDSS (SDSS-DR15) and several recent XMM-Newton, Swift, and Chandra catalogs (e.g., Evans et al. 2014; Marchesi et al. 2016; Maitra et al. 2019). It also includes a list of high-confidence AGN candidates from different sources like AllWISE (Secrest et al. 2015).

We used MILLIQUAS to confirm the nature of our candidates, cross-matching it with the coordinates of our well-sampled light curves using a radius of 1''. There are 3524 sources in the well-sampled sample with identifications in MILLIQUAS. For the RF1 classifier, 2358 (66.9%) of these sources are in the hp-AGN sample, 2757 (78.2%) are in the full-AGN sample, and 767 (21.8%) are classified as non-AGNs. For the RF2 classifier, 2366 (67.1%) are in the hp-AGN sample, 2775 (78.7%) are in the full-AGN sample, and 749 (21.3%) are classified as non-AGNs. Finally, for the RF3 classifier, 2348 (66.7%) are in the hp-AGN sample, 2769 (78.6%) are in the full-AGN sample, and 755 (21.4%) are classified as non-AGNs. Thus, ∼21% of the known AGNs are misclassified as non-AGNs when variability features are included in the selection, which corresponds to a completeness of ∼79%.
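A 1'' positional cross-match of this kind can be sketched as a nearest-neighbor search on the sphere. The brute-force NumPy version below is only illustrative, since its cost grows as N × M; for catalogs of this size, astropy's `SkyCoord.match_to_catalog_sky` is the usual tool. The helper name and coordinates are hypothetical.

```python
import numpy as np


def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec=1.0):
    """For each catalog-1 source, return the index of the nearest catalog-2
    source, or -1 if it lies farther than radius_arcsec (inputs in degrees)."""
    ra1, dec1, ra2, dec2 = (np.radians(np.asarray(v, dtype=float))
                            for v in (ra1, dec1, ra2, dec2))
    # Haversine angular separation between every pair (N x M matrix)
    dra = ra2[None, :] - ra1[:, None]
    ddec = dec2[None, :] - dec1[:, None]
    hav = (np.sin(ddec / 2) ** 2
           + np.cos(dec1[:, None]) * np.cos(dec2[None, :]) * np.sin(dra / 2) ** 2)
    sep = 2 * np.arcsin(np.sqrt(np.clip(hav, 0.0, 1.0)))
    nearest = np.argmin(sep, axis=1)
    sep_arcsec = np.degrees(sep[np.arange(len(ra1)), nearest]) * 3600.0
    return np.where(sep_arcsec <= radius_arcsec, nearest, -1)


# Two hypothetical light-curve positions against a two-entry reference list:
# the first has a counterpart 0.5'' away in R.A., the second has none.
matches = crossmatch([150.0, 150.1], [2.0, 2.0],
                     [150.0 + 0.5 / 3600.0, 151.0], [2.0, 2.0])
print(matches.tolist())
```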

We plot in the top panel of Figure 7 the distribution of the ASF variability feature for sources in MILLIQUAS and QUEST-La Silla belonging to the RF1 hp-AGN sample, the RF1 full-AGN sample, and the sources classified as non-AGNs by RF1 (although they are listed as AGNs in MILLIQUAS). The main difference between the sources classified as AGNs and non-AGNs is the value of the variability amplitude at one year; i.e., we are not detecting variability for the sources classified as non-AGNs.

Figure 7.

Figure 7. Top panel: normalized distribution of ASF for sources with detection in MILLIQUAS with well-sampled light curves in QUEST-La Silla. We show sources from the RF1 full-AGN sample (blue) and the RF1 hp-AGN sample (red), and sources classified as non-AGN by RF1. Middle panel: normalized distribution of the mean Q magnitude for sources from the hp-AGN sample that are present (blue) and not present (red) in MILLIQUAS. Bottom panel: normalized distribution of ASF for sources from the hp-AGN sample that are present (blue) and not present (red) in MILLIQUAS.


In the middle panel of Figure 7, we compare the distribution of the mean Q magnitude (determined from the light curves) of sources from the RF1 hp-AGN sample with and without a detection in MILLIQUAS, and in the bottom panel, we compare their ASF distributions. The sources detected in MILLIQUAS are in general brighter than those without a MILLIQUAS counterpart, and the sources without a counterpart also have lower variability amplitudes. It is important to note that only a small fraction (≲3%) of the AGNs in MILLIQUAS are classified as host dominated (i.e., they appear extended in imaging). This demonstrates that variability can be used to augment AGN selection to include objects that are extended as well as point sources.

In Table 6, we show the number of hp-AGN candidates confirmed using MILLIQUAS, dividing the hp-AGN samples into red and blue subsamples as described above. Most of the confirmed sources are in the blue subsample, with 52% of the candidates in this subsample confirmed for RF1, 52.4% for RF2, and 52.1% for RF3. For the red subsample, 5.3% of the candidates from RF1 are confirmed using MILLIQUAS, 7.1% for RF2, and 12% for RF3. In addition, 354 RF1 hp-AGN candidates (356 for RF2 and 345 for RF3) are listed in MILLIQUAS as AGN candidates rather than as confirmed AGNs; most of them are WISE-selected candidates (Secrest et al. 2015). This lack of confirmed red candidates can be understood if we consider that most of the AGNs in MILLIQUAS come from samples that applied morphological cuts to target point sources, and thus tend to exclude sources whose emission is dominated by their host galaxies.

Table 6.  Number of hp-AGN Candidates Confirmed Using MILLIQUAS, for Each Classifier

  RF1 RF2 RF3
Sample blue red blue red blue red
MILLIQUAS AGN 1882 122 1894 116 1901 100
MILLIQUAS candidate 324 30 322 34 320 25
X-ray detections 640 59 644 57 641 47
hp-AGN 3618 2323 3613 1639 3646 836
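The confirmed fractions quoted above follow directly from Table 6. The short sketch below recomputes them by dividing the MILLIQUAS AGN counts by the hp-AGN totals of each subsample, and checks that the blue and red totals add up to the hp-AGN numbers of Table 5.

```python
# (MILLIQUAS AGN, hp-AGN total) per classifier and subsample, from Table 6
TABLE6 = {
    "RF1": {"blue": (1882, 3618), "red": (122, 2323)},
    "RF2": {"blue": (1894, 3613), "red": (116, 1639)},
    "RF3": {"blue": (1901, 3646), "red": (100, 836)},
}
HP_AGN_TOTALS = {"RF1": 5941, "RF2": 5252, "RF3": 4482}  # Table 5


def confirmed_percentages(table):
    """Percentage of hp-AGN candidates confirmed by MILLIQUAS, to 0.1%."""
    return {clf: {sub: round(100.0 * n / tot, 1) for sub, (n, tot) in subs.items()}
            for clf, subs in table.items()}


fractions = confirmed_percentages(TABLE6)
for clf, subs in TABLE6.items():
    # Blue + red hp-AGN counts must reproduce the Table 5 totals
    assert subs["blue"][1] + subs["red"][1] == HP_AGN_TOTALS[clf]
print(fractions)
```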


4.1.1. Candidates with X-Ray Detections

MILLIQUAS provides the X-ray detections associated with each source; however, some recent catalogs, like Luo et al. (2017) and Chen et al. (2018), are not completely included. Thus, we used different X-ray catalogs to complement the information provided by MILLIQUAS and to identify which candidates have associated X-ray detections. For the COSMOS field, we used the optical and infrared counterparts catalog of the Chandra COSMOS-Legacy Survey (Marchesi et al. 2016); for the XMM-LSS field, we used the recent XMM-SERVS survey catalog (Chen et al. 2018); for the ECDF-S field, we used the Chandra Deep Field-South 7 Ms source catalog (Luo et al. 2017); and for the Elais-S1 field, we used the Elais-S1 field X-ray source optical/IR Identifications catalog (Feruglio et al. 2008).

In Table 6, we provide the number of candidates with X-ray detections from the previously mentioned X-ray catalogs or MILLIQUAS (see the row "X-ray detections"). It can be seen that most of the candidates with X-ray detections are from the blue subsample.

4.2. Spectroscopic Follow-up of AGN Candidates

Because most of the AGNs confirmed with ancillary data have blue colors (g − r ≤ 0.6), we performed spectroscopic follow-up to confirm the nature of sources located in different regions of the color–color space. We used the Goodman spectrograph at SOAR (Clemens et al. 2004) and EFOSC2 at the New Technology Telescope (NTT; Buzzoni et al. 1984) to observe 54 candidates (for details, see Appendix B).

To select the candidates for the follow-up campaign, we divided the hp-AGN sample of the RF1 classifier into blue and red subsamples. Then, we randomly selected 100 candidates from each subsample, excluding sources with r > 20.5, for which it would be hard to obtain a good-quality spectrum with 4 m class telescopes. We visually inspected the light curves of the selected candidates in order to exclude sources with evidence of bad photometry (produced by the relatively low cosmetic quality of the QUEST CCD camera chips). During the follow-up campaign, we observed as many of the selected candidates as we could, 54 targets in total, giving priority to sources from the red subsample. Of the 54 candidates observed, 38 (70%) have g − r > 0.6.
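The random draw described above (before the visual inspection of the light curves, which is not modeled) can be sketched as follows. The cuts (the g − r = 0.6 split, r ≤ 20.5, and 100 targets per subsample) follow the text; the helper `pick_followup_targets`, the seed, and the synthetic magnitudes and colors are assumptions.

```python
import numpy as np


def pick_followup_targets(g_minus_r, r_mag, n_per_subsample=100, r_max=20.5,
                          gr_split=0.6, seed=None):
    """Randomly draw indices of follow-up targets from the blue (g - r <= 0.6)
    and red (g - r > 0.6) subsamples, skipping sources fainter than r_max."""
    rng = np.random.default_rng(seed)
    g_minus_r = np.asarray(g_minus_r, dtype=float)
    r_mag = np.asarray(r_mag, dtype=float)
    bright = r_mag <= r_max
    picks = {}
    for name, in_subsample in (("blue", g_minus_r <= gr_split),
                               ("red", g_minus_r > gr_split)):
        pool = np.flatnonzero(in_subsample & bright)
        size = min(n_per_subsample, pool.size)
        picks[name] = rng.choice(pool, size=size, replace=False)  # no repeats
    return picks


# Synthetic hp-AGN sample, for illustration only.
rng = np.random.default_rng(3)
g_r_toy = rng.uniform(-0.2, 1.6, size=2000)
r_toy = rng.uniform(16.0, 21.5, size=2000)
picks = pick_followup_targets(g_r_toy, r_toy, seed=0)
print(len(picks["blue"]), len(picks["red"]))
```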

In the top panel of Figure 8, we show the r-band magnitude distributions of the hp-AGN candidates of the RF1 classifier and of the observed candidates. The distributions are different because of the limitations of observing faint targets with 4 m class telescopes: during the follow-up campaign, we gave priority to sources with r < 20, for which we expected to obtain spectra with a signal-to-noise ratio higher than 10. In the middle panel of Figure 8, we show the distribution of the predicted class probability (PRF) for the hp-AGN candidates of the RF1 classifier and for the observed candidates. In general, we observed sources with higher probabilities than the total sample of candidates. This results from the visual inspection of the light curves and the selection of brighter sources for the follow-up campaign.

Figure 8.

Figure 8. Top panel: normalized histogram of the r-band magnitude of the RF1 hp-AGN sample (red) and the observed candidates (blue). Middle panel: normalized histogram of the predicted class probability (PRF) of the RF1 hp-AGN sample (red) and the observed candidates (blue). Bottom panel: normalized histogram of the spectroscopic redshift of AGNs from the labeled sample (green) and observed candidates classified as type 1 AGNs or BAL–QSOs (red).


We used the spectra to classify our targets and to estimate their redshifts. For details about the spectroscopic analysis, see Appendix B. The full list of observed candidates can be found in Appendix A, where we provide the position of the observed sources, their redshift, their g − r and r − i colors, their r magnitude, and their spectroscopic classification. In Table 7, we provide a summary of the follow-up campaign. We divided the classified sources into blue and red subsamples, and we also separated them according to their spectroscopic classes: AGN1 (type 1 AGNs), BAL–QSO, galaxy, and star.

Table 7.  Summary of the Spectroscopic Follow-up Campaign

Class Blue subsample Red subsample Total
AGN1 15 25 40
BAL–QSO 1 3 4
Galaxy 0 5 5
Star 0 5 5


In the top panel of Figure 5, we show the color–color diagram of the observed candidates, with colors and shapes depending on their spectral classification. In the middle and bottom panels of Figure 5, we mark with letters some candidates located in the stellar locus. Sources A and B are classified as type 1 AGNs, and sources C and D are classified as M-type stars. From the light curves of sources C and D (see Figure 6) and from their spectra, we propose that these candidates are irregular variable stars.

In the bottom panel of Figure 8, we show the normalized redshift distribution of AGNs from the labeled sample and observed candidates classified as AGNs or BAL–QSOs. We can see that we have a much larger fraction of low-redshift sources observed during the follow-up campaign. Further, the fraction of observed AGNs with z > 3.0 is slightly higher compared with the AGNs from the labeled sample.

There are seven targets with redshift higher than 2.5. These types of AGNs are harder to detect than lower-redshift AGNs in magnitude-limited, optical color–color selections (because their colors resemble those of stars, particularly near the magnitude limit of surveys where the stellar locus is wider), and clearly benefit from the variability criteria (Butler & Bloom 2011; Palanque-Delabrouille et al. 2011, 2016). In addition, we found four BAL–QSOs, with three having g − r > 0.6. There are 22 AGNs with zspec < 0.7, of which 20 have g − r > 0.6, and eight of these are LLAGNs, whose continua are significantly dominated by the host galaxy, but with clearly distinguishable broad emission lines.

The case of the eight LLAGNs is particularly interesting because the continuum of their spectra is dominated by the host galaxy, yet we still detect their variable optical component, which is associated with the accretion disk. This component is revealed by subtracting the galactic continuum of each spectrum following the simple procedure of Greene & Ho (2005) and Kim et al. (2006). As an example, Figure 9 shows the spectra of two LLAGN sources: in red, the original observed spectra, and in blue, the AGN component. In both cases the continuum is dominated by the host galaxy, but after its subtraction the power-law AGN continuum, which produces the optical variations, emerges. It is important to remark that, without variability features in the selection of our candidates, these types of sources would be classified as non-AGNs according to their optical colors. This reflects the importance of including variability in the selection of LLAGNs.
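To illustrate the idea of host-galaxy subtraction, the sketch below fits an observed spectrum as a scaled galaxy template plus a power-law continuum and subtracts the galaxy part. This is a deliberately simplified stand-in, not the Greene & Ho (2005) procedure: it assumes a single known galaxy template and a fixed power-law slope (α = −1.5 in Fλ, chosen only for illustration), which makes the fit linear, and it masks a window around Hα so that the broad line does not bias the continuum. All names and the synthetic spectrum are hypothetical.

```python
import numpy as np


def decompose_spectrum(wave, flux, galaxy_template, alpha=-1.5, wave0=5100.0,
                       fit_mask=None):
    """Fit flux ~ a * galaxy_template + b * (wave / wave0)**alpha by linear
    least squares (alpha fixed) over the pixels selected by fit_mask.

    Returns the AGN component flux - a * galaxy_template and (a, b)."""
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    galaxy_template = np.asarray(galaxy_template, dtype=float)
    if fit_mask is None:
        fit_mask = np.ones(wave.shape, dtype=bool)
    powerlaw = (wave / wave0) ** alpha
    design = np.column_stack([galaxy_template, powerlaw])
    (a, b), *_ = np.linalg.lstsq(design[fit_mask], flux[fit_mask], rcond=None)
    return flux - a * galaxy_template, a, b


# Synthetic check: a toy "galaxy" shape plus a known power law.
wave = np.linspace(4000.0, 7000.0, 500)
galaxy = 1.0 + 0.5 * np.sin(wave / 200.0)
flux = 2.0 * galaxy + 0.7 * (wave / 5100.0) ** -1.5
line_free = np.abs(wave - 6563.0) > 50.0  # exclude a window around H-alpha
agn, a, b = decompose_spectrum(wave, flux, galaxy, fit_mask=line_free)
print(round(a, 6), round(b, 6))
```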

Figure 9.

Figure 9. Rest-frame optical spectra of two type 1 AGNs with evidence of continuum dominated by the host galaxy (LLAGNs), observed with EFOSC2/NTT. In red, we show the original spectra, and in blue, we show the AGN component. The most prominent emission lines correspond to Hα.


We define the efficiency of a classifier as the number of confirmed AGNs divided by the total number of observed candidates. Considering all the observed candidates, the efficiency is 100% for the blue subsample and 73.7% for the red subsample. In Table 8, we list the classifiers that selected each observed candidate. As mentioned previously, the 54 observed candidates belong to the RF1 hp-AGN sample; therefore, the efficiency of the follow-up for the RF1 classifier is 100% for the blue subsample and 73.7% for the red subsample. For RF2, there are 50 observed candidates, with an efficiency of 100% for the blue subsample and 79.4% for the red subsample; three stars and one type 1 AGN are excluded by RF2. For RF3, there are 43 observed candidates, with an efficiency of 100% for the blue subsample and 85.2% for the red subsample; four type 1 AGNs, one BAL–QSO, three galaxies, and three stars are excluded by RF3.
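As a check on these numbers, the efficiency can be recomputed from the classifier and class columns of Table 8. In the sketch below, the two columns are transcribed in a compact encoding (digits denote the classifiers that selected each target; A, B, G, and S denote AGN1, BAL–QSO, galaxy, and star), and both type 1 AGNs and BAL–QSOs count as confirmed AGNs. The encoding and the helper `efficiency` are illustrative.

```python
# Spectroscopic class of QLS_1 ... QLS_54 from Table 8
# (A = AGN1, B = BAL-QSO, G = galaxy, S = star).
CLASSES = "".join([
    "AAAGGSAAAA",  # QLS_1-10
    "ASAAAABAGA",  # QLS_11-20
    "SAAGBASAAA",  # QLS_21-30
    "SAAAAAGAAA",  # QLS_31-40
    "AABAAAAAAA",  # QLS_41-50
    "BAAA",        # QLS_51-54
])
# Classifiers that selected each target ("13" means RF1 and RF3).
SELECTED = (
    "12 123 123 12 12 1 12 12 123 123 "       # QLS_1-10
    "123 1 123 123 123 123 123 123 123 123 "  # QLS_11-20
    "123 123 123 123 12 123 1 123 123 13 "    # QLS_21-30
    "123 123 123 123 123 123 12 123 123 123 " # QLS_31-40
    "123 123 123 12 123 123 123 123 123 123 " # QLS_41-50
    "123 123 123 123"                         # QLS_51-54
).split()


def efficiency(k):
    """Confirmed AGNs (AGN1 or BAL-QSO) over observed targets selected by RFk."""
    picked = [cls for sel, cls in zip(SELECTED, CLASSES) if str(k) in sel]
    n_agn = sum(cls in ("A", "B") for cls in picked)
    return n_agn / len(picked), len(picked)


for k in (1, 2, 3):
    print(f"RF{k}", efficiency(k))
```

For RF1, `efficiency(1)` gives 44/54 ≈ 0.815, the 81.5% quoted in the abstract; for RF2 and RF3 the function operates on the 50 and 43 observed targets mentioned above. Reproducing the blue/red split would additionally require the g − r column.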

Table 8.  Targets Observed during Spectroscopic Follow-up

Name R.A. Decl. Telescope Classifier zspec FLAGz r g − r r − i Class
QLS_1 7.021016 −45.806145 SOAR RF1/RF2 3.5852 1 20.14 1.13 0.31 AGN1
QLS_2 7.263506 −45.629417 NTT RF1/RF2/RF3 1.3796 2 20.72 0.63 0.09 AGN1
QLS_3 7.323368 −43.633305 SOAR RF1/RF2/RF3 0.3242 1 18.57 0.03 −0.08 AGN1
QLS_4 7.377026 −46.529945 NTT RF1/RF2 0.2824 1 19.32 1.53 0.51 Gal
QLS_5 7.387083 −43.664276 NTT RF1/RF2 0.2000 1 19.16 1.08 0.39 Gal
QLS_6 7.418616 −42.320072 NTT RF1 0.0 2 17.88 1.52 1.24 STAR
QLS_7 7.419530 −43.789948 NTT RF1/RF2 0.3912 2 18.91 1.12 0.34 AGN1
QLS_8 7.820146 −45.645706 NTT RF1/RF2 0.3123 2 19.07 1.41 0.49 AGN1
QLS_9 7.970508 −42.275764 NTT RF1/RF2/RF3 0.1847 2 18.02 0.99 0.41 AGN1
QLS_10 8.304768 −45.681595 NTT RF1/RF2/RF3 0.2676 2 18.73 0.92 0.43 AGN1
QLS_11 8.348364 −46.856922 NTT RF1/RF2/RF3 3.5297 2 19.90 0.75 0.08 AGN1
QLS_12 8.629325 −42.310108 SOAR RF1 0.0 2 20.36 1.38 1.26 STAR
QLS_13 8.787973 −45.348194 NTT RF1/RF2/RF3 0.1469 2 17.53 0.64 0.38 AGN1
QLS_14 9.407302 −43.000004 NTT RF1/RF2/RF3 1.986 1 19.69 0.69 0.17 AGN1
QLS_15 9.414984 −43.422619 SOAR RF1/RF2/RF3 1.8265 2 20.45 −0.06 0.29 AGN1
QLS_16 10.021671 −43.859173 NTT RF1/RF2/RF3 0.3710 2 19.28 0.86 0.32 AGN1
QLS_17 10.097470 −44.866116 NTT RF1/RF2/RF3 2.9609 1 19.94 0.70 0.38 BAL–QSO
QLS_18 10.225745 −43.855934 NTT RF1/RF2/RF3 1.2671 1 20.49 0.63 −0.07 AGN1
QLS_19 10.758265 −42.452019 NTT RF1/RF2/RF3 0.1000 1 19.90 0.89 0.24 Gal
QLS_20 10.866718 −43.825359 NTT RF1/RF2/RF3 3.123 1 20.58 0.65 0.11 AGN1
QLS_21 11.090293 −43.665966 NTT RF1/RF2/RF3 0.0 2 17.12 0.78 0.24 STAR
QLS_22 30.591522 −2.020991 SOAR RF1/RF2/RF3 2.0502 2 18.66 0.04 0.11 AGN1
QLS_23 30.603312 −1.752825 NTT RF1/RF2/RF3 0.2101 2 18.12 0.71 0.33 AGN1
QLS_24 31.057844 −2.953287 SOAR RF1/RF2/RF3 0.1500 1 19.26 0.91 0.36 Gal
QLS_25 31.167528 −3.630512 NTT RF1/RF2 1.22 1 19.14 0.61 0.06 BAL–QSO
QLS_26 31.533081 −2.510754 SOAR RF1/RF2/RF3 1.4319 1 18.12 0.09 0.06 AGN1
QLS_27 32.158398 −3.652800 NTT RF1 0.0 2 18.02 1.45 1.67 STAR
QLS_28 32.456287 −3.651828 SOAR RF1/RF2/RF3 1.4976 2 18.85 0.09 0.14 AGN1
QLS_29 33.409081 −3.250347 SOAR RF1/RF2/RF3 2.8491 2 19.54 0.16 0.06 AGN1
QLS_30 33.474072 −3.279924 NTT RF1/RF3 0.5671 1 20.81 0.88 0.80 AGN1
QLS_31 36.426113 −2.971209 NTT RF1/RF2/RF3 0.0 2 16.53 0.67 0.22 STAR
QLS_32 36.852539 −2.401858 NTT RF1/RF2/RF3 0.2551 2 19.94 0.84 0.38 AGN1
QLS_33 37.988422 −2.585521 SOAR RF1/RF2/RF3 0.3498 2 18.83 0.09 0.01 AGN1
QLS_34 38.680885 −2.637419 SOAR RF1/RF2/RF3 0.8437 2 19.86 0.46 0.06 AGN1
QLS_35 51.529488 −29.656691 NTT RF1/RF2/RF3 0.2307 2 18.15 0.87 0.43 AGN1
QLS_36 51.765114 −27.740358 SOAR RF1/RF2/RF3 2.0279 2 19.20 0.07 0.17 AGN1
QLS_37 52.009411 −28.600405 NTT RF1/RF2 0.2667 2 18.01 1.33 0.49 Gal
QLS_38 52.013538 −30.619936 NTT RF1/RF2/RF3 0.3284 2 19.47 0.90 0.37 AGN1
QLS_39 52.397503 −27.657492 NTT RF1/RF2/RF3 1.4175 2 19.59 0.60 0.20 AGN1
QLS_40 52.536362 −29.822481 NTT RF1/RF2/RF3 0.1804 2 17.51 0.66 0.53 AGN1
QLS_41 52.593563 −29.353525 NTT RF1/RF2/RF3 0.2177 2 19.94 0.61 0.41 AGN1
QLS_42 53.317341 −29.488207 SOAR RF1/RF2/RF3 1.9234 2 19.48 0.16 0.22 AGN1
QLS_43 53.730831 −27.736212 NTT RF1/RF2/RF3 3.4986 1 19.37 0.86 0.27 BAL–QSO
QLS_44 53.749317 −27.499168 NTT RF1/RF2 0.3598 2 19.42 1.13 0.40 AGN1
QLS_45 53.847992 −28.123224 SOAR RF1/RF2/RF3 0.8682 2 17.11 0.09 −0.06 AGN1
QLS_46 53.864979 −26.950115 SOAR RF1/RF2/RF3 0.8159 2 19.50 0.04 0.08 AGN1
QLS_47 53.911106 −28.961195 NTT RF1/RF2/RF3 0.2116 2 18.54 0.72 0.46 AGN1
QLS_48 53.943211 −27.432163 NTT RF1/RF2/RF3 0.4332 1 19.18 1.02 0.32 AGN1
QLS_49 54.049484 −28.095388 SOAR RF1/RF2/RF3 2.5265 2 18.87 0.10 −0.04 AGN1
QLS_50 54.782047 −26.617104 SOAR RF1/RF2/RF3 0.3825 2 19.14 0.70 0.26 AGN1
QLS_51 54.785744 −28.131121 SOAR RF1/RF2/RF3 2.1388 1 19.20 0.34 0.09 BAL–QSO
QLS_52 54.878654 −27.307127 SOAR RF1/RF2/RF3 1.0899 2 19.72 0.62 0.16 AGN1
QLS_53 55.059608 −26.485237 SOAR RF1/RF2/RF3 1.6453 2 19.43 0.05 0.17 AGN1
QLS_54 149.004532 1.161204 SOAR RF1/RF2/RF3 2.3530 1 20.36 0.33 0.09 AGN1


From these results, we conclude that RF2 provides the best compromise among the three classifiers: it has a high efficiency for both the blue and red subsamples, it excludes most of the observed stars, and it misses only one type 1 AGN. RF3 also provides good results; however, it excludes one BAL–QSO and four type 1 AGNs, one of which has zspec = 3.5852 and two of which are LLAGNs. In Appendix C, we provide the list of hp-AGN candidates from the RF2 classifier.

4.2.1. Non-AGN Observed Sources

We observed 10 sources that were classified spectroscopically as stars or galaxies but were classified as AGNs by our classifiers; all of them show clear evidence of variability. All 10 sources are selected as candidates by RF1, seven by RF2 (two stars and five galaxies), and four by RF3 (two stars and two galaxies).

For the galaxy-classified cases, the light curves show clear signs of AGN-like variability. The spectra obtained for these sources are generally noisy (signal-to-noise ratio below 10), so we could be missing some weak emission lines. We tried to subtract a galactic component from these sources, and in some cases we found evidence of a power-law continuum component, but without emission lines. We decided to classify these sources as galaxies; however, confirming their true nature will likely require deeper observations with 8 m class telescopes. We note that Cartier et al. (2015) found that about 20% of the objects classified spectroscopically as galaxies showed variability; they found a similar percentage of narrow-line AGNs showing variability.

For the star-classified cases, four of them are M-type stars and one seems to be a K-type star. From their light curves, we can conclude that they are semiperiodic or irregular variable stars.

5. Comparison with Previous Works

Butler & Bloom (2011) used SDSS photometry to select AGN candidates through variability in the Stripe 82 field. They used damped random walk modeling (Kelly et al. 2009) to detect quasar-like variable sources. The light curves used in their analysis have on average 10 epochs, with a maximum of 28, obtained over ∼6 yr (Sesar et al. 2007). As can be seen in Figure 8 of Butler & Bloom (2011), most of their candidates lie in the region where typical AGNs are found, and only ∼1% of their candidates lie in the color–color space dominated by stars (stellar locus).

Palanque-Delabrouille et al. (2011, 2016) used SDSS photometry to select AGN candidates through variability in the Stripe 82 field, to be observed as part of the BOSS and eBOSS surveys. On average, their light curves had 53 ± 20 epochs, with a total coverage of between 4 and 10 yr. They characterized the variability of each source using the SF, and they classified the sources using a neural network algorithm. They demonstrated that their method is very efficient at selecting sources with zspec > 2.2 or with a BAL–QSO classification. However, the fraction of candidates with stellar-like colors is low (as can be seen in Figure 18 of Palanque-Delabrouille et al. 2011).

Peters et al. (2015) used SDSS data to apply an NBC KDE algorithm that classifies type 1 quasars using color, variability, and astrometric parameters. They used data in five broad optical bands (u, g, r, i, and z) and constructed light curves in each band with 10 to ∼100 observations over timescales from ∼1 day to ∼8 yr. They used SFs to characterize the variability of each source. They tested different combinations of these parameters and found that combining variability and colors achieves 97% efficiency, with a particular improvement in the selection of quasars at 2.7 < z < 3.5. They selected 35,820 type 1 quasar candidates, only 14% of which have g − r > 0.6.
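The NBC KDE approach estimates the density of each class in feature space with kernel density estimation and assigns classes through Bayes' rule. As a toy illustration of that idea only, the sketch below uses a spherical Gaussian kernel with a single fixed bandwidth and class priors equal to the training-set fractions. The bandwidth, the class name `KDEBayesClassifier`, and the example color features are hypothetical, and the code does not reproduce the implementation of Peters et al. (2015).

```python
import math


class KDEBayesClassifier:
    """Bayes classifier with a Gaussian kernel density estimate per class."""

    def __init__(self, bandwidth=0.1):
        self.h = bandwidth

    def fit(self, X, y):
        self.train = {}
        for x, label in zip(X, y):
            self.train.setdefault(label, []).append(tuple(x))
        n = len(y)
        # Class priors taken from the training-set class fractions.
        self.prior = {c: len(rows) / n for c, rows in self.train.items()}
        return self

    def _log_density(self, x, rows):
        """Log of a spherical Gaussian KDE at x, using log-sum-exp for stability."""
        dim = len(x)
        logs = [-0.5 * sum((a - b) ** 2 for a, b in zip(x, r)) / self.h ** 2 for r in rows]
        m = max(logs)
        log_sum = m + math.log(sum(math.exp(v - m) for v in logs))
        return log_sum - math.log(len(rows)) - dim * math.log(self.h * math.sqrt(2 * math.pi))

    def predict_proba(self, x):
        """Posterior class probabilities: prior times class density, normalized."""
        logpost = {c: math.log(self.prior[c]) + self._log_density(x, rows)
                   for c, rows in self.train.items()}
        m = max(logpost.values())
        weights = {c: math.exp(v - m) for c, v in logpost.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}
```

With two hypothetical colors per source, for instance training on blue quasar-like points near (g − r, r − i) ≈ (0.25, 0.1) and stellar points near (1.25, 0.55), a source at (0.22, 0.12) gets a quasar probability close to 1. A single global bandwidth is the simplest choice; real applications usually tune it per feature set.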

More recently, Tie et al. (2017) combined colors and variability properties to select AGN candidates from DES. Their DES light curves span less than a year and typically have ∼15 epochs. They used the chi-squared integrated probability to select AGN candidates; because their light curves cover at most a year, they did not implement more sophisticated variability selection methods. They demonstrated that combining variability with optical and infrared photometry improves the efficiency of AGN selection. Tie et al. (2017) provided a catalog of 1263 spectroscopically confirmed quasars in the DES supernova fields brighter than i = 22 mag. Only 6% of their confirmed quasars have g − r > 0.6.

The light curves used in our analysis have a considerably higher cadence than those used in previous variability analyses, with an average of 119 ± 47 epochs and a total length of 1306 ± 254 days. We find that 39.1%, 31.2%, and 18.7% of the hp-AGN candidates from RF1, RF2, and RF3, respectively, have g − r > 0.6, a color range where stars outnumber AGNs. As we show in Section 4.2, most of the atypical AGNs observed during our spectroscopic follow-up campaign lie in this region of color–color space, which contains 70% of the observed candidates. In general, our selection technique does not differ considerably from other techniques (e.g., Palanque-Delabrouille et al. 2011; Peters et al. 2015). The key differences are the larger number of epochs and the longer time coverage of the QUEST-La Silla light curves, together with the exclusion of morphological parameters during candidate selection. These differences make our method more sensitive than previous analyses to atypical AGNs, such as BAL–QSOs and LLAGNs (we observed four and eight of these, respectively, during the follow-up), possibly because our better-sampled light curves give a higher probability of detecting a variable signal.

Our selection technique has the advantage of being easily applicable to LSST data, because the expected cadence for the DDF will be similar to the one used here (LSST Science Collaboration et al. 2009). Note, however, that LSST will also provide ugrizy photometry, with single-epoch depths reaching ∼24th magnitude and stacked depths reaching ∼27th magnitude. LSST will therefore allow variability analyses to be performed on much fainter sources than those in the QUEST-La Silla survey, and with multiband light curves, which can help weed out FPs and allow a more complete characterization of the variability properties of the candidates.

Moreover, LSST will have the advantage of including the u band, which has repeatedly been used in the past to select AGN candidates (in combination with other photometric bands) after a morphological cut is applied (e.g., Fan 1999; Richards et al. 2002, 2004; Smith et al. 2005; Richards et al. 2009; Bovy et al. 2011; Kirkpatrick et al. 2011; Ross et al. 2012). These color–color selection techniques will be easier to apply at the full stacked depths of the LSST data, where variability-based methods (like those presented here) will eventually become infeasible because of large flux errors in the light curves. However, for AGNs with strong host contamination, variability-based methods will have an advantage over color–color selection techniques.

6. Conclusions

We have presented a methodology to classify AGNs through variability analyses that is particularly useful for finding AGN populations missed by other optical selection techniques. We used data from the QUEST-La Silla AGN variability survey to construct a total of 208,583 well-sampled light curves in the COSMOS, XMM-LSS, Elais-S1, and ECDF-S fields. We characterized the variability of these sources using different variability features (see Section 3.2). We used an RF algorithm to classify our objects as either AGN or non-AGN using variability features and optical colors. We tested three classification schemes: one that includes only variability features (RF1), one that includes variability features and the r − i and i − z colors (RF2), and one that includes variability features and the g − r, r − i, and i − z colors (RF3). We obtained a total of 5941 AGN candidates with the RF1 classifier, 5252 with the RF2 classifier, and 4482 with the RF3 classifier.

We confirmed the nature of our candidates using ancillary data and found that a high fraction of the candidates from each classifier with g − r ≤ 0.6 are known AGNs from the literature (52% for RF1, 52.4% for RF2, and 52.1% for RF3; see Section 4.1), whereas the fraction of candidates with g − r > 0.6 confirmed by ancillary data is low (5.3% for RF1, 7.1% for RF2, and 12% for RF3). This is because the selection criteria behind most AGNs known from the literature are biased toward bluer optical colors. This motivated us to perform spectroscopic follow-up to confirm the nature of sources located in different regions of color–color space.

We observed 54 candidates with EFOSC2/NTT and Goodman/SOAR; 70% of the observed targets have g − r > 0.6. We confirmed the nature of several interesting sources, including four BAL–QSOs, seven sources with zspec > 2.5, and eight LLAGNs. Our method was very efficient at classifying AGNs with g − r ≤ 0.6, achieving 100% efficiency for all classifiers. For sources with g − r > 0.6, our method also performed well, achieving 73.7% efficiency for RF1, 79.4% for RF2, and 85.2% for RF3.

From the spectroscopic follow-up campaign, we conclude that the optimal classifier is the one that includes variability features and the r − i and i − z colors (RF2), as it avoids the region of color–color space where cool stars are normally found and also shows high efficiency, excluding only one observed type 1 AGN. The RF3 classifier also provides good results; however, it excluded four AGNs and one BAL–QSO. For RF1, we propose that most of the candidates with g − r ∼ 1.5 and r − i ≳ 0.8 are LPVs or binary stars.

Our work can be considered a pilot study in preparation for LSST, as the selection techniques tested here can be easily implemented for LSST data. The cadence of the LSST DDF will be similar to that of QUEST-La Silla, but with 10 yr of observations, which will considerably improve the selection efficiency. In addition, LSST will provide observations in more than one photometric band, which should prove useful for discarding artifacts and FPs. In particular, LSST will provide u-band photometry, which has been extensively used in the literature for the selection of point-like AGNs, with very high efficiencies (e.g., Richards et al. 2004, 2009). Thus, color–color selection methods will remain a critical approach in the LSST era, particularly for the selection of faint, point-like AGNs, because they can be applied to the full depths of the LSST data. However, optical color–color selections alone are not efficient at classifying morphologically extended AGNs. For this type of object, a combination of optical colors and variability-based methods will be more suitable, as we have demonstrated in this work.

We thank Patrick Hall for his help classifying some of the candidates observed during the spectroscopic follow-up. We also thank the referee for a careful reading of the manuscript and comments that led to its improvement. P.S. was supported by CONICYT through "Beca Doctorado Nacional, Año 2013" grant #21130441. P.S. received partial support from the Center of Excellence in Astrophysics and Associated Technologies (PFB 06). P.L. acknowledges Fondecyt Grant #1161184. N.M. acknowledges the support of the Helmholtz Einstein International Berlin Research School in Data Science (HEIBRiDS), Deutsches Elektronensynchrotron (DESY), and Humboldt-Universität zu Berlin. L.C.H. was supported by the National Key R&D Program of China (2016YFA0400702) and the National Science Foundation of China (11721303). F.E.B. acknowledges support from CONICYT-Chile (Basal AFB-170002) and the Ministry of Economy, Development, and Tourism's Millennium Science Initiative through grant IC120009, awarded to The Millennium Institute of Astrophysics, MAS. This work was partially funded by the CONICYT PIA ACT172033.

This work was based on data products from observations made with ESO Telescopes at the La Silla Paranal Observatory under ESO program ID 0101.A-0417. This work is also based on observations obtained at the Southern Astrophysical Research (SOAR) telescope, which is a joint project of the Ministério da Ciência, Tecnologia, Inovações e Comunicações (MCTIC) do Brasil, the U.S. National Optical Astronomy Observatory (NOAO), the University of North Carolina at Chapel Hill (UNC), and Michigan State University (MSU).

Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

Appendix A: Catalog of Observed Candidates

Here we present the list of candidates observed during our spectroscopic follow-up campaign. We provide the equatorial coordinates in degrees (J2000); telescope used; classifier from which the candidate was selected; measured redshift; quality flag of the measured redshift: (1) low-quality zspec, (2) good-quality zspec; the magnitude in the r band; the g − r color; the r − i color; and the spectroscopic classification. For details about the spectral analysis of these targets, see Appendix B.

Appendix B: Spectroscopic Analysis of the Observed Candidates

We obtained classification spectra for 21 of our candidates using both the red and blue cameras of the Goodman spectrograph (Clemens et al. 2004), mounted on the SOAR telescope. We used the 400 lines mm−1 grating and the 1.0″ and 0.8″ slits, providing a typical resolution of ∼6 Å or better. We reduced the Goodman data following the usual steps, including bias subtraction, flat-fielding, cosmic-ray rejection (see van Dokkum 2001), wavelength calibration, flux calibration, and telluric correction, using our own custom IRAF routines.

We also obtained classification spectra for 33 candidates using EFOSC2 (Buzzoni et al. 1984) mounted on the NTT at La Silla Observatory. We used the 236 lines mm−1 grating and the 1.0″ slit, providing a typical resolution of 18 Å. We followed the same observing procedures as the Public ESO Spectroscopic Survey for Transient Objects (PESSTO) collaboration (Smartt et al. 2015), using the PESSTO Observing Blocks (OBs) to perform our observations. We reduced our observations using the PESSTO pipeline (Smartt et al. 2015).

We corrected the reduced and calibrated one-dimensional spectra for Galactic extinction using the maps of Schlegel et al. (1998) and the model of Cardelli et al. (1989). We then computed their redshifts and spectral classes by cross-correlating every spectrum with a set of spectral classification templates from SDSS, a type 1 AGN composite spectrum (Croom et al. 2002), and a type 2 AGN spectrum (Jones et al. 2009). We define a redshift FLAG that indicates the quality of the measured redshift. A computed redshift has good quality (FLAG = 2) when the spectrum shows several lines that are not affected by absorption features, and low quality (FLAG = 1) when the spectrum has low signal to noise, when only one or two emission lines are available, or when the emission lines are strongly affected by absorption features. We measured the full width at half-maximum (FWHM) of the emission lines (when present) in each spectrum, following a simple Gaussian fitting procedure with the PySpecKit Python package (Ginsburg & Mirocha 2011). The final classification of each source combined the results of the cross-correlation analysis with visual inspection of every spectrum. To distinguish between type 1 and type 2 AGNs, we required at least one emission line to have a rest-frame FWHM > 1800 km s−1. Figure 10 shows the rest-frame optical spectra of our 54 candidates.
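The type 1/type 2 criterion above requires converting a fitted Gaussian line width into a velocity FWHM. A minimal sketch, assuming the fitted line center and Gaussian σ (in Å, both in the same frame) are already available from a fit such as the PySpecKit one described above: FWHM = 2√(2 ln 2) σ, and the velocity width c · FWHM / λ_center is the same in the observed and rest frames because both quantities scale by (1 + z). The optional instrumental-width correction is our own assumption (the text does not state whether one was applied), and the function names are hypothetical.

```python
import math

C_KMS = 299792.458  # speed of light in km/s
SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))  # ~2.3548 for a Gaussian
TYPE1_FWHM_KMS = 1800.0  # threshold quoted in the text


def fwhm_kms(center_aa, sigma_aa, instr_fwhm_aa=0.0):
    """Velocity FWHM (km/s) of a Gaussian line fit.

    center_aa, sigma_aa : fitted line center and Gaussian sigma in Angstrom,
                          both measured in the same (observed or rest) frame.
    instr_fwhm_aa       : optional instrumental FWHM removed in quadrature
                          (an assumption; set to 0 to skip the correction).
    """
    fwhm_aa = SIGMA_TO_FWHM * sigma_aa
    intrinsic_sq = fwhm_aa ** 2 - instr_fwhm_aa ** 2
    if intrinsic_sq <= 0:
        return 0.0  # line unresolved at this resolution
    return C_KMS * math.sqrt(intrinsic_sq) / center_aa


def agn_type(line_fits, instr_fwhm_aa=0.0):
    """'type 1' if any (center, sigma) line exceeds the FWHM threshold, else 'type 2'."""
    widths = [fwhm_kms(c, s, instr_fwhm_aa) for c, s in line_fits]
    return "type 1" if any(w > TYPE1_FWHM_KMS for w in widths) else "type 2"
```

For instance, a line at 5000 Å with an observed FWHM of 50 Å has a velocity FWHM of about 3000 km s−1 and would qualify as broad. At the 18 Å resolution of the EFOSC2 setup, the instrumental width alone corresponds to roughly 1000 km s−1 at 5000 Å, which is why the correction option is exposed.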


Figure 10. Rest-frame optical spectra of the observed candidates. The flux is in arbitrary units.


Appendix C: Hp-AGN Candidates from the RF2 Classifier

In Table 9, we show the list of hp-AGN candidates selected using the RF2 classifier. We provide the equatorial coordinates in degrees (J2000), the r-band magnitude, the g − r and r − i optical colors, the most relevant variability features of RF2 (ASF, σrms, Q31, and Pvar), the predicted class, and the predicted class probability (PRF).

Table 9. List of Hp-AGN Candidates Selected by the RF2 Classifier

R.A. (deg)  Decl. (deg)  r (mag)  g − r  r − i  ASF    σrms       Q31    Pvar   Class  PRF
148.314270  2.172078     20.30    0.72   0.32   0.002  4.547e-6   0.127  0.946  AGN    0.86
148.396057  0.476811     18.22    0.41   0.14   0.107  1.323e-6   0.105  0.999  AGN    0.91
148.432525  1.672532     17.62    0.37   0.15   0.069  −3.904e-6  0.088  0.967  AGN    0.82
148.497756  0.645931     19.79    0.22   0.31   0.239  1.555e-5   0.189  1.000  AGN    0.99

(This table is available in its entirety in FITS format.)
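As an illustration of how the blue (g − r ≤ 0.6) and red (g − r > 0.6) subsamples can be formed from this table, the sketch below parses the four excerpted rows of Table 9 in plain Python. The typeset table uses the Unicode minus sign (U+2212), which Python's `float` does not parse as a minus, so it is replaced first. The column keys and function names are our own hypothetical labels, and with only four rows the resulting split says nothing about the full catalog.

```python
# Rows copied from the excerpt of Table 9 (the full table is distributed as FITS).
ROWS = """\
148.314270 2.172078 20.30 0.72 0.32 0.002 4.547e-6 0.127 0.946 AGN 0.86
148.396057 0.476811 18.22 0.41 0.14 0.107 1.323e-6 0.105 0.999 AGN 0.91
148.432525 1.672532 17.62 0.37 0.15 0.069 \u22123.904e-6 0.088 0.967 AGN 0.82
148.497756 0.645931 19.79 0.22 0.31 0.239 1.555e-5 0.189 1.000 AGN 0.99
"""

COLUMNS = ["ra", "dec", "r", "g_r", "r_i", "A_SF", "sigma_rms", "Q31", "P_var", "cls", "P_RF"]


def parse_rows(text):
    """Parse whitespace-separated rows into dicts, converting numeric columns."""
    records = []
    for line in text.strip().splitlines():
        # Normalize the typographic minus sign before calling float().
        fields = line.replace("\u2212", "-").split()
        rec = dict(zip(COLUMNS, fields))
        for key in COLUMNS:
            if key != "cls":
                rec[key] = float(rec[key])
        records.append(rec)
    return records


def split_by_color(records, cut=0.6):
    """Blue subsample: g - r <= cut; red subsample: g - r > cut."""
    blue = [r for r in records if r["g_r"] <= cut]
    red = [r for r in records if r["g_r"] > cut]
    return blue, red
```

For these four rows, `split_by_color(parse_rows(ROWS))` places the first source (g − r = 0.72) in the red subsample and the other three in the blue one.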

