The expectation–maximization algorithm applied to the search of point sources of astroparticles
Introduction
Several statistical techniques to detect point sources of cosmic particles together with the methods to estimate properly the associated probability of a signal have been the subject of a variety of studies [1], [2], [3], [4], [5], [6]. Even though the techniques used are based on known statistical methods, the search of point sources requires special considerations that have been treated to a large extent in these studies. In this paper, we develop a new method for the search of point sources based on the expectation–maximization algorithm (EM). The EM algorithm, which is an approach to maximum likelihood estimation, has been used in a wide variety of areas [7]. It is particularly well suited for our problem, since the concepts that underlie it, especially that of incomplete data or missing values, are linked in a natural way to the search of unknown point sources.
Methods to look for point sources can be roughly classified in two categories: binned and unbinned. In the first type, the sky is divided in a grid of right ascension and declination (usually rectangular bins or cones) and fluctuations due to possible signals are looked for inside these bins. Unbinned methods, on the other hand, rely on the construction of statistical functions that make use of the experimental data in a continuous way. Both types of methods have advantages and drawbacks. Binned methods are rather robust in the sense that they do not need in general to make appeal to information obtained by simulation and their results can be interpreted in a straightforward way. In most cases, the significance of a given fluctuation can be computed in an analytical way using the information of the data alone. They do not use, however, the information in a totally efficient way even when the bin size is optimized, since only the information about whether a given event is inside or outside a bin is used. Moreover, a source may be located at the border of a bin. To increase its efficiency, this obliges to make several trials with shifted grids, which complicates the estimate of the probability of the observed fluctuations. Finally, in binned methods it is not easy to include other information besides that of the direction of the tracks. On the other hand, in unbinned methods the information is used in a continuous way, so the above drawbacks are absent and the experimental data is used in an efficient way. They need, however, special care to interpret the significance of a given result and tend to require input from simulation at one or another stage of the process. On the positive side, they can incorporate additional information besides the event direction and the samples can be analysed a posteriori on an event by event basis assigning weights computed from the value of the relevant variables.
The EM algorithm as used in this article belongs to the category of unbinned methods. As a cluster algorithm and unlike other unbinned methods it does not require in principle information about the detector performance, although to estimate significances one may prefer to use simulation results at the final stage to avoid problems associated with low statistics or low probability tails. Knowledge about the angular resolution of the detector is not required, since the algorithm adapts the cluster size during the EM process itself. The EM algorithm is a well-known method used in a wide variety of problems in clustering analysis and thus improvements or refinements of the method used in other areas can be applied to the present case. In our particular instance, the expectation–maximization procedure is performed analytically. The method is rooted in likelihood analysis and therefore its results can be interpreted using standard statistics. An important difference is that it uses the concept of incomplete data or missing values – in our case the information on whether the event comes from a source or from the background – which can be used after the maximization process to weigh the events.
The paper is organized as follows. In Section 2, we give a brief description of the basics of the EM method, referring the reader to Appendix A The two steps of the EM algorithm, Appendix B The EM algorithm for Gaussian mixture models for a short account of the iterative algorithm itself and the particularly simple form it takes in the case of normal multivariate mixtures. In Section 3, we show how the method can be applied to the search of point sources of cosmic particles. Section 4 is devoted to the question of how to interpret the results of the method in terms of discovery limits. In Section 5, we treat the particular case of neutrino telescopes and the corresponding results are given in Section 6. Finally, a summary of the work is given in Section 7.
Section snippets
The expectation–maximization algorithm
The expectation–maximization algorithm (EM) is a general approach to maximum likelihood estimation for finite mixture model problems. In these models different groups in the data are described by different density components, so that the total probability density function (pdf) can be expressed as , where g is the number of mixture components, are the mixing proportions that satisfy the unitary relation and , , are the component density functions
The EM algorithm in the search of point sources
The EM algorithm as described above can be applied in a natural way to the search of point sources of astroparticles. In this case, the data sample consists of events characterized by the direction of the incoming cosmic particle, given by the values of its right ascension and declination, and . Although we will restrict ourselves in our example to the track angles, other variables relevant to the search can also be included, notably the energy of the particle. The point source search is
Significances and result interpretation
Once a cluster is found a criterion is needed to confirm or reject the existence of a point source. In our case, we use an observable discriminator, the so-called Bayesian Information Criterion (BIC) as defined in [10]:where indicates one of the possible k models that we want to compare, is the estimate obtained by the algorithm for the set of parameters that define the k model, is the number of parameters estimated by the algorithm in the model, D is
Application to neutrino telescopes
Following the concepts explained above, we have applied the EM algorithm to the search of point sources in the particular case of neutrino astronomy. The results can be easily extended to the search of other types of astroparticle point sources.
Results from the presently running neutrino telescopes [3] and some theoretical predictions indicate that neutrino telescopes should have effective areas of about 1 km2 or higher to make high energy neutrino astronomy. A new generation of kilometre-scale
Results for a 1 km2 neutrino telescope
The results obtained applying the concepts of Sections 2 The expectation–maximization algorithm, 3 The EM algorithm in the search of point sources, 4 Significances and result interpretation to the Monte Carlo data described in Section 5 are given in the present section. We compare the results of the EM algorithm to those of a binning technique using the same simulated data. The binning method that we use is explained in detail somewhere else [12]. In Section 6.1, we show the discovery power of
Summary and conclusions
In this contribution, we have presented a powerful method based on the EM algorithm to look for faint point sources of cosmic particles. This method has been applied to a generic neutrino telescope of 1 km3 size located in the Mediterranean sea. The method, nevertheless, is of a more general nature and can be applied to the search of other type of sources. This technique has been advantageously compared with the results of a classical method with binning for the same sample of simulated events.
Acknowledgement
This work has been supported by the Spanish Ministerio de Educación y Ciencia (MEC) under Grant FPA2006-04277.
References (12)
Astropart. Phys.
(2006)- et al.
Astropart. Phys.
(2003) Astropart. Phys.
(2006)Nucl. Instrum. Method Phys. Res. A
(2006)Nature
(1973)et al.Astrophys. J.
(1983)Nucl. Instrum. Method Phys. Res. A
(1993)Astrophys. J.
(2001)Phys. Rev. Lett.
(2004)Phys. Rev. D
(2007)