Elsevier

Astroparticle Physics

Volume 29, Issue 2, March 2008, Pages 117-124
Astroparticle Physics

The expectation–maximization algorithm applied to the search of point sources of astroparticles

https://doi.org/10.1016/j.astropartphys.2007.12.002Get rights and content

Abstract

The expectation–maximization algorithm, widely employed in cluster and pattern recognition analysis, is proposed in this article for the search of point sources of astroparticles. We show how to adapt the method for the particular case in which a faint source signal over a large background is expected. In particular, the method is applied to the point source search in neutrino telescopes. A generic neutrino telescope of an area of 1 km2 located in the Mediterranean Sea has been simulated. Results in terms of minimum detectable number of events are given and the method is compared advantageously with the results of a classical method with binning.

Introduction

Several statistical techniques to detect point sources of cosmic particles together with the methods to estimate properly the associated probability of a signal have been the subject of a variety of studies [1], [2], [3], [4], [5], [6]. Even though the techniques used are based on known statistical methods, the search of point sources requires special considerations that have been treated to a large extent in these studies. In this paper, we develop a new method for the search of point sources based on the expectation–maximization algorithm (EM). The EM algorithm, which is an approach to maximum likelihood estimation, has been used in a wide variety of areas [7]. It is particularly well suited for our problem, since the concepts that underlie it, especially that of incomplete data or missing values, are linked in a natural way to the search of unknown point sources.

Methods to look for point sources can be roughly classified in two categories: binned and unbinned. In the first type, the sky is divided in a grid of right ascension and declination (usually rectangular bins or cones) and fluctuations due to possible signals are looked for inside these bins. Unbinned methods, on the other hand, rely on the construction of statistical functions that make use of the experimental data in a continuous way. Both types of methods have advantages and drawbacks. Binned methods are rather robust in the sense that they do not need in general to make appeal to information obtained by simulation and their results can be interpreted in a straightforward way. In most cases, the significance of a given fluctuation can be computed in an analytical way using the information of the data alone. They do not use, however, the information in a totally efficient way even when the bin size is optimized, since only the information about whether a given event is inside or outside a bin is used. Moreover, a source may be located at the border of a bin. To increase its efficiency, this obliges to make several trials with shifted grids, which complicates the estimate of the probability of the observed fluctuations. Finally, in binned methods it is not easy to include other information besides that of the direction of the tracks. On the other hand, in unbinned methods the information is used in a continuous way, so the above drawbacks are absent and the experimental data is used in an efficient way. They need, however, special care to interpret the significance of a given result and tend to require input from simulation at one or another stage of the process. On the positive side, they can incorporate additional information besides the event direction and the samples can be analysed a posteriori on an event by event basis assigning weights computed from the value of the relevant variables.

The EM algorithm as used in this article belongs to the category of unbinned methods. As a cluster algorithm and unlike other unbinned methods it does not require in principle information about the detector performance, although to estimate significances one may prefer to use simulation results at the final stage to avoid problems associated with low statistics or low probability tails. Knowledge about the angular resolution of the detector is not required, since the algorithm adapts the cluster size during the EM process itself. The EM algorithm is a well-known method used in a wide variety of problems in clustering analysis and thus improvements or refinements of the method used in other areas can be applied to the present case. In our particular instance, the expectation–maximization procedure is performed analytically. The method is rooted in likelihood analysis and therefore its results can be interpreted using standard statistics. An important difference is that it uses the concept of incomplete data or missing values – in our case the information on whether the event comes from a source or from the background – which can be used after the maximization process to weigh the events.

The paper is organized as follows. In Section 2, we give a brief description of the basics of the EM method, referring the reader to Appendix A The two steps of the EM algorithm, Appendix B The EM algorithm for Gaussian mixture models for a short account of the iterative algorithm itself and the particularly simple form it takes in the case of normal multivariate mixtures. In Section 3, we show how the method can be applied to the search of point sources of cosmic particles. Section 4 is devoted to the question of how to interpret the results of the method in terms of discovery limits. In Section 5, we treat the particular case of neutrino telescopes and the corresponding results are given in Section 6. Finally, a summary of the work is given in Section 7.

Section snippets

The expectation–maximization algorithm

The expectation–maximization algorithm (EM) is a general approach to maximum likelihood estimation for finite mixture model problems. In these models different groups in the data are described by different density components, so that the total probability density function (pdf) can be expressed as p(x)=j=1gπjp(x;θj), where g is the number of mixture components, πj0 are the mixing proportions that satisfy the unitary relation j=1gπj=1 and p(x;θj), j=1,,g, are the component density functions

The EM algorithm in the search of point sources

The EM algorithm as described above can be applied in a natural way to the search of point sources of astroparticles. In this case, the data sample consists of events characterized by the direction of the incoming cosmic particle, given by the values of its right ascension and declination, α and δ. Although we will restrict ourselves in our example to the track angles, other variables relevant to the search can also be included, notably the energy of the particle. The point source search is

Significances and result interpretation

Once a cluster is found a criterion is needed to confirm or reject the existence of a point source. In our case, we use an observable discriminator, the so-called Bayesian Information Criterion (BIC) as defined in [10]:BICk=2logp(D|θˆk,Mk)-νklog(n)where Mk indicates one of the possible k models that we want to compare, θˆk is the estimate obtained by the algorithm for the set of parameters that define the k model, νk is the number of parameters estimated by the algorithm in the Mk model, D is

Application to neutrino telescopes

Following the concepts explained above, we have applied the EM algorithm to the search of point sources in the particular case of neutrino astronomy. The results can be easily extended to the search of other types of astroparticle point sources.

Results from the presently running neutrino telescopes [3] and some theoretical predictions indicate that neutrino telescopes should have effective areas of about 1 km2 or higher to make high energy neutrino astronomy. A new generation of kilometre-scale

Results for a 1 km2 neutrino telescope

The results obtained applying the concepts of Sections 2 The expectation–maximization algorithm, 3 The EM algorithm in the search of point sources, 4 Significances and result interpretation to the Monte Carlo data described in Section 5 are given in the present section. We compare the results of the EM algorithm to those of a binning technique using the same simulated data. The binning method that we use is explained in detail somewhere else [12]. In Section 6.1, we show the discovery power of

Summary and conclusions

In this contribution, we have presented a powerful method based on the EM algorithm to look for faint point sources of cosmic particles. This method has been applied to a generic neutrino telescope of 1 km3 size located in the Mediterranean sea. The method, nevertheless, is of a more general nature and can be applied to the search of other type of sources. This technique has been advantageously compared with the results of a classical method with binning for the same sample of simulated events.

Acknowledgement

This work has been supported by the Spanish Ministerio de Educación y Ciencia (MEC) under Grant FPA2006-04277.

References (12)

  • A. Chilingarian

    Astropart. Phys.

    (2006)
  • G.C. Hill et al.

    Astropart. Phys.

    (2003)
  • A. Achterberg

    Astropart. Phys.

    (2006)
  • S. Kuch

    Nucl. Instrum. Method Phys. Res. A

    (2006)
  • E. O’Mongain

    Nature

    (1973)
    T. Li et al.

    Astrophys. J.

    (1983)
    D.E. Alexandreas

    Nucl. Instrum. Method Phys. Res. A

    (1993)
  • M. Ambrosio

    Astrophys. J.

    (2001)
    J. Ahrens

    Phys. Rev. Lett.

    (2004)
    A. Achterberg

    Phys. Rev. D

    (2007)
There are more references available in the full text version of this article.

Cited by (0)

View full text