
FROM FINANCE TO COSMOLOGY: THE COPULA OF LARGE-SCALE STRUCTURE


Published 2009 December 10 © 2010. The American Astronomical Society. All rights reserved.
Citation: Robert J. Scherrer et al 2010 ApJL 708 L9. DOI: 10.1088/2041-8205/708/1/L9


ABSTRACT

Any multivariate distribution can be uniquely decomposed into marginal (one-point) distributions, and a function called the copula, which contains all of the information on correlations between the distributions. The copula provides an important new methodology for analyzing the density field in large-scale structure. We derive the empirical two-point copula for the evolved dark matter density field. We find that this empirical copula is well approximated by a Gaussian copula. We consider the possibility that the full n-point copula is also Gaussian and describe some of the consequences of this hypothesis. Future directions for investigation are discussed.


1. INTRODUCTION

The standard model for the formation of large-scale structure assumes that the universe at high redshift contained a dark matter density field characterized by a multivariate Gaussian distribution. This density field evolved, under the action of gravity, into a highly non-Gaussian dark matter density field, with the present-day observed distribution of galaxies tracing (in a biased fashion) the underlying dark matter.

Many tools have been developed to characterize the final evolved distribution of matter. The most widely used are the n-point correlation functions (Peebles 1980). When applied to a discrete density field (such as the observed galaxy distribution), they give the probability (in excess of random) of observing galaxies at a set of n points in a fixed geometrical configuration relative to each other. For a continuous density field (such as the theoretical dark matter distribution), these n-point functions can be expressed in terms of the density field measured at n points at a fixed relative separation. The knowledge of all of the correlation functions up to arbitrarily large n completely characterizes a given density field or galaxy distribution.

The problem is that in practice, it is impossible to measure correlation functions to arbitrarily high order. The two-point correlation function is known to very high accuracy, and the three-point function of the distribution of galaxies is also well measured. However, precise measurements of the four-point correlation function or any higher orders are difficult or impossible for current data. Although the two- and three-point correlation functions provide a great deal of information about the galaxy distribution, we are left with an incomplete characterization of this distribution.

Attempts have been made, therefore, to slice the information contained in the density field (or in the distribution of galaxies) in different ways. For example, the void probability function (White 1979; Fry et al. 1988) mixes together information from correlation functions of all orders, as do percolation statistics (Zel'dovich 1982; Shandarin 1983; Sahni et al. 1997). Similarly, the one-point probability distribution function (PDF) has been widely explored (Coles & Jones 1991; Kofman et al. 1994; Protogeros & Scherrer 1997; Scherrer & Gaztanaga 2001; Lam & Sheth 2008); it also samples the information in the density field in a different way from the correlation functions. However, none of these statistics provides a complete description of the density field; they all sample only part of the information.

In the case of the one-point PDF, however, it is possible to introduce a new statistical tool, the copula, which provides the rest of the information contained in the density field. The copula and the one-point PDF together completely characterize the density distribution, and this decomposition is unique for any multivariate density field. Roughly speaking, the copula indicates how the one-point PDFs are joined together to give the n-point PDF.

The copula was first defined and characterized by Sklar (1959), and it has been most widely applied in the field of mathematical finance. In fact, misuse of the copula has been blamed for the recent meltdown in the mortgage-backed securities industry. The copula has been used in various areas of engineering, especially hydrology (Genest & Favre 2007), but it has not been widely applied in astronomy or astrophysics (although see the recent papers by Jiang et al. 2009 and Benabed et al. 2009). To our knowledge, this Letter represents the first application to the analysis of large-scale structure.

In the next section, we review the definition and properties of the copula. In Section 3, we apply the copula methodology to a simulated dark matter density field in the standard ΛCDM cosmology. We find that the two-point copula of the evolved density field is well approximated by a Gaussian copula. This has several interesting consequences, which are elucidated in Section 4. Since our main purpose in this Letter is to introduce this technique into the field of large-scale structure, we defer more detailed investigations to a later paper.

2. WHAT IS A COPULA?

The discussion in this section is taken primarily from Nelson (1999), Malevergne & Sornette (2003), and Genest & Favre (2007). Note that the terminology in the statistics literature tends to differ slightly from that used in cosmology; we will use the latter terminology here.

Consider the PDF of the distribution of densities at n points, r1, r2, ..., rn. We will denote this n-point PDF as $p_n(\delta_1, \delta_2, \ldots, \delta_n)$. As noted in the previous section, a great deal of work has been devoted to the investigation of the one-point distribution, p(δ). The copula is a function that provides all of the remaining information necessary to construct the n-point PDF, once this one-point PDF is known. Hence, it couples together the individual one-point PDFs to produce the full n-point PDF; this is the origin of the term "copula." Since the statistics of density fields in large-scale structure are translation-invariant, all of our one-point PDFs will be identical, but this need not be the case for the general definition of the copula.

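The copula is defined in terms of the n-point cumulative distribution function (CDF) rather than PDF. Recall that the n-point CDF, $P_n(\delta_1, \delta_2, \ldots, \delta_n)$, is defined as

$$P_n(\delta_1, \delta_2, \ldots, \delta_n) = \int_{-\infty}^{\delta_1} \int_{-\infty}^{\delta_2} \cdots \int_{-\infty}^{\delta_n} p_n(\delta_1', \delta_2', \ldots, \delta_n') \, d\delta_n' \cdots d\delta_2' \, d\delta_1' \qquad (1)$$

and the definition of the one-point CDF is just

$$P(\delta) = \int_{-\infty}^{\delta} p(\delta') \, d\delta' \qquad (2)$$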

(We follow the standard convention of lower-case symbols for PDFs and upper-case symbols for CDFs.) Then the copula function C(u1, u2, ..., un) is the unique function that satisfies the relation

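$$P_n(\delta_1, \delta_2, \ldots, \delta_n) = C\big(P(\delta_1), P(\delta_2), \ldots, P(\delta_n)\big) \qquad (3)$$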

Since we are describing a cosmological density field, we can take all of the one-point CDFs on the right-hand side to be the same, but this is not the most general definition of the copula. Sklar's (1959) theorem states that a function satisfying Equation (3) always exists and, when the one-point CDFs are continuous, is unique. Hence, the n-point copula and the one-point PDF completely characterize the n-point PDF of the density field.

It might appear that we have gained nothing from this exercise, since we have simply replaced an infinite hierarchy of correlation functions with an infinite hierarchy of copula functions. However, this is not the case. The n-point copula function contains significantly more information than the corresponding n-point correlation function. In the next section, for example, we characterize the two-point copula for a simulated evolved density field. The information in the two-point copula, along with the one-point PDF, completely characterizes the two-point density distribution function, $p(\delta_1, \delta_2)$, which cannot be determined solely from the knowledge of the two-point correlation function and the one-point PDF. A number of interesting conclusions can be drawn from the two-point copula alone.

Since CDFs vary between 0 and 1, the copula function maps an n-dimensional unit cube onto the unit interval. From the general properties of CDFs, it follows that C(u1, u2, ..., un) = 0 when any single ui is 0, and C(1, 1, ..., ui, ..., 1) = ui.

The copula has an additional important property that we will exploit several times. Consider a density field δ1, δ2, ..., δn, and a second density field obtained by a local monotonic transformation on the first one: $f_1(\delta_1), f_2(\delta_2), \ldots, f_n(\delta_n)$. Then these two density fields have the same copula. Note that the functions f1, f2, ..., fn do not have to be the same; all that is required is that each function be a monotonic increasing function. For instance, suppose we begin with a Gaussian density field and exponentiate each δ to produce a log-normal density field (Coles & Jones 1991). Then the initial Gaussian density field and the corresponding log-normal density field have the same copula; the difference between them is determined entirely by the one-point PDF.
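This invariance can be checked numerically on a toy sample. The sketch below is illustrative only and is not taken from the Letter: it draws Gaussian values, exponentiates them to obtain log-normal values, and confirms that the ranks are unchanged, so any rank-based estimate of the copula is unchanged as well. The helper name `ranks`, the seed, and the sample size are assumptions made for the example.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks of the values (continuous data, so no ties expected)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


rng = random.Random(0)  # fixed seed so the example is reproducible
gaussian = [rng.gauss(0.0, 1.0) for _ in range(1000)]
lognormal = [math.exp(x) for x in gaussian]  # local, monotonically increasing map

# The ordering of the sample is preserved by the monotonic map.
same_ranks = ranks(gaussian) == ranks(lognormal)
print(same_ranks)
```

Because the ranks of each column are all that enter a rank-based copula estimate, identical ranks imply an identical estimate, whatever the one-point PDF looks like.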

For simplicity, we will now confine our attention to two-point copulas, C(u, v), with 0 ⩽ u ⩽ 1, 0 ⩽ v ⩽ 1, and 0 ⩽ C(u, v) ⩽ 1. There are several special cases of interest. First consider the case of two uncorrelated (statistically independent) densities, δ1 and δ2. In this case, $p(\delta_1, \delta_2) = p(\delta_1)\,p(\delta_2)$, so the copula is just

$$C(u, v) = uv \qquad (4)$$
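The step from a factorized PDF to the product copula can be spelled out as follows. This is a short worked derivation under the definitions of the CDFs given above, writing $u = P(\delta_1)$ and $v = P(\delta_2)$:

$$P_2(\delta_1, \delta_2) = \int_{-\infty}^{\delta_1} \int_{-\infty}^{\delta_2} p(\delta_1')\, p(\delta_2') \, d\delta_2' \, d\delta_1' = P(\delta_1)\, P(\delta_2) \quad \Longrightarrow \quad C(u, v) = uv .$$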

Since we will be dealing with Gaussian initial conditions, a second important copula will be the Gaussian copula (see, e.g., Malevergne & Sornette 2003) given by

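$$C(u, v) = \Phi_r\big(\Phi^{-1}(u), \Phi^{-1}(v)\big) \qquad (5)$$

Here $\Phi_r$ is the two-point Gaussian CDF with unit variance and correlation r:

$$\Phi_r(x, y) = \frac{1}{2\pi\sqrt{1 - r^2}} \int_{-\infty}^{x} \int_{-\infty}^{y} \exp\left[-\frac{x'^2 - 2 r x' y' + y'^2}{2(1 - r^2)}\right] dy' \, dx' \qquad (6)$$

while $\Phi^{-1}$ is the inverse of the one-point Gaussian CDF with unit variance.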
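As a rough illustration of how a Gaussian copula can be evaluated numerically, the sketch below computes $C(u, v) = \Phi_r(\Phi^{-1}(u), \Phi^{-1}(v))$ with the Python standard library only. It is not the procedure used in the Letter. The function names, the Simpson's-rule integration, the lower cutoff of −8.5, and the restriction to |r| < 1 and 0 < u, v < 1 are all assumptions made for this example.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: zero mean, unit variance


def bivariate_normal_cdf(x, y, r, steps=2000):
    """Approximate Phi_r(x, y) for |r| < 1 using Simpson's rule on
    Phi_r(x, y) = int_{-inf}^{x} phi(s) * Phi((y - r*s) / sqrt(1 - r^2)) ds.
    `steps` must be even."""
    lower = -8.5  # assumption: the mass below -8.5 sigma is negligible
    if x <= lower:
        return 0.0
    scale = math.sqrt(1.0 - r * r)
    h = (x - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        s = lower + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * _STD.pdf(s) * _STD.cdf((y - r * s) / scale)
    return total * h / 3.0


def gaussian_copula(u, v, r):
    """Gaussian copula C(u, v) for 0 < u, v < 1 and correlation |r| < 1."""
    return bivariate_normal_cdf(_STD.inv_cdf(u), _STD.inv_cdf(v), r)


# With r = 0 the Gaussian copula reduces to the product copula u * v.
print(gaussian_copula(0.3, 0.7, 0.0))
# At the median point the result can be compared with 1/4 + arcsin(r) / (2 pi).
print(gaussian_copula(0.5, 0.5, 0.491), 0.25 + math.asin(0.491) / (2 * math.pi))
```

The two printed checks use known limits of the bivariate Gaussian CDF, so they test the numerical integration rather than anything specific to the simulation.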

A Gaussian density field (such as that assumed for the initial conditions for large-scale structure) has both a Gaussian copula and a Gaussian one-point distribution. However, it is possible for a non-Gaussian density field to have a Gaussian copula (e.g., any local monotonic transformation on a Gaussian field, such as the lognormal model discussed above), and it is also possible for a field to have a Gaussian one-point distribution and a non-Gaussian copula. In the latter case, the copula formalism provides a convenient way to generate a variety of non-Gaussian fields with Gaussian one-point PDFs (Nelson 1999).
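The last case, a Gaussian one-point distribution with a non-Gaussian copula, can be illustrated with a standard toy construction. The sketch below is an assumption-laden example, not a method from the Letter or its references: each Gaussian value x is paired with y = ±x, with the sign chosen at random. By symmetry y is again a unit Gaussian and the correlation vanishes on average, yet |x| = |y| always holds, so the pair is strongly dependent. A Gaussian copula with r = 0 would instead imply independence.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the example is reproducible
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Flip the sign of each value independently with probability 1/2.
y = [xi if rng.random() < 0.5 else -xi for xi in x]


def pearson(a, b):
    """Sample linear correlation coefficient of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


corr = pearson(x, y)  # close to zero for a large sample
same_magnitude = all(abs(xi) == abs(yi) for xi, yi in zip(x, y))
print(corr, same_magnitude)
```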

3. THE COPULA OF THE NONLINEAR DENSITY FIELD

Armed with the results of the previous section, we now examine an evolved nonlinear density field. Using the standard ΛCDM model, we analyze the mass distribution from a high-resolution N-body simulation from the LasDamas project (C. K. McBride et al. 2009, in preparation). The simulation was run with $1400^3$ particles in a box of side length 420 h−1 Mpc, and a flat cosmology specified by Ωm = 0.25, ΩΛ = 0.75, $H_0$ = 70 km s−1 Mpc−1, σ8 = 0.8, ns = 1.0. We sample the density field at redshift zero using a spherical top hat of radius 1 h−1 Mpc, corresponding to a highly nonlinear density field. Given the resolution of the simulation, the mean number of particles per sphere is 160. The evolved one-point PDF is shown in Figure 1; it is highly non-Gaussian. To determine the two-point copula, we sample pairs of points separated by 2 h−1 Mpc and by 6 h−1 Mpc. Our goal is to measure the copula for both ξ < 1 and ξ > 1, and we find that the two-point correlation of dark matter particles at these separations is ξ(2 h−1 Mpc) = 6.63 and ξ(6 h−1 Mpc) = 0.873. At much larger separations, where ξ ≪ 1, the densities at the two points are essentially uncorrelated, and the copula simply takes the form in Equation (4).


Figure 1. One-point PDF of our density field, sampled with a spherical top-hat window function of radius 1 h−1 Mpc.


We sample 163,216 pairs of densities at each of the two separations. We then use these density pairs to derive the "empirical copula," using the procedure outlined in Genest & Favre (2007). We exploit the fact that the copula is unchanged if we make a local monotonic transformation on the density field. The particular monotonic transformation we make on each of our two columns of densities is to replace each density by its rank within its own column, $R_i(\delta_i)$. Thus, a given density pair, $(\delta_1, \delta_2)$, is mapped to $(R_1(\delta_1), R_2(\delta_2))$, where the ranking is determined separately for each column of densities. Then we divide by the number of pairs of points, n = 163,216, to give $(R_1(\delta_1)/n, R_2(\delta_2)/n)$. It is easy to see that the distribution of ranks divided by the number of ranked points has a uniform CDF. Hence, for our new distribution, the right-hand side of Equation (3) has $P(R_1(\delta_1)/n) = R_1(\delta_1)/n$ and $P(R_2(\delta_2)/n) = R_2(\delta_2)/n$, and the equation becomes

$$P_2\big(R_1(\delta_1)/n, \, R_2(\delta_2)/n\big) = C\big(R_1(\delta_1)/n, \, R_2(\delta_2)/n\big) \qquad (7)$$

In other words, the two-point distribution obtained by replacing each density with its rank (divided by the number of points) is the two-point copula. The copula obtained in this way is called the empirical copula.
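The rank procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the Letter: the names `ranks` and `empirical_copula` are hypothetical, ties are assumed absent, and the copula is estimated by counting the fraction of pairs whose scaled ranks both fall at or below (u, v).

```python
def ranks(values):
    """Return 1-based ranks of the values within their own column."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def empirical_copula(col1, col2):
    """Build C(u, v) from paired samples by ranking each column separately."""
    n = len(col1)
    u_ranks = [r / n for r in ranks(col1)]  # scaled ranks lie in (0, 1]
    v_ranks = [r / n for r in ranks(col2)]

    def copula(u, v):
        # Fraction of pairs with both scaled ranks at or below (u, v).
        hits = sum(1 for a, b in zip(u_ranks, v_ranks) if a <= u and b <= v)
        return hits / n

    return copula


# Toy data: perfectly rank-ordered pairs give C(u, v) = min(u, v).
c_up = empirical_copula([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
print(c_up(0.5, 0.5))
# Reversed ordering gives C(u, v) = max(u + v - 1, 0).
c_down = empirical_copula([1.0, 2.0, 3.0, 4.0], [40.0, 30.0, 20.0, 10.0])
print(c_down(0.5, 0.5))
```

Evaluating the returned function on a grid of (u, v) values would give the kind of surface that can be displayed as a contour plot. The direct count used here scales as n per evaluation, which is adequate for a sketch but slow for very large samples.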

We have used our sampled pairs of points to derive the empirical copula for both separations. Since the two-point copula is a mapping from [0, 1] × [0, 1] into [0, 1], we have chosen to display the copulas as contour plots in Figures 2 and 3. This empirical two-point copula, displayed as a solid contour, is the main result of this Letter; along with the one-point PDF for the density, it provides a complete description of the two-point density distribution at the given separation.


Figure 2. Empirical two-point copula C(u, v) for a simulated dark matter density distribution at a separation of 2 h−1 Mpc. Solid curves are the contours corresponding to (from lower left to upper right) C(u, v) = 0.1, 0.3, 0.5, 0.7, 0.9. Dashed curves give the Gaussian copula with the value of r corresponding to Spearman's ρ calculated for the data.


Figure 3. As Figure 2, for a separation of 6 h−1 Mpc.


However, we can go further and ask if the empirical copula corresponds to any simple functional form. Since the initial copula is Gaussian, the natural choice is the Gaussian copula given by Equation (5). This raises the question of what value of r to assume for our theoretical Gaussian copula. This value of r will not, in general, correspond to the normalized two-point correlation function of the density field, $\xi/\sigma^2$, since the latter also depends on the specific one-point PDF. Instead, we follow Genest & Favre (2007) and compute Spearman's ρ for the data, then convert this into the value of r for the corresponding Gaussian copula.

Spearman's ρ is essentially the correlation function for the data ranks. Let R1i and R2i be the ranks of the ith data point in each of our two columns of data. Then Spearman's ρ for our n pairs of data points is defined as

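$$\rho = \frac{\sum_{i=1}^{n} (R_{1i} - \bar{R})(R_{2i} - \bar{R})}{\sqrt{\sum_{i=1}^{n} (R_{1i} - \bar{R})^2 \, \sum_{i=1}^{n} (R_{2i} - \bar{R})^2}} \qquad (8)$$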

Here $\bar{R}$ is the mean value of the rank, which is, of course, $\bar{R} = (n+1)/2$. The value of ρ is related to an integral over the copula (Nelson 1999; Genest & Favre 2007):

$$\rho = 12 \int_0^1 \int_0^1 C(u, v) \, du \, dv - 3 \qquad (9)$$
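Two limiting cases give a quick sanity check on this integral relation, assuming the standard form $\rho = 12 \int_0^1 \int_0^1 C(u, v)\, du\, dv - 3$. For the product copula of independent densities and for the copula of perfectly rank-ordered densities, $C(u, v) = \min(u, v)$, one finds

$$C(u, v) = uv: \quad \rho = 12 \cdot \tfrac{1}{4} - 3 = 0, \qquad\qquad C(u, v) = \min(u, v): \quad \rho = 12 \cdot \tfrac{1}{3} - 3 = 1 .$$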

For a Gaussian copula, the relation between Spearman's ρ and the value of r that appears in Equation (6) is (Kruskal 1958; Genest & Favre 2007)

$$r = 2 \sin\left(\frac{\pi \rho}{6}\right) \qquad (10)$$

The values of ρ for our data are ρ(2 h−1 Mpc) = 0.474 and ρ(6 h−1 Mpc) = 0.139, which correspond to r(2 h−1 Mpc) = 0.491 and r(6 h−1 Mpc) = 0.146. Using these values for r, and Equations (5) and (6), we have constructed the Gaussian copulas that should provide the best fit to the empirical copulas, if the latter are indeed Gaussian. These are displayed in Figures 2 and 3. The Gaussian copulas appear to match the empirical copulas in both cases.
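The conversion from Spearman's ρ to the Gaussian correlation r is easy to reproduce. The sketch below is illustrative only. It assumes the rank-correlation form of Spearman's ρ (the ordinary correlation coefficient applied to ranks) and the standard Gaussian-copula relation r = 2 sin(πρ/6). The function names are hypothetical and ties in the data are assumed absent.

```python
import math


def ranks(values):
    """Return 1-based ranks of the values within their own column."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(col1, col2):
    """Spearman's rho: the correlation coefficient of the two rank columns."""
    n = len(col1)
    r1, r2 = ranks(col1), ranks(col2)
    mean_rank = (n + 1) / 2
    num = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(r1, r2))
    den = math.sqrt(sum((a - mean_rank) ** 2 for a in r1)
                    * sum((b - mean_rank) ** 2 for b in r2))
    return num / den


def gaussian_r_from_rho(rho):
    """Correlation r of the Gaussian copula whose Spearman's rho is `rho`."""
    return 2.0 * math.sin(math.pi * rho / 6.0)


# The values of rho quoted in the text for the two separations.
for rho in (0.474, 0.139):
    print(rho, round(gaussian_r_from_rho(rho), 3))
```

For the quoted ρ values this gives r of roughly 0.491 and 0.145, which agrees with the values in the text to within the rounding of ρ.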

4. DISCUSSION

Our results indicate that the two-point copula for the present-day dark matter density field is well approximated by a Gaussian copula. This result, along with the knowledge of the one-point PDF, is sufficient to completely characterize $p(\delta_1, \delta_2)$. The most obvious open question is then whether all of the higher-order copulas are also Gaussian; we will defer the investigation of this Gaussian copula hypothesis (GCH) to a future paper. If the GCH were true, it would imply that the nonlinear density field could be derived by a local transformation of an underlying Gaussian field, an idea which has been explored in the past (see, e.g., Coles & Jones 1991). Note, however, that this does not imply that the evolved density field is a local transformation of the original Gaussian dark matter density field; the Gaussian field that is locally mapped to produce the final density field could be some other Gaussian density field. But the GCH would imply that all of the non-Gaussian information in the nonlinear density field could be derived in terms of the one-point PDF. For example, all of the higher-order correlation functions would depend only on this PDF.

These arguments are related to the Gaussianization process of Weinberg (1992). Weinberg explored the possibility that gravitational evolution preserves the rank order of the density field, so that mapping the nonlinear density field monotonically onto a Gaussian field would reproduce the initial density field. It is clear that this process changes only the one-point PDF and leaves the copula unchanged. The results on reconstruction were somewhat mixed; while there is a reasonable correlation between the initial density field and the reconstructed density field, the correspondence is certainly not exact (Narayanan & Croft 1999). However, this result does not contradict the GCH; as noted above, there is no reason to assume that the Gaussian field that is locally transformed into the final density field is identical to the initial Gaussian density field. In fact, the two Gaussian fields could even have different values for r (see also the discussions of Pando et al. 2001 and Neyrinck et al. 2009 on these issues).

A more direct constraint on the GCH comes from measures of topology (Doroshkevich 1970; Hamilton et al. 1986; Gott et al. 1987; Weinberg et al. 1987; Melott et al. 1988), or more generally, Minkowski functionals (Mecke et al. 1994; Kerscher et al. 1997). When the independent variable in these calculations is taken to be the volume filling factor, rather than the density threshold, then such statistics effectively divide out the effect of the one-point PDF; therefore, they can depend only on the behavior of the copula (see, e.g., Shandarin 2002 for a detailed discussion of this point). For the case of topology, the GCH then implies that the genus curve of the nonlinear evolved density field will have the shape characteristic of a Gaussian density field (unlike the case of Gaussianization, this result does not depend on the Gaussian copula matching the initial Gaussian density field). This was claimed to be the case in the first simulations of topology (Weinberg et al. 1987; Melott et al. 1988). More recent simulations (Park et al. 2005; Kim et al. 2009) indicate that the genus curve retains its Gaussian shape for moderate smoothing lengths, but it clearly departs from Gaussianity (in terms of the "shift parameter," which is the relevant quantity here) on the highly nonlinear length scale we have examined (1 h−1 Mpc). These results argue against the GCH on nonlinear scales. Clearly, the higher-order copula functions are worthy of further study.

Of course, we actually observe the distribution of galaxies, and not dark matter. The discussion in the previous sections shows that for biasing schemes that are local and monotonic (such as those explored by Coles 1993; Fry & Gaztanaga 1993; Scherrer & Weinberg 1998; Coles et al. 1999; Narayanan et al. 2000) the copula of the galaxy distribution will be identical to the copula of the underlying dark matter density field. This will not necessarily be the case for non-local bias, or stochastic bias (Dekel & Lahav 1999). The best current models include some degree of stochastic bias; what remains to be seen is the size of the effect on the copula.

This short introductory Letter leaves open a number of questions, several of which we are currently investigating. The most important is whether the higher-order copulas of the density field are also Gaussian. While it is obviously impossible to examine this question to all orders, an investigation of the three-point copula is straightforward and should provide a useful check. Other directions for future investigation are the effects of non-local or stochastic bias, redshift distortions, and the copula of the observed galaxy distribution.

We believe that the copula has the potential to serve as an important new tool in the analysis of large-scale structure. It appears to be less sensitive to bias (e.g., completely unaffected by local monotonic bias) than other statistics. If the GCH applies, then the full density field can be completely characterized by a single function (the one-point PDF) and a series of numerical parameters (the correlations r for the copula as a function of length scale). For example, in this case the hierarchical clustering coefficients can be derived as functions of the one-point PDF. Even if the GCH does not apply, the copula allows us to measure the underlying "coupling" between the density field at different points in an entirely new way, moving beyond the limited information in the low-order correlation functions. The copula can also be used to analyze the evolution of the density field, via a computation of the two-point copula for the density measured at the same points in the initial and final density fields.

We note in passing that it is precisely the Gaussian copula which has been blamed for the recent mortgage-backed securities meltdown. We presume that any error in this Letter will have less dire consequences.

R.J.S. was supported in part by the Department of Energy (DE-FG05-85ER40226). We thank D.H. Weinberg for helpful discussions.
