
THE zCOSMOS 20k GROUP CATALOG


Published 2012 June 21 © 2012. The American Astronomical Society. All rights reserved.
Citation: C. Knobel et al. 2012, ApJ, 753, 121. DOI: 10.1088/0004-637X/753/2/121


ABSTRACT

We present an optical group catalog between 0.1 ≲ z ≲ 1 based on 16,500 high-quality spectroscopic redshifts in the completed zCOSMOS-bright survey. The catalog published herein contains 1498 groups in total and 192 groups with more than five observed members. The catalog includes both group properties and the identification of the member galaxies. Based on mock catalogs, the completeness and purity of groups with three or more members should both be about 83% with respect to all groups that should have been detectable within the survey, and more than 75% of the groups should exhibit a one-to-one correspondence to the "real" groups. Particularly at high redshift, there are apparently more galaxies in groups in the COSMOS field than expected from mock catalogs. We detect clear evidence for the growth of cosmic structure over the last seven billion years, in the sense that the fraction of galaxies that are found in groups (in volume-limited samples) increases significantly with cosmic time. In the second part of the paper, we develop a method for associating galaxies that only have photo-z with our spectroscopically identified groups. We show that this leads to improved definition of group centers, improved identification of the most massive galaxies in the groups, and improved identification of central and satellite galaxies, where we define the former to be galaxies at the minimum of the gravitational potential wells. Subsamples of centrals and satellites in the groups can be defined with purities up to 80%, while a straight binary classification of all group and non-group galaxies into centrals and satellites achieves purities of 85% and 75%, respectively, for the spectroscopic sample.


1. INTRODUCTION

Galaxy groups are gravitationally bound systems that contain multiple galaxies inhabiting the same dark matter (DM) halo. They are of interest for two main reasons. First, by regarding them as DM halos, they can serve as cosmological probes. The number density and clustering of groups for a given halo mass and cosmic epoch depend on the underlying cosmology. Second, galaxy groups constitute an environment for galaxies which is special compared with the general field. The enhanced proximity of other galaxies and the presence of an intragroup medium may produce distinct evolutionary processes in groups such as enhanced merging rates (Spitzer & Baade 1951), galaxy harassment (Moore et al. 1996), ram pressure stripping (Gunn & Gott 1972), or strangulation (Balogh & Morris 2000) which may be significant for the general evolution of galaxies, and particularly the environmental differentiation of the galaxy population (e.g., Weinmann et al. 2006; Gerke et al. 2007; Iovino et al. 2010; Kovač et al. 2010; Peng et al. 2010). A difference between central and satellite galaxies in groups is now an established part of our view of galaxy evolution (e.g., van den Bosch et al. 2008; Skibba 2009; Pasquali et al. 2010; Skibba et al. 2011; Peng et al. 2011). A key requirement for both areas is the availability of large, high-quality group catalogs.

There are several desired properties for a group catalog. Purity and completeness are two often conflicting requirements: completeness is the fraction of real groups that are recovered, while purity is the fraction of claimed groups that are real. Once the groups are identified, one can further define purity and completeness for the membership of individual galaxies in these groups. The optimal balance between completeness and purity often depends on the application: high-purity catalogs covering a large redshift range enable studies of galaxy evolution in different environments over cosmic time, whereas complete catalogs that trace the numbers of real groups and provide reliable mass estimates for individual groups are important for cosmological studies. Reliable masses for individual groups in turn require a high fraction of one-to-one correspondences between reconstructed and real groups. Precise estimates of the group centers are needed for stacking analyses of X-ray properties or for detecting the weak-lensing signal, while studying the differences between "central" and "satellite" galaxies requires group populations that are complete down to a given flux limit, since otherwise the central galaxy cannot be reliably identified.

In this paper we present a new group catalog produced with the zCOSMOS-bright survey (Lilly et al. 2007), which now contains about 16,500 high-quality spectroscopic galaxy redshifts with IAB ⩽ 22.5 in the redshift range 0.1 ≲ z ≲ 1.2 (the "20k sample"). zCOSMOS-bright covers the ∼1.7 deg2 of the COSMOS field (Scoville et al. 2007b), which was fully observed by the Hubble Space Telescope (Scoville et al. 2007a; Koekemoer et al. 2007) down to IAB < 28 (5σ) and followed up in more than 30 bands by several telescopes from radio to X-ray wavelengths (Capak et al. 2007). This unique combination of observational data on a single field makes the COSMOS field very well suited for studying the properties of groups as a function of redshift and the evolution of galaxies in different environments. The large number of wavelength bands also allows the production of high-quality photometric redshifts ("photo-z") with an accuracy of δz ∼ 0.01(1 + z) (e.g., Ilbert et al. 2009) for the brighter galaxies, making it possible to use these to supplement the spectroscopic redshifts and to assign group membership to these galaxies, at least probabilistically.

The first major data release of zCOSMOS comprised about 8500 spectroscopic galaxy redshifts (the "10k sample"; Lilly et al. 2009) and was used to produce a first optical group catalog in the redshift range 0.1 ≲ z ≲ 1 (Knobel et al. 2009, "K09"). In that paper we discussed in detail the group-finding methods and basic properties of the "10k group catalog." We adopted two group-finding algorithms, friends-of-friends (FOF) and a Voronoi–Delaunay method (VDM), and compared their performances on simulated mock galaxy samples. We introduced a "multi-run scheme" in which we successively used different group-finding parameters, optimized for groups of different richness, where by richness we always refer to the number N of observed spectroscopic members. By initially tuning the parameters to detect only the richest groups, and then ignoring the subsequent fragmentation of these into smaller groups when the parameters were tuned to smaller scales, we could improve the completeness and purity of the catalog over a wide range of scales, minimizing the effects of fragmentation and overmerging (see K09 for a discussion). The FOF catalog was used as the basic 10k group catalog, while the VDM catalog was used to produce subcatalogs with further enhanced purities. The basic 10k catalog contained 802 groups in total and 102 groups with more than five members.

The group catalog presented in this paper is created from the larger sample now available, in a similar way to that in K09. Since it contains groups with up to 30 members, however, we had to extend the methods slightly to guarantee its high quality over this wider range of richness.

In contrast to the zCOSMOS 10k sample, whose completeness was only about 30% and for which it would have made little sense to use photo-z information, the completeness of the 20k sample now exceeds 50%, so that photo-z objects are a minority. It thus becomes attractive to try to associate these remaining photo-z objects with the spectroscopically identified groups, so that an idea of group membership can be obtained for all galaxies down to the magnitude limit of the survey, which is useful for many scientific goals. We therefore develop a method for incorporating the photo-z galaxies into the spectroscopic group population by assigning to each photo-z galaxy a probability that it is a member of a given group. This probability is based on the projected distance of the galaxy from the group center and on its photo-z relative to the redshift of the group, and it is calibrated against mock catalogs. Including the photo-z galaxies improves the estimates of the location of the group center, the identification of the most massive galaxy in the group, and the identification of the galaxy lying at the center of the potential well, which we define as the central galaxy. For the latter two cases, we can construct various samples that represent trade-offs between completeness and purity. Finally, we also examine how well a binary central-satellite classification can be applied to all galaxies in the sample, including those not associated with groups.

With the final 20k sample we produce a group catalog containing almost 1500 groups in the redshift range 0.1 ≲ z ≲ 1. Other major group catalogs at z ≳ 0.3 are the one from the DEEP2 survey (Davis et al. 2003), containing ∼2400 groups in the redshift range 0.7 ≲ z ≲ 1.4 (Gerke et al. 2005, 2012), and the one from VVDS (Le Fèvre et al. 2005), containing ∼300 groups in the redshift range 0.2 ⩽ z ⩽ 1 (Cucciati et al. 2010). The new zCOSMOS catalog is thus one of the largest published group catalogs at high redshift (and the largest on a contiguous field) and features very good statistics compared with the other high-redshift group catalogs in the literature. Special features of our group catalog are the availability of group centers derived with a sophisticated approach and the possibility of producing high-purity samples of central and satellite galaxies.

This paper is organized as follows. In Section 2, we describe the observational and mock data used for our work. In Section 3, we describe the method of group identification and the statistical results obtained using the mock catalogs. We then give a detailed description of the final zCOSMOS spectroscopic group catalog in Section 4 and perform some comparisons with the mock catalogs. In the second part of the paper, we first develop, in Section 5, the method for associating photo-z galaxies with the spectroscopically identified groups. We then discuss in Section 6 how this can lead to improved definitions of the corrected richness of the groups, of the most massive galaxies, of the spatial centers, and of the central galaxies, defined as those at the bottom of the potential well. The properties of the centrals and satellites will be explored in two further papers in preparation (C. Knobel et al. 2012, in preparation; K. Kovač et al. 2012, in preparation). In Section 7 we comment on the general difficulties in producing high-quality group catalogs, and in Section 8 we conclude the paper.

Throughout the paper we frequently compare with the set of 24 mock catalogs, which are 24 different realizations of a single model universe. When we apply a given algorithm to the mock catalogs, the scatter among the 24 returned values represents the minimum uncertainty that can be expected when we apply the algorithm to the actual data, due to effects such as cosmic variance. We refer to this as the standard deviation of the relevant parameter among the mock catalogs. This scatter is the appropriate measure when we wish to assess whether the real data are consistent with the model universe of the mock catalogs. The best estimate of the overall performance of the algorithm in question comes from the average over all 24 mock catalogs. The uncertainty in this estimate is given by the standard deviation above divided by $\sqrt{24}$. We refer to this as the standard deviation of the mean.
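As a minimal sketch of these two uncertainties, assume a statistic has been measured once on each of the 24 mock catalogs. The scatter among mocks and the standard deviation of the mean then follow directly. The function names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the estimator is not stated above.

```python
import math
import statistics


def mock_summary(values):
    """Summarize a statistic measured once on each mock catalog.

    Returns (mean, scatter, error_of_mean):
      mean          -- best estimate of the algorithm's performance
      scatter       -- standard deviation among the mocks, i.e. the spread
                       expected for a single field (e.g. from cosmic variance)
      error_of_mean -- scatter / sqrt(number of mocks)
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two mock catalogs")
    mean = statistics.fmean(values)
    scatter = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean, scatter, scatter / math.sqrt(n)


def consistency_in_sigma(observed, values):
    """Offset of a value measured on the real data from the mock mean,
    in units of the mock-to-mock scatter (not the error of the mean)."""
    mean, scatter, _ = mock_summary(values)
    return (observed - mean) / scatter
```

The scatter, not the error of the mean, is what `consistency_in_sigma` uses, matching the distinction drawn above: the former tests whether the real data are consistent with the model universe, while the latter quantifies how well the average performance is known.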

Where necessary, a concordance cosmology with $H_0 = 70\ \mathrm{km\,s^{-1}\,Mpc^{-1}}$, Ωm = 0.25, and $\Omega _\Lambda = 0.75$ is applied. All magnitudes are quoted in the AB system. We use the term "dex" for a decimal logarithmic interval, i.e., x dex corresponds to a factor $10^{x}$; for example, 0.1 dex corresponds to a factor $10^{0.1} \simeq 1.259$.

2. DATA

In this section we describe the data that have been used for this paper. First, we give an overview of the zCOSMOS survey from which the spectroscopic redshifts are taken, then we describe the derivation of the photometric redshifts, masses, and absolute magnitudes using the photometry of the COSMOS survey, and finally we describe the construction of realistic mock galaxy samples.

2.1. The zCOSMOS Survey

zCOSMOS (Lilly et al. 2007, 2009; S. J. Lilly et al. 2012, in preparation) is a deep spectroscopic galaxy survey of the 1.7 deg2 COSMOS field (Scoville et al. 2007b) that used about 600 hr of ESO VLT service-mode observing time. It is divided into two parts, "zCOSMOS-bright" and "zCOSMOS-deep." The former mainly covers the redshift range 0.1 ≲ z ≲ 1.2 over almost the entire COSMOS field, while the latter targets the redshift range 1.5 ≲ z ≲ 3 in the central ∼1 deg2 of the COSMOS field.

The current work is based entirely on zCOSMOS-bright, which is now complete and contains spectra of about 20,000 objects taken with the VIMOS spectrograph (Le Fèvre et al. 2003) using a medium-resolution grism. The target catalog consisted essentially of all objects within the magnitude range 15 ⩽ IAB ⩽ 22.5, excluding suspected stars. Slits were assigned to targets such that, for each mask, the number of slit assignments in each of the four VIMOS quadrants was maximized, except for some X-ray and radio objects that were observed at high priority. Since there were two masks per pointing and the pointings overlapped, with centers offset by the size of a quadrant, this resulted in eight passes over the central field, four at the borders, and two at the corners.
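The eight/four/two pass structure follows from simple geometry. The sketch below assumes an idealized layout, not the actual zCOSMOS mask design: each pointing covers a 2 × 2 block of quadrant-sized cells, neighbouring pointings are offset by one quadrant, and every pointing is observed with two masks. The helper `pass_counts` is hypothetical.

```python
def pass_counts(n_pointings_x, n_pointings_y, masks_per_pointing=2):
    """Number of mask passes covering each quadrant-sized cell.

    Idealized layout (an assumption): pointing (px, py) covers the 2 x 2 block
    of cells (px..px+1, py..py+1), so adjacent pointing centres are offset by
    one quadrant and neighbouring pointings overlap.
    """
    nx, ny = n_pointings_x + 1, n_pointings_y + 1  # cells along each axis
    cover = [[0] * ny for _ in range(nx)]
    for px in range(n_pointings_x):
        for py in range(n_pointings_y):
            for dx in (0, 1):
                for dy in (0, 1):
                    # every mask of this pointing adds one pass over the cell
                    cover[px + dx][py + dy] += masks_per_pointing
    return cover
```

For a 4 × 4 grid of pointings, `pass_counts(4, 4)` gives 8 passes for interior cells (four overlapping pointings), 4 for edge cells (two pointings), and 2 for the corner cells (one pointing).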

About 2% of all spectra come from "secondary" objects, i.e., potential targets that serendipitously fell into slits placed on other galaxies. They are very helpful for estimating the accuracy and verification rate of the redshifts, and they also compensate for the bias against close pairs caused by slit constraints (de Ravel et al. 2011; Kampczyk et al. 2011). After removing less reliable redshifts (i.e., confidence classes 0, 1.1, 2.1, and 9.1; see Lilly et al. 2009) and spectroscopic stars, we obtain a high-quality redshift sample of 16,776 galaxies within the area 149.47° ≲ α ≲ 150.77° and 1.62° ≲ δ ≲ 2.83°. Based on multiply observed objects, the spectral verification rate for this sample is about 99% and the redshift accuracy about 100 km s−1, which is sufficient to probe the cosmic group environment. The remaining objects, and all those not observed spectroscopically, have photo-z available. Henceforth we refer to this sample of secure redshifts as the "20k sample."

The spatial sampling rate (SSR), i.e., the fraction of objects in the magnitude-limited target catalog whose spectra were observed, is a function of (α, δ) and is shown in Figure 1. By the design of zCOSMOS, there is a central region (α = 150.12° ± 0.54° and δ = 2.22° ± 0.46°; see the black rectangle) with an SSR substantially higher than at the borders. Even within the central region the SSR is not completely uniform, exhibiting some stripes due to the placement of slits in the masks. The redshift success rate (RSR) is the fraction of observed spectra that yield a reliable redshift. The RSR is mostly a function of the apparent magnitude and redshift of the galaxies and depends only weakly on color (see Figures 2 and 3 of Lilly et al. 2009).


Figure 1. Spatial sampling rate (SSR) of the zCOSMOS 20k sample. The color bar indicates the SSR, which is computed in pixels of 1.5 arcmin. The black rectangle shows the central region for the 20k sample.


To a good approximation, the SSR and RSR can be assumed to be uncorrelated, so that multiplying them gives, for each galaxy, the completeness with respect to an ideal magnitude-limited survey. The full zCOSMOS area has an average completeness of 48%, rising to 56% in the central region. For some applications it is useful to restrict the survey area to the central region, where the sampling rate is highest. Note also that the redshift distribution of galaxies in the COSMOS field shows two prominent features at redshifts ∼0.35 and ∼0.7 (cf. Figure 1 of K09).
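A minimal sketch of how an SSR map like Figure 1 and the per-galaxy completeness could be computed, assuming target and observed positions are available as (α, δ) pairs in degrees. Positions are binned into 1.5 arcmin pixels with a flat-sky approximation (cos δ ≈ 1 near δ ≈ 2°, an assumption that ignores the small right-ascension foreshortening), the SSR is the observed-to-target ratio per pixel, and the completeness is SSR × RSR under the independence assumption stated above. All function names are hypothetical.

```python
import math
from collections import defaultdict

PIXEL_DEG = 1.5 / 60.0  # 1.5 arcmin pixels, as in Figure 1


def pixel_index(ra, dec, pixel=PIXEL_DEG):
    """Map (RA, Dec) in degrees to an integer pixel index (flat-sky approximation)."""
    return (math.floor(ra / pixel), math.floor(dec / pixel))


def ssr_map(targets, observed, pixel=PIXEL_DEG):
    """Spatial sampling rate per pixel: observed targets / all targets.

    targets, observed: iterables of (ra, dec) in degrees; `observed` is
    assumed to be a subset of `targets`. Pixels without targets are omitted.
    """
    n_tot = defaultdict(int)
    n_obs = defaultdict(int)
    for ra, dec in targets:
        n_tot[pixel_index(ra, dec, pixel)] += 1
    for ra, dec in observed:
        n_obs[pixel_index(ra, dec, pixel)] += 1
    return {pix: n_obs[pix] / n for pix, n in n_tot.items()}


def completeness(ssr, rsr):
    """Completeness w.r.t. an ideal magnitude-limited survey, assuming the
    SSR and RSR are uncorrelated so that they simply multiply."""
    return ssr * rsr
```

In practice the RSR would itself be a function of apparent magnitude and redshift, so `completeness` would be evaluated per galaxy with the SSR of its pixel and the RSR appropriate to its magnitude and redshift.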

2.2. Photometric Redshifts

Photometric redshifts (photo-z), masses, and absolute magnitudes were derived from spectral energy distribution (SED) fitting using ZEBRA+ (Oesch et al. 2010), which is an extension of ZEBRA (Feldmann et al. 2006), to allow for the derivation of physical properties of the galaxies using stellar population synthesis models.

The photo-z were derived by fitting empirical templates to 26 photometric bands from u* (CFHT) to Spitzer IRAC4.8, including 12 broadband, 12 intermediate-band, and 2 narrowband filters. The empirical template set was based on Bruzual & Charlot (2003) models, to which emission lines were added, before running the template correction module of ZEBRA on a random subsample of zCOSMOS spectroscopic redshifts. For the few hundred XMM-Newton X-ray sources, we adopted the photo-z provided by Salvato et al. (2009) (published in Brusa et al. 2010).

The stellar masses were subsequently derived from standard Bruzual & Charlot (2003) models with an initial stellar mass function of Chabrier (2003) and dust extinction according to Calzetti et al. (2000). Due to the absence of emission lines in the model SEDs, only the broadband photometry was used for the SED fit, where the redshift was fixed at the spec-z of the galaxy, if available, or otherwise at the adopted photo-z.

To increase the fidelity of our photo-z sample, we excluded 5% of the objects by applying a cut in the χ2 of the SED fit, and we required at least nine broadband filters for each object. Comparison with the spectroscopic control sample yielded a photo-z error of about 0.01(1 + z) and a catastrophic failure rate of 2%–3%, where a catastrophic failure is defined by |zspec − zphot| > 0.04(1 + z). (The excluded subsample had a catastrophic failure rate of ∼60%.) A comparison of our stellar masses with those derived using Hyperzmass (see Bolzonella et al. 2010) yielded an uncertainty in stellar mass of about 0.2 dex. Note that the stellar masses were derived without considering mass return, i.e., "stellar mass" is simply the integral of the star formation rate, since this is more useful for most purposes. These masses are typically 0.2 dex larger than those that account for mass return.
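The two photo-z quality measures quoted above can be sketched as follows for a matched spectroscopic control sample. The catastrophic-failure cut follows the definition in the text, evaluated at z = zspec (an assumption). The scatter estimator, the rms of (zphot − zspec)/(1 + zspec) over non-catastrophic objects, is also an assumption, since no estimator is specified. `photoz_quality` is a hypothetical name.

```python
import math


def photoz_quality(z_spec, z_phot, outlier_cut=0.04):
    """Return (scatter, catastrophic_fraction) for matched spec-z/photo-z pairs.

    A catastrophic failure has |z_spec - z_phot| > outlier_cut * (1 + z_spec),
    i.e. |dz / (1 + z_spec)| > outlier_cut. The scatter is the rms of
    dz / (1 + z_spec) over the remaining (non-catastrophic) objects.
    """
    if len(z_spec) != len(z_phot) or not z_spec:
        raise ValueError("need matched, non-empty redshift lists")
    resid = [(zp - zs) / (1.0 + zs) for zs, zp in zip(z_spec, z_phot)]
    good = [r for r in resid if abs(r) <= outlier_cut]
    frac_catastrophic = 1.0 - len(good) / len(resid)
    scatter = math.sqrt(sum(r * r for r in good) / len(good)) if good else float("nan")
    return scatter, frac_catastrophic
```

Excluding the outliers from the scatter keeps the two numbers complementary: the scatter describes the core of the error distribution (about 0.01 here), while the catastrophic fraction (2%–3%) describes its tail.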

2.3. Mock Catalogs

The mock catalogs that are used for tuning the group-finding parameters and for comparing our results with cosmological simulations are adapted from the COSMOS mock light cones (Kitzbichler & White 2007) which are based on the Millennium DM N-body simulation (Springel et al. 2005) run with the cosmological parameters Ωm = 0.25, $\Omega _\Lambda = 0.75$, Ωb = 0.045, h = 0.73, n = 1, and σ8 = 0.9. The semi-analytic recipes for populating the DM halos with galaxies are that of Croton et al. (2006) as updated by De Lucia & Blaizot (2007). There are 24 independent mock catalogs, each covering an area of $1.4\ \rm {deg} \times 1.4\ \rm {deg}$ with an apparent magnitude limit of r  ⩽  26 and a redshift range of z  ≲  7.

The mock catalogs were adjusted to resemble the actual 20k sample as closely as possible; for details we refer to K09. After applying a magnitude cut, the mean number of galaxies in the mock catalogs (averaged over all 24 fields) differs slightly from the number of galaxies in the zCOSMOS target catalog (a 1σ–2σ effect). Since the density of galaxies is important for tuning the group-finding parameters, we applied a small adjustment to the magnitude limit of the mock catalogs, uniform across all mocks and smoothly varying in redshift, so as to match the correct (smoothed) number of galaxies as a function of redshift. This intervention has only a very small effect on the analysis in this paper, and we generally checked that our results do not depend sensitively on it. We then applied the SSR and RSR to the mock catalogs by randomly removing galaxies from the magnitude-limited mock sample, and implemented a Gaussian redshift measurement error of δz = (100 km s−1)(1 + z)/c.
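The last step, turning a magnitude-limited mock into a zCOSMOS-like spectroscopic sample, can be sketched as random thinning plus Gaussian redshift errors. This is an illustration under stated assumptions: galaxies are hypothetical dicts with a `z` key, and `completeness_fn` is a hypothetical callable returning SSR × RSR for a galaxy (for instance, built from maps like those discussed in Section 2.1).

```python
import random

C_KMS = 299_792.458  # speed of light in km/s


def degrade_mock(galaxies, completeness_fn, sigma_v=100.0, seed=None):
    """Thin a magnitude-limited mock and add spectroscopic redshift errors.

    Each galaxy is kept with probability completeness_fn(galaxy), i.e. its
    SSR x RSR. Kept galaxies receive an observed redshift
    z_obs = z + N(0, sigma_v * (1 + z) / c), with sigma_v in km/s.
    Returns new dicts; the input galaxies are left unchanged.
    """
    rng = random.Random(seed)
    out = []
    for gal in galaxies:
        if rng.random() < completeness_fn(gal):
            g = dict(gal)
            sigma_z = sigma_v * (1.0 + gal["z"]) / C_KMS
            g["z_obs"] = gal["z"] + rng.gauss(0.0, sigma_z)
            out.append(g)
    return out
```

Using a seeded `random.Random` instance, rather than the module-level generator, makes each of the 24 mock realizations reproducible independently.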

For the second part of the paper, we extend the spectroscopic 20k mock samples by adding simulated photo-z galaxies, so that for each mock catalog the spec-z and photo-z mock samples together make up the complete IAB ⩽ 22.5 sample. That is, each galaxy brighter than the flux limit that is not part of the spec-z mock sample was assigned a photometric redshift by perturbing its original mock redshift by an amount drawn from a Gaussian distribution with standard deviation δz = 0.01(1 + z). We also perturbed the stellar masses of all galaxies, spec-z as well as photo-z, by adding a Gaussian random number with standard deviation 0.2 to $\log(M/M_\odot)$, to mimic the 0.2 dex stellar mass uncertainty of the actual data.
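A minimal sketch of this extension, assuming each mock galaxy is a hypothetical dict with `id`, `z`, and `logm` (log10 stellar mass) keys, that the input list is already limited to IAB ⩽ 22.5, and that `spec_ids` holds the IDs of the spec-z mock sample. The function name and data layout are illustrative only.

```python
import random


def add_photoz_and_mass_errors(galaxies, spec_ids, sigma_pz=0.01, sigma_logm=0.2, seed=None):
    """Return copies of flux-limited mock galaxies with simulated errors.

    Galaxies not in spec_ids get a photo-z,
        z_phot = z + N(0, sigma_pz * (1 + z)).
    All galaxies (spec-z and photo-z) get a perturbed stellar mass,
        logm_obs = logm + N(0, sigma_logm),  i.e. a 0.2 dex scatter by default.
    """
    rng = random.Random(seed)
    out = []
    for gal in galaxies:
        g = dict(gal)
        if gal["id"] not in spec_ids:
            g["z_phot"] = gal["z"] + rng.gauss(0.0, sigma_pz * (1.0 + gal["z"]))
        g["logm_obs"] = gal["logm"] + rng.gauss(0.0, sigma_logm)
        out.append(g)
    return out
```

Passing `spec_ids` as a `set` keeps the membership test constant-time for each galaxy.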

3. GROUP-FINDING METHOD

In this section, we describe the method of group identification and provide the resulting group catalog statistics as obtained with the mock catalogs. We will slightly modify the methods presented in K09 to optimize them for the 20k sample. A novelty of the 20k group catalog is the existence of a larger number of relatively rich groups with N  >  12, so that the optimization strategy has to be adapted to yield stable statistics for these higher richness classes as well. The application to the zCOSMOS 20k sample is presented in the next section.

3.1. Definitions

We mainly follow the terminology and statistics introduced in Section 3.2 of K09, which we briefly summarize here. For details we refer to K09.

A group is defined as the set of galaxies occupying the same DM halo. In the mock catalogs we know exactly which galaxies are in which groups, and we denote the corresponding sets of galaxies as the "real groups." On the other hand, the set of groups obtained by running a groupfinder on actual or mock data is called "reconstructed groups." The aim of group-finding is to tune the parameters of the groupfinder so that the resulting catalog of reconstructed groups approaches as closely as possible the catalog of real groups, as measured by certain statistics. It should be stressed that the "real" groups correspond to those DM halos which would be "detectable" (i.e., which host at least two galaxies with spectroscopic redshift measurements) in a galaxy survey with the same characteristics as zCOSMOS. Figure 2 shows the fraction of these detectable DM halos as compared to the overall sample of all DM halos. Note that more than 90% of DM halos of mass $>10^{13.5}\,M_\odot$ are detectable up to a redshift of z ≃ 0.8, while, for groups more massive than $10^{12.5}\,M_\odot$, the completeness decreases linearly with redshift from ∼90% at z ≃ 0 down to ∼10% at z ≃ 0.9.


Figure 2. Fraction of detectable halos in the zCOSMOS 20k mock samples as a function of redshift, where detectable means having at least two members with spectroscopic redshifts brighter than IAB = 22.5 after the spatial sampling and spectroscopic success rates are applied. The lines (from bottom to top) correspond to groups more massive than 11, 11.5, 12, 12.5, 13, 13.5, and 14, respectively, in units of $\log(M/M_\odot)$. The shaded area is the standard deviation among the 24 mock catalogs.

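To reproduce one curve of Figure 2 for a single mock and redshift bin, one needs only the mass of each halo and the number of its members that survive the spectroscopic selection. A minimal sketch, with a hypothetical function name and input format:

```python
def detectable_fraction(halos, log_mass_min, n_min=2):
    """Fraction of halos with log10(M/Msun) > log_mass_min that are detectable.

    halos: iterable of (log10 halo mass, number of members with spec-z) pairs,
    where the member counts are taken after the SSR and RSR have been applied.
    A halo counts as detectable if it has at least n_min such members.
    Returns NaN if no halo passes the mass cut.
    """
    counts = [n_spec for log_m, n_spec in halos if log_m > log_mass_min]
    if not counts:
        return float("nan")
    return sum(n >= n_min for n in counts) / len(counts)
```

Evaluating this per redshift bin and per mock, and taking the standard deviation across the 24 mocks, would give curves and shaded areas of the kind shown in Figure 2.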

With the concepts of "real" and "reconstructed" groups, we can define the "completenesses" and "purities" of samples of reconstructed groups by associating real groups with reconstructed groups and vice versa. A real (reconstructed) group is associated with a reconstructed (real) group if the former contains more than 50% of the members of the latter. All such associations are called "one-way matches" (1WM). If the association is mutual, we also call it a "two-way match" (2WM; see Figure 3 of K09 for an illustration). Every 2WM is therefore also a 1WM.

In K09 we demonstrated that the statistics of a group catalog can depend strongly on the richness N, i.e., the number of observed spectroscopic members of a group, and we introduced the multi-run scheme to overcome this. To check this aspect of our catalog, we investigate the statistics as a function of N in what follows. Note that N is biased with respect to redshift, since it counts only galaxies above the survey flux limit, and it is also affected by the local sampling rate. Hence, the richness N describes how well a group is identified and how much information we have about it, rather than the actual number of galaxies residing in it. An estimate of the actual number of members that is unbiased with respect to redshift is given by the corrected richness (see Sections 4.2 and 6.1), which is defined in terms of a volume-limited galaxy sample.

The one-way completeness c1(N) is then defined as the number of 1WM of real groups of richness N to reconstructed groups of any richness, divided by the number of real groups of richness N. Note that in K09 we defined these quantities cumulatively, i.e., always for ⩾N; here we define them as functions of N only. The two-way completeness c2(N) is defined analogously by considering 2WM instead of 1WM. Similarly, the one-way purity p1(N) is the number of 1WM of reconstructed groups of richness N to real groups of any richness, normalized by the number of reconstructed groups of richness N, and the two-way purity p2(N) is obtained by replacing 1WM with 2WM. While these statistics are computed on a group-by-group basis, there are analogous statistics referring to the membership of individual galaxies in groups: the galaxy success rate Sgal(N) in correctly assigning group membership to galaxies, and the interloper fraction fI(N), which gives the fraction of galaxies assigned to groups that are in fact non-group galaxies.
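The association rule and the group-level statistics above can be sketched as follows, with groups represented as sets of galaxy IDs and the richness N taken as the number of (spectroscopically observed) members. `associated` follows the 50% rule exactly as worded above. As a simplification, `rate_by_richness` counts each source group at most once, even if it is associated with several target groups. Function names are hypothetical, and the pairwise loop is O(n·m), which is fine for a sketch.

```python
from collections import defaultdict


def associated(a, b):
    """Association rule as worded in the text: group `a` is associated with
    group `b` if `a` contains more than 50% of the members of `b`."""
    return len(a & b) > 0.5 * len(b)


def rate_by_richness(source, target, two_way=False):
    """Fraction of `source` groups, per richness N, that have a 1WM (or, with
    two_way=True, a 2WM) to at least one `target` group.

    source=real,          target=reconstructed -> c1(N) or c2(N)
    source=reconstructed, target=real          -> p1(N) or p2(N)
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for s in source:
        n = len(s)  # richness: number of observed members
        totals[n] += 1
        if any(associated(s, t) and (not two_way or associated(t, s)) for t in target):
            hits[n] += 1
    return {n: hits[n] / totals[n] for n in sorted(totals)}
```

For example, with real groups `[{1, 2, 3, 4}, {5, 6}]` and reconstructed groups `[{1, 2, 3}, {5, 6, 7, 8}]`, the second reconstructed group is overmerged: it has a 1WM to the real pair, but not a 2WM, so it counts toward p1(4) but not toward p2(4).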

In addition to these statistics we also introduced in K09 the figures of merit g1 and g2:

Equation (1)

Equation (2)

Both are defined to lie in the interval between 0 and 1. g1(N) measures the balance (or trade-off) between 1WM completeness and purity, and g2(N) measures the balance between fragmentation and overmerging of reconstructed groups. For a good group catalog, g1 should be close to zero and g2 close to one over the full range of richness. In this paper, we introduce another figure of merit

Equation (3)

which is similar to g1 except that all 1WM statistics are replaced by their 2WM statistic counterparts.

We remind readers that these statistics compare the reconstructed group catalog to the real group catalog, i.e., to the groups that are in principle detectable within zCOSMOS.

3.2. Optimization Strategy

The basic group-finding algorithms we apply are the FOF and VDM algorithms described in Section 3.1 of K09. The main task is to optimize the group-finding parameters such that the resulting catalog exhibits the best possible statistics. For the 10k sample, the group-finding strategy was mainly driven by minimizing g1(N) for several richness classes. However, since g1(N) is based only on 1WM statistics, it does not account for fragmentation or overmerging in the resulting catalog. A catalog optimized for g1(N) might therefore contain unnecessarily many overmerged or fragmented groups, which exhibit very good one-way statistics but very poor two-way statistics. A reconstructed group that is fragmented or overmerged fails to tell us anything about the true nature of the group, such as its mass, richness, or radius; it only tells us whether a given galaxy is a group galaxy or not. The number of such groups should therefore be kept as low as possible. This is why, in the present work, we decided to optimize the parameters for the modified $\tilde{g}_1(N)$ instead of g1(N).

Optimizing the single-richness runs with respect to $\tilde{g}_1$ instead of g1 will, of course, yield slightly worse g1 values for the single runs. This, however, need not be true for the g1(N) statistics of the global multi-run catalog. Combining several single runs with inferior g1 statistics can lead to a multi-run catalog with slightly better g1 at small N than the multi-run catalog built from g1-optimized single runs. This apparent paradox is resolved by noting that, in a multi-run scheme, the single runs can interact in complicated, nontrivial ways. For instance, if the first run, optimized for large groups, aims to produce a very complete catalog, it will overmerge parts of some small groups into larger ones, and these can then no longer be detected in later runs. As a result, the first run can already spoil the g1 statistics of the small groups.

How can the parameters of the single runs be optimized so as to produce an optimal multi-run catalog? This is probably the most difficult part of the overall group-finding procedure and, unfortunately, there is no general prescription for producing "the" unique optimal multi-run catalog. In principle, one would have to analyze the statistics of the multi-run catalog for all possible parameter combinations of the single runs. This would not only be computationally very expensive, but would also require a single, well-defined figure of merit characterizing a whole catalog.

A manageable way of producing an optimized multi-run catalog is thus to first produce a few optimized single runs and then try different combinations, always keeping an eye on $\tilde{g}_1(N)$ and the number of reconstructed groups Nrec(N). As a guideline, the parameters of the single runs that are combined into the multi-run should not exhibit any large discontinuities as a function of richness; that is, the parameters of the multi-run should vary slowly as we move down to smaller and smaller groups.

While this approach works well for FOF, it is less convenient for the VDM parameters, because their effect on the final catalog statistics is much harder to anticipate intuitively, and the effect of different combinations of single runs is harder still. The final parameter sets for the FOF and VDM 20k multi-run catalogs are given in Tables 1 and 2, respectively. Note that the justification for these particular parameter sets rests only on the extremely good statistics of the final product (see Section 3.4), not on any rigorous optimization procedure. Moreover, we have checked that applying these group-finding parameters to the actual data yields consistent behavior between the actual data and the mock catalogs, e.g., in the number of 1WM between FOF and VDM (cf. Figure 7 of K09).

Table 1. Multi-run Parameter Sets for FOF

Step  Nmin  Nmax  b      lmax^a (Mpc)  R
1     11    500   0.1    0.375         18.5
2     7     10    0.095  0.38          14.5
3     6     6     0.09   0.35          16
4     5     5     0.085  0.375         13.5
5     4     4     0.075  0.3           19.5
6     3     3     0.09   0.275         18.5
7     2     2     0.06   0.225         16.5

Note. ^a Physical length.


Table 2. Multi-run Parameter Sets for VDM

Step  Nmin  Nmax  RI (Mpc)  LI (Mpc)  RII (Mpc)  LII (Mpc)  r (Mpc)  l (Mpc)
1     9     500   0.7       12        0.7        10         0.7      10
2     5     8     0.7       12        0.4        8          0.5      8
3     2     4     0.4       8         0.4        8          0.5      7

Note. All units of lengths are comoving.


Looking at Figure 1, one might be tempted to introduce a spatially variable linking length to account for the variations in the projected density of galaxies caused by variations in the SSR. We tested this by implementing, for example, linking lengths varying sinusoidally along the right ascension axis, with slightly larger values in the underdense strips than in the overdense strips of Figure 1. Interestingly, our optimization scheme preferred a non-varying linking length. The reason is that, except at low redshifts z ≲ 0.3, the FOF linking length l is set by the maximum linking length Lmax (see K09), which is chosen to be of the order of the expected physical size of the DM halos and is thus independent of the local galaxy density. The same conclusion emerges if we allow a general functional form for the redshift dependence of the linking length l(z): the preferred redshift dependence, in terms of optimizing the statistics of the group catalog, is a linking length that is basically constant in physical space, even though the density of galaxies decreases drastically with redshift. This is also reflected in the fact that the statistics of the group catalog are very similar whether we consider the full COSMOS field or only the central region (see Table 3).
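The saturation argument can be illustrated with an assumed form of the FOF linking length, l = min(b · d, Lmax), where d is the mean intergalaxy separation at the relevant redshift and the line-of-sight linking length is R · l. This form is our simplified reading of the K09 scheme, not a quotation of it. The parameter values below are taken from step 1 of Table 1, the separations are made up for illustration, and all lengths are assumed to be in the same physical units.

```python
def fof_linking_lengths(b, mean_separation, l_max, R):
    """Assumed FOF linking lengths (hypothetical helper).

    Transverse:    l = min(b * mean_separation, l_max)
    Line of sight: R * l
    Once b * mean_separation exceeds l_max (sparse, higher-redshift samples),
    l no longer depends on the local galaxy density.
    """
    l_perp = min(b * mean_separation, l_max)
    return l_perp, R * l_perp


# Step 1 of Table 1: b = 0.1, lmax = 0.375 Mpc, R = 18.5
for d in (2.0, 4.0, 8.0, 16.0):  # illustrative mean separations in Mpc
    print(d, fof_linking_lengths(0.1, d, 0.375, 18.5))
```

Only the smallest separation gives a density-dependent value; for the others the transverse linking length is pinned at lmax, which is why varying l with the local SSR has little effect.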

Table 3. Statistics of the 20k Mock Group Catalogs for Different Observed Richness Ranges N

       N = 2          3 ⩽ N ⩽ 4      5 ⩽ N ⩽ 9      N ⩾ 10
Full field and full redshift range
c1     0.69 ± 0.02    0.84 ± 0.03    0.83 ± 0.04    0.84 ± 0.06
c2     0.62 ± 0.02    0.76 ± 0.03    0.77 ± 0.04    0.80 ± 0.08
p1     0.69 ± 0.02    0.82 ± 0.02    0.83 ± 0.04    0.84 ± 0.06
p2     0.63 ± 0.02    0.74 ± 0.03    0.75 ± 0.04    0.78 ± 0.06
Sgal   0.70 ± 0.02    0.80 ± 0.02    0.84 ± 0.02    0.87 ± 0.02
fI     0.30 ± 0.02    0.22 ± 0.02    0.17 ± 0.02    0.15 ± 0.02
Central region and 0.1 < z < 0.8
c1     0.72 ± 0.02    0.85 ± 0.03    0.84 ± 0.05    0.86 ± 0.06
c2     0.65 ± 0.02    0.78 ± 0.04    0.78 ± 0.05    0.81 ± 0.06
p1     0.72 ± 0.02    0.82 ± 0.03    0.85 ± 0.04    0.83 ± 0.05
p2     0.64 ± 0.02    0.73 ± 0.03    0.77 ± 0.04    0.78 ± 0.07
Sgal   0.73 ± 0.02    0.81 ± 0.03    0.85 ± 0.03    0.88 ± 0.02
fI     0.27 ± 0.02    0.22 ± 0.02    0.16 ± 0.02    0.14 ± 0.02

Notes. The numbers refer to the mean and the error bars to the standard deviation among the 24 mock catalogs.

As discussed and implemented in K09, a much more important effect is that the optimal linking length l depends on richness. This motivated our multi-run scheme. As shown in Table 1 the linking lengths for the seven different runs in this scheme differ by up to 50%.
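The multi-run scheme can be sketched as a loop over the steps of Table 1. This is our reading of the scheme, not the K09 code: each step runs a groupfinder on the galaxies not yet assigned to a group, keeps only groups whose richness lies in [Nmin, Nmax], and removes their members before the next step. The function `multi_run`, the toy one-dimensional groupfinder, and its parameter `l` are hypothetical.

```python
def multi_run(galaxy_ids, steps, find_groups):
    """Hypothetical multi-run driver (our reading of the scheme, not K09's code).

    galaxy_ids:  iterable of galaxy identifiers
    steps:       list of parameter dicts, each with 'n_min' and 'n_max' plus any
                 groupfinder parameters (e.g. b, l_max, R as in Table 1)
    find_groups: callable(ids, params) -> list of member lists, run only on the
                 galaxies that have not yet been assigned to a group
    In each step, only groups with n_min <= N <= n_max are kept; their members
    are removed, so later (smaller-group) steps only see the leftover galaxies.
    """
    remaining = set(galaxy_ids)
    catalog = []
    for params in steps:
        for members in find_groups(sorted(remaining), params):
            if params["n_min"] <= len(members) <= params["n_max"]:
                catalog.append(sorted(members))
                remaining.difference_update(members)
    return catalog


# Toy 1D groupfinder for illustration: neighbors closer than params["l"] are linked.
POSITIONS = {0: 0.0, 1: 0.5, 2: 1.0, 3: 5.0, 4: 5.4, 5: 9.0}


def toy_finder(ids, params):
    groups, current = [], []
    for i in sorted(ids, key=POSITIONS.get):
        if current and POSITIONS[i] - POSITIONS[current[-1]] > params["l"]:
            groups.append(current)
            current = []
        current.append(i)
    if current:
        groups.append(current)
    return groups


# Each step has its own linking parameter, mimicking the richness dependence of Table 1.
steps = [{"n_min": 3, "n_max": 500, "l": 0.6}, {"n_min": 2, "n_max": 2, "l": 0.45}]
print(multi_run(POSITIONS, steps, toy_finder))  # [[0, 1, 2], [3, 4]]
```

The pair [3, 4] is rejected in the first step because it is too small for that step's richness window, and is only accepted in the second step, whose window and linking length are tuned for pairs.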

3.3. Subcatalogs

As in K09, we take the FOF multi-run group catalog to be the main group catalog and use the VDM multi-run catalog to define the galaxy purity parameter, GAPi, for i ∈ {1, 2} as follows: if an FOF group galaxy is also in a VDM group such that there is a 1WM between the FOF and the VDM group, the GAP1 of this galaxy is set to 1, and to 0 otherwise. Similarly, if there is a 2WM between these groups, then the GAP2 is 1, and 0 otherwise.
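The GAP assignment can be sketched as follows. The 1WM criterion used here (more than half of one group's members lie in the other group) and the choice to accept a 1WM in either direction for GAP1 are assumptions made for illustration; `galaxy_purity` and `one_way_match` are hypothetical names.

```python
def one_way_match(members_a, members_b):
    """Assumed 1WM criterion: more than half of group A's members lie in group B."""
    a = set(members_a)
    return len(a & set(members_b)) > 0.5 * len(a)


def galaxy_purity(fof_groups, vdm_groups):
    """Return (GAP1, GAP2) dicts keyed by galaxy ID for every FOF group galaxy.

    fof_groups, vdm_groups: dicts mapping group ID -> list of galaxy IDs.
    GAP1 = 1 if the galaxy's FOF and VDM groups have a 1WM (in either
    direction, an assumption); GAP2 = 1 if they have a 2WM (both directions).
    """
    vdm_of = {g: gid for gid, members in vdm_groups.items() for g in members}
    gap1, gap2 = {}, {}
    for members in fof_groups.values():
        for g in members:
            vid = vdm_of.get(g)
            if vid is None:  # galaxy is not in any VDM group
                gap1[g] = gap2[g] = 0
                continue
            fwd = one_way_match(members, vdm_groups[vid])
            bwd = one_way_match(vdm_groups[vid], members)
            gap1[g] = int(fwd or bwd)
            gap2[g] = int(fwd and bwd)
    return gap1, gap2


fof = {0: [1, 2, 3, 4], 1: [5, 6]}
vdm = {10: [1, 2, 3], 11: [5, 6, 7, 8, 9]}
gap1, gap2 = galaxy_purity(fof, vdm)
print(gap1)  # {1: 1, 2: 1, 3: 1, 4: 0, 5: 1, 6: 1}
print(gap2)  # {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0}
```

In the toy example, FOF group 1 is entirely contained in the larger VDM group 11, so its galaxies get GAP1 = 1 but GAP2 = 0.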

This concept can be generalized to a group as a whole by computing the fraction of members of a given group that have a GAP ≠ 0. We define the group purity parameter, GRPi, i ∈ {1, 2}, of a group to be the fraction of galaxies in that group that have GAPi = 1. By selecting those groups with a GRPi larger than some threshold, we generate subcatalogs of the original FOF group catalog with higher purity (see Section 3.4). The subcatalog consisting of all groups with GRPi > 0 excludes groups that are only detected in FOF. We call this the GRPi subcatalog.^25
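Given per-galaxy GAP values, GRPi and the threshold-selected subcatalogs follow directly. The sketch below uses hypothetical function names and toy GAP2 values.

```python
def group_purity(fof_groups, gap):
    """GRP_i of each group: fraction of its members with GAP_i = 1."""
    return {gid: sum(gap[g] for g in members) / len(members)
            for gid, members in fof_groups.items()}


def subcatalog(fof_groups, grp, threshold=0.0):
    """Groups with GRP_i > threshold; threshold = 0 gives the GRP_i subcatalog."""
    return {gid: members for gid, members in fof_groups.items() if grp[gid] > threshold}


fof = {0: [1, 2, 3, 4], 1: [5, 6]}
gap2 = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0}  # toy GAP2 values
grp2 = group_purity(fof, gap2)
print(grp2)                   # {0: 0.75, 1: 0.0}
print(subcatalog(fof, grp2))  # {0: [1, 2, 3, 4]}
```

Raising `threshold` above zero trades completeness for purity in the same way as the stricter GRP cuts discussed in the text.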

It turns out that the statistics of the basic FOF catalog and its GRP1 subcatalog are very similar. Consequently we omit the latter in the following, including instead just the GRP2 subcatalog.

3.4. Catalog Statistics for the Mock Catalogs

The global properties of the 20k mock group catalogs are summarized in Table 3 and in Figures 3–6. If the pairs are excluded, the full catalogs exhibit a completeness c1 ≳ 83% and a purity p1 ≳ 83% for any richness. If we restrict the sample to the central region and to the redshift range 0.1 < z < 0.8, where most groups lie, the completeness even rises to c1 ≳ 85%, while the purity remains about the same.
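For reference, the one- and two-way completeness and purity can be computed from member lists as sketched below. The sketch assumes, as we understand the K09 convention, that a group has a 1WM to another if more than half of its members belong to it, and that c1 (p1) is the fraction of real (reconstructed) groups with a 1WM to some reconstructed (real) group, with c2 and p2 requiring a 2WM. The binning in observed richness used in Table 3 is omitted, and all names are hypothetical.

```python
def one_way_match(members_a, members_b):
    """Assumed 1WM criterion: more than half of group A's members lie in group B."""
    a = set(members_a)
    return len(a & set(members_b)) > 0.5 * len(a)


def matched_fraction(source, target, two_way=False):
    """Fraction of groups in `source` with a 1WM (or 2WM) to some group in `target`."""
    hits = 0
    for s in source:
        if any(one_way_match(s, t) and (not two_way or one_way_match(t, s)) for t in target):
            hits += 1
    return hits / len(source)


def completeness_purity(real_groups, rec_groups):
    """Return (c1, c2, p1, p2) for two lists of member lists."""
    return (matched_fraction(real_groups, rec_groups),
            matched_fraction(real_groups, rec_groups, two_way=True),
            matched_fraction(rec_groups, real_groups),
            matched_fraction(rec_groups, real_groups, two_way=True))


real = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
rec = [[1, 2, 20, 21], [4, 5, 10], [11, 12]]
c1, c2, p1, p2 = completeness_purity(real, rec)
print(c1, c2, p1, p2)  # c1 = 2/3, c2 = p1 = p2 = 1/3
```

The first reconstructed group contains most of the first real group but is itself dominated by interlopers, so it raises c1 without counting toward p1 or any two-way statistic.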

Figure 3. Cumulative statistics of the mock group catalogs as a function of observed richness N. The mean for the 20k FOF group catalogs is shown by the red lines, for the 20k GRP2 group catalogs by the green lines, and for the FOF 10k group catalogs by the black lines. Upper left panel: the solid lines correspond to g1 and the dashed lines to $\tilde{g}_1$. Upper right panel: interloper fraction fI. Middle panels: the solid lines correspond to c1 (left) and p1 (right) and the dashed lines to c2 and p2, respectively. Lower left panel: galaxy success rate Sgal. Lower right panel: goodness g2. In all panels, the error bars refer to the standard deviation of the mean. For the sake of clarity they are only shown for the 20k FOF catalogs.

Figure 4. Statistics for the FOF mock group catalogs as a function of redshift. The four panels show different richness classes as indicated by the labels. The solid curves indicate the completeness c1 (blue), purity p1 (red), galaxy success rate Sgal (green), and the interloper fraction fI (black). The dashed lines correspond to c2 (blue) and p2 (red). For clarity, error bars are only shown for c1, p2, and fI; they correspond to the standard deviation among the 24 mock catalogs. The robustness of the catalog statistics over most of the redshift range is clear.

Figure 5. Number of groups Ngr as a function of observed richness N. The upper panel shows the absolute number of groups and the lower panel the relative number compared to the number of real groups within the mock catalogs. Shown are the 20k FOF catalog (red solid line), the 20k GRP2 catalog (red dashed line), and the mean of the 20k FOF mock catalogs (blue line). The error bars indicate the standard deviation among the 24 mock catalogs, and the gray shaded area corresponds to the sample variance for the real 20k mock groups.

Figure 6. Fractions of the different possible associations between reconstructed and real groups for the 20k mock catalogs as a function of observed richness N. The bottom layer shows the fraction of reconstructed groups having a 2WM to real groups; the upper surface of this layer is equal to p2(N). The dark gray layer shows the fraction of 1WM (that are not 2WM) from reconstructed groups to real groups ("fragmentation") and the light gray layer the corresponding fraction from real groups to reconstructed groups ("overmerging"). The white area corresponds to the fraction for which no association exists ("spurious groups"). The dashed line is a benchmark at 0.7.

In Figure 3 the cumulative statistics of the 20k FOF mock catalogs (red line) are compared to those of the 10k mock catalogs (black line) and the 20k GRP2 mock subcatalogs (green line), i.e., all groups with GRP2 > 0. The g1 panel shows that the 20k catalogs constitute an improvement of about 5%–10% over the 10k catalogs. This is significant relative to the statistical error of the mean of the 24 mock catalogs, which presumably reflects the range of factors (such as overlapping groups, the spatial distribution of galaxies in groups, etc.) that can influence the purity and completeness. This superiority is less obvious from a glance at the completeness and the purity (middle panels). For N ≳ 10, the completeness values of the catalogs, both c1 and c2, are similar, while the purity of the 20k catalogs is slightly higher and much more uniform over a broad range of N. For N ≲ 10, the completeness of the 10k catalogs is higher, but this deficiency is more than balanced by the improved purity of the 20k catalogs. The trends of the galaxy success rate Sgal and the interloper fraction fI are similar for the 10k and 20k catalogs, but the 20k catalogs have significantly fewer interlopers for all N.

Overall, the 20k mock group catalogs are purer than the 10k mock catalogs. In fact, they are so pure that, as already noted above, there is almost no difference between the FOF catalogs and the corresponding GRP1 subcatalogs. As expected, the GRP2 catalogs are even purer than the FOF ones, but at the expense of completeness. While the g1 goodness of the GRP2 catalog is worse than that of the FOF catalogs, the g2 goodness is better for groups with N ≲ 10. Thus, selecting only the GRP2 groups slightly diminishes the contamination from overmerging and fragmentation.

The catalog statistics as a function of redshift are shown in Figure 4 for different richness classes. It is clear that all statistics are fairly robust over the whole redshift range for any richness of group, as was already demonstrated for the 10k sample (cf. Figure 9 of K09). Only at the very high redshift end z > 0.8 and for the smallest groups is a weak redshift dependence apparent.

The superiority of the 20k catalogs over the 10k catalogs can, however, be only partially assessed from Figure 3. One of their major successes is that the new catalogs correctly reproduce the number of groups as a function of richness N. Figure 5 shows the relative abundance of the reconstructed groups Nrec (lower panel, blue line) compared to the real groups Nreal in the mock catalogs. The mean number of reconstructed groups clearly follows the number of real groups extremely well for all N. Even the scatter in the number of reconstructed groups among the 24 mock catalogs is well within the sample variance of the real groups. Note that in K09 it was the 1WM subcatalogs that had this property, while the basic FOF multi-run catalogs contained too many groups at small N (see Figure 6 of K09).

4. THE SPECTROSCOPIC GROUP CATALOG

The group catalog produced with the actual zCOSMOS 20k sample is given in Tables 4 and 5. The first table provides a list of all groups along with their properties and the second the corresponding group galaxy sample containing the spectroscopic and photometric group population. For the construction of the photometric group population, we refer to Section 5. In the following, we will call the actual zCOSMOS FOF group catalog just the "20k group catalog." The positions of the 20k groups in redshift space are shown in Figure 7.

Figure 7. Positions of the zCOSMOS 20k groups in redshift space. The groups are plotted as a function of right ascension and comoving distance, where the richness N of the groups is color coded as indicated above the cone. The labels on the left side of the cone indicate the redshift and the ones on the right side the corresponding comoving distance. Note that the transverse scale of the cone has been stretched by about a factor of two for clarity. In reality, the comoving depth of this cone (from z = 0.1 to 1) is about 70 times longer than its transverse comoving size at z = 0.5. The comoving transverse scale of the cone is indicated by the horizontal bar at the top. The clustering of the groups and the cosmic large-scale structure are clearly visible up to the highest redshifts.

Table 4. The zCOSMOS 20k Group Catalog (Excerpt)

Group ID  N^a  Ncorr^b  αgr^c (deg)  δgr^c (deg)  zgr^d   rfudge^e (Mpc)  $\hat{\sigma}$^f (km s−1)  $\log(M_{\rm fudge}/M_\odot)$^g  GRP2^h
0         14   33       150.02209    2.01328      0.0787  0.646           433                        13.51                            0.93
1         30   54       150.35758    2.44265      0.1230  0.652           454                        13.56                            0.63
2         33   52       149.86613    1.76547      0.1245  0.674           587                        13.52                            0.61
3         14   28       150.42153    2.44418      0.2160  0.532           298                        13.45                            1.00
4         14   97       150.20008    1.65232      0.2202  0.722           1008                       13.69                            0.93
5         17   36       150.10545    2.36170      0.2201  0.577           745                        13.44                            0.94
6         20   27       150.45635    2.68079      0.2186  0.515           642                        13.42                            0.95
7         17   28       150.04641    2.43245      0.2200  0.532           662                        13.40                            0.71
8         15   16       150.23142    2.55729      0.2199  0.627           418                        13.49                            0.87

Notes. ^a Number of spectroscopic members. ^b Corrected richness with respect to the flux limit (see Section 6.1). ^c Improved group centers defined in Section 6.3. ^d Mean redshift of the spec-z group members. ^e Fudge radius in physical Mpc (see Section 4.2). ^f Velocity dispersion for groups with N ⩾ 5 (see Section 4.2). ^g Fudge mass of the DM halo (see Section 4.2). ^h Group purity parameter GRP2 (see Section 3.3).

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.

Table 5. Spec-z and Photo-z Group Galaxies (Excerpt)

Galaxy ID  Group ID  20k Flag^a  GAP2^b  α (deg)    δ (deg)  z^c     log(M*/M⊙)^d  p^e   pM^f  pMA^g
819041     0         1           1       149.99837  2.03514  0.0789  9.33          0.92  0.00  0.00
818934     0         1           1       150.02406  1.96865  0.0779  8.30          0.89  0.00  0.00
818888     0         1           1       150.03653  2.02487  0.0794  7.85          0.96  0.00  0.00
819026     0         1           1       150.00038  1.97859  0.0802  10.05         0.90  0.00  0.00
818839     0         1           1       150.04871  2.07792  0.0775  8.17          0.82  0.00  0.00
819133     0         1           1       149.96812  2.06726  0.0779  8.17          0.80  0.00  0.00
819032     0         1           0       149.99948  1.98699  0.0805  10.22         0.92  0.00  0.00
819060     0         1           1       149.99123  1.99116  0.0797  8.11          0.91  0.00  0.00
818935     0         1           1       150.02393  2.07273  0.0779  7.97          0.85  0.00  0.00
818815     0         1           1       150.05394  2.03343  0.0785  8.47          0.91  0.00  0.00
819118     0         1           1       149.97241  2.10540  0.0781  7.99          0.71  0.00  0.00
819104     0         1           1       149.97723  2.00483  0.0779  10.16         0.89  0.00  0.00
818982     0         1           1       150.01341  2.02956  0.0791  10.70         0.96  0.16  0.96
818787     0         1           1       150.06047  2.00672  0.0785  10.48         0.91  0.01  0.00
700213     0         0           −1      150.07021  1.85821  0.1029  8.19          0.19  0.00  0.00
700241     0         0           −1      149.98257  1.80462  0.0964  7.87          0.09  0.00  0.00

Notes. ^a 1 if a spec-z is available, otherwise 0. ^b Galaxy purity parameter for spec-z members (see Section 3.3); −1 for photo-z members. ^c Spec-z if available, otherwise photo-z. ^d Stellar mass (computed without considering mass return; see Section 2.2). ^e Association probability (see Section 5.1). ^f Probability of being the most massive (see Section 6.2). ^g See Section 6.4.

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.

The basic properties of the 20k group catalog are summarized in Table 6 and compared to those of the 10k catalog. For N ⩾ 2, the 20k catalog contains 1498 groups, almost twice as many as the 10k catalog, and it has almost four times as many groups with N ⩾ 10.

Table 6. Basic Statistics for the zCOSMOS 10k and 20k Group Catalogs

               10k                                    20k
               Ngr^a  $f_{\rm GRP_1}$^b  〈GRP1〉      Ngr^a  $f_{\rm GRP_2}$^b  〈GRP2〉
N = 2          514    0.79               0.79          932    0.89               0.89
3 ⩽ N ⩽ 4      184    0.81               0.77          374    0.91               0.89
5 ⩽ N ⩽ 9      91     0.95               0.87          151    0.87               0.81
N ⩾ 10         11     1.0                0.93          41     0.90               0.78

Notes. ^a Number of groups. ^b Fraction of groups in the corresponding GRPi subcatalog, i ∈ {1, 2}.

4.1. Group Robustness

One of the main prerequisites for estimating the properties of reconstructed groups is that each group be reliably identified. If a group is overmerged or fragmented, derived properties such as mass or physical size will be severely affected and may have little or nothing to do with those of the real group. For reconstructed groups that do not have a 2WM to a real group, we cannot, in general, even make a unique one-to-one comparison between the properties of real and reconstructed groups. This again emphasizes the importance of a group catalog being optimal not only with respect to the one-way statistics c1 and p1, but also with respect to the two-way statistics c2 and p2.

Figure 6 shows the fractions of groups as a function of observed richness N that have the four different kinds of possible associations: full 2WM, a 1WM from reconstructed to real (i.e., fragmentation), a 1WM in the opposite direction (i.e., overmerging), and no association at all. The percentage of reconstructed groups exhibiting a 2WM to real groups is ≳ 75%. Of the remaining ∼25%, the fraction of overmerged groups is higher than that of fragmented groups. It should also be noted that for groups with N ≳ 5 there are almost no spurious groups. That is, essentially every group that is found constitutes a real physical structure in the universe, but in 20%–30% of cases, the groupfinder has made it significantly too small or too big (by a factor of more than two in membership) compared to the real group. The fact that Figure 6 is basically independent of N is a consequence of the application of the multi-run scheme (see Section 3.2).
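The four association classes of Figure 6 can be assigned to each reconstructed group with a sketch like the one below. It assumes the more-than-half 1WM criterion and an assumed order of precedence when a group qualifies for several classes; `classify` is a hypothetical name.

```python
def one_way_match(members_a, members_b):
    """Assumed 1WM criterion: more than half of group A's members lie in group B."""
    a = set(members_a)
    return len(a & set(members_b)) > 0.5 * len(a)


def classify(rec, real_groups):
    """Classify one reconstructed group against the real groups (cf. Figure 6).

    Checked in this order (an assumed precedence): 2WM; fragmentation (1WM
    from reconstructed to real only); overmerging (1WM from real to
    reconstructed only); spurious (no association at all).
    """
    fwd = [one_way_match(rec, real) for real in real_groups]
    bwd = [one_way_match(real, rec) for real in real_groups]
    if any(f and b for f, b in zip(fwd, bwd)):
        return "2WM"
    if any(fwd):
        return "fragmentation"
    if any(bwd):
        return "overmerging"
    return "spurious"


real = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
for rec in ([1, 2, 3], [1, 2], [5, 6, 10, 11, 12], [13, 14]):
    print(rec, classify(rec, real))
# [1, 2, 3] 2WM / [1, 2] fragmentation / [5, 6, 10, 11, 12] overmerging / [13, 14] spurious
```

Tallying these labels per observed-richness bin would give the stacked fractions plotted in Figure 6.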

Since the FOF groups depend solely on the two quantities lper and lpar, the linking lengths perpendicular and parallel to the line of sight, respectively, a natural question is whether a given group is sensitive to the particular choice of these linking lengths or whether slightly different values would leave it essentially unchanged. To answer this question we have introduced a "group robustness" parameter, frob(f), for each group, by running the groupfinder with the linking lengths $f \cdot l_{\rm per}$ and $f \cdot l_{\rm par}$, parameterized by the scale factor f, and computing for each group the fraction

Equation (4)

where N(f) is the new richness of that group. This ensures that frob(f) takes only values between 0 and 1 and that the robustness increases with frob(f), with frob(f) = 1 indicating a highly robust structure. frob(f) thus measures how sensitive the implied membership is to changes in the linking length: for f < 1 it probes the robustness with respect to fragmentation and for f > 1 with respect to overmerging.
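Equation (4) itself is not reproduced in this text. One form consistent with the description, which we adopt here only as an assumption, is frob(f) = N(f)/N for f < 1 and frob(f) = N/N(f) for f > 1, with N(f) taken as the richness of the regrouped group that shares the most members with the original one. A minimal Python sketch under these assumptions (the function name `robustness` is hypothetical):

```python
def robustness(members, regrouped, f):
    """Hypothetical f_rob(f) for one group.

    members:   galaxy IDs of the group found with the nominal linking lengths
    regrouped: list of member lists found after scaling both linking lengths by f
    Assumptions (Equation (4) is not reproduced here): N(f) is the richness of
    the regrouped group sharing the most members with the original group, and
    f_rob = N(f)/N for f < 1 (fragmentation) and N/N(f) for f > 1 (overmerging).
    Because FOF groups only split when f < 1 and only grow when f > 1, both
    ratios lie between 0 and 1.
    """
    if f == 1:
        return 1.0
    original = set(members)
    best = max(regrouped, key=lambda g: len(original & set(g)))
    n, n_f = len(original), len(best)
    return n_f / n if f < 1 else n / n_f


group = [0, 1, 2, 3]
print(robustness(group, [[0, 1], [2], [3], [7]], f=0.5))   # 0.5: loses half its members
print(robustness(group, [[0, 1, 2, 3, 4, 5, 6, 7]], f=2))  # 0.5: doubles its membership
```

Under this reading, the fractions quoted for Figure 8 correspond to frob(0.5) ⩽ 0.5 and frob(2) ⩽ 0.5, respectively.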

Figure 8 shows the results for f = 0.5 and f = 2 for 20k FOF mock groups in the richness classes 5 ⩽ N ⩽ 9 (red lines) and N ⩾ 10 (black lines). These results are not sensitive to the precise value of f. The upper panels show the p2 statistics for the corresponding frob(f)-selected subsamples. Reducing the linking length tends to have a bigger effect than increasing it: roughly 50% of groups in the mock catalogs lose half of their members when the linking lengths are halved, but only 25% of groups double their membership when the linking lengths are doubled. As would be expected, the overall purity increases strongly as the robustness frob(f) approaches unity, for f both larger and smaller than one; groups whose membership is stable to changes in the linking lengths are, not surprisingly, likely to be the purest. However, the lower panels make it clear that raising the purity of subsamples significantly by applying cuts in the robustness comes at the expense of losing many groups.

Figure 8. Purity and group fraction as a function of group robustness. The solid lines show the median and the error bars the upper and lower quartiles of the 24 20k mock catalogs. The dashed lines are the results for the actual 20k group catalog. Red lines correspond to the richness class 5  ⩽  N  ⩽  9 and the black lines to N  ⩾  10.

For the actual 20k group catalog, the group fraction is shown by the dashed lines in Figure 8. The big groups in particular are significantly less robust with respect to fragmentation than the corresponding mock groups. We do not know the reason for this, but it matches other properties of the 20k group catalog, such as the lack of high-richness groups (see Section 4.3). Note that, in contrast to the completeness and purity, the group robustness is one of the few quantities that can be computed from the actual data without the need for mock catalogs, and it thus allows a direct comparison with simulated data.

4.2. Estimates of Physical Properties

As pointed out in K09, we are able to estimate the velocity dispersion σv for groups with N  ⩾  5 and σv ≳ 350 km s−1 to an accuracy of about 25%. On the other hand, a reliable estimation of dynamical mass by means of the virial theorem has proved to be very difficult, not only because the error of the velocity dispersion enters the virial theorem quadratically, but also because reliable estimates of the virial radius are very hard to obtain. Using the mock catalogs, we found that the projected apparent extension of a group hardly correlates at all with the virial radius of the corresponding DM halo. The unavailability of reliable dynamical mass estimates is one major shortcoming of our group catalog and others constructed in similar ways. To have at least an idea of the typical mass of the groups, we introduced in K09 the so-called fudge mass by taking the corrected richness $\tilde{N}$ of the group (i.e., observed richness N corrected for SSR and RSR) at a given redshift z as a proxy for its mass.

In the same spirit we can define "fudge quantities" Qfudge for any quantity Q that, at a given redshift, exhibits a correlation with the corrected richness $\tilde{N}$ or with another independently measurable quantity $\tilde{Q}$ (e.g., velocity dispersion, projected extension). That is, a group at redshift z with corrected richness $\tilde{N}$ and measured property $\tilde{Q}$ can be assigned a corresponding Qfudge defined by

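$$Q_{\rm fudge}(\tilde{N}, z, \tilde{Q}) = \left\langle Q_{\rm mock} \right\rangle_{\tilde{N}, z, \tilde{Q}}, \qquad (5)$$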

where the brackets 〈〉 denote the average considering all reconstructed groups with 2WM to real groups for which the corresponding measured quantities are within some range of $\tilde{N}$, z, and $\tilde{Q}$, and Qmock denotes the correct group property of the corresponding real group.
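Equation (5) amounts to a binned lookup table calibrated on the mock catalogs. The sketch below assumes simple rectangular bins in $\tilde{N}$, z, and $\tilde{Q}$ and mock groups already restricted to those with a 2WM; the bin edges, dictionary keys, and function names are hypothetical. For the fudge mass, which has no $\tilde{Q}$, a single dummy $\tilde{Q}$ bin is used.

```python
import bisect
from collections import defaultdict
from statistics import mean


def bin_index(value, edges):
    """Index k with edges[k] <= value < edges[k+1], or None if out of range."""
    k = bisect.bisect_right(edges, value) - 1
    return k if 0 <= k < len(edges) - 1 else None


def calibrate(mock_groups, n_edges, z_edges, q_edges):
    """Mean true property Q_mock per (N_tilde, z, Q_tilde) bin.

    mock_groups: dicts with keys 'n_corr', 'z', 'q_obs', 'q_true'; assumed to be
    pre-selected as reconstructed mock groups with a 2WM to a real group.
    """
    values = defaultdict(list)
    for g in mock_groups:
        key = (bin_index(g["n_corr"], n_edges), bin_index(g["z"], z_edges),
               bin_index(g["q_obs"], q_edges))
        if None not in key:
            values[key].append(g["q_true"])
    return {key: mean(v) for key, v in values.items()}


def fudge(table, n_corr, z, q_obs, n_edges, z_edges, q_edges):
    """Q_fudge for an observed group, or None if its bin holds no calibrators."""
    key = (bin_index(n_corr, n_edges), bin_index(z, z_edges), bin_index(q_obs, q_edges))
    return table.get(key)


# Toy fudge-mass calibration (log M values); no Q_tilde, so one dummy q bin is used.
mocks = [
    {"n_corr": 3, "z": 0.3, "q_obs": 0.0, "q_true": 13.0},
    {"n_corr": 4, "z": 0.35, "q_obs": 0.0, "q_true": 13.5},
    {"n_corr": 20, "z": 0.3, "q_obs": 0.0, "q_true": 14.0},
]
edges = dict(n_edges=[2, 10, 100], z_edges=[0.1, 0.5, 1.0], q_edges=[-1.0, 1.0])
table = calibrate(mocks, **edges)
print(fudge(table, 5, 0.4, 0.0, **edges))  # 13.25
```

Because the result is a calibrated average rather than a measurement, its scatter around the true value is what Table 7 quantifies.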

In addition to the fudge mass we have computed fudge estimates for the halo virial velocity ("fudge velocity") and halo radius ("fudge radius"). For the fudge velocity we have used the apparent velocity dispersion σv as $\tilde{Q}$ and for the fudge radius, we use the apparent projected size of the group, as defined below. The scatter of the estimated quantities compared to the true quantities for reconstructed 20k mock groups exhibiting a 2WM to real groups is given in Table 7. As expected, the errors decrease with increasing observed richness N. Note that the fudge quantities must not be mistaken for real physical estimates of the corresponding quantity; rather they are "typical" values calibrated using the mock catalogs.

Table 7. Errors for the Fudge Quantities Using the 20k Mock Catalogs

Quantity  Error measure  N = 2  3 ⩽ N ⩽ 4  5 ⩽ N ⩽ 9  N ⩾ 10
Mfudge    Δ (dex)        0.37   0.27       0.18       0.15
vfudge    Rel. error     21%    19%        13%        9%
rfudge    Rel. error     27%    23%        16%        11%

4.3. Number of Groups as a Function of N

The most straightforward way to compare the actual data with the mock data is by means of the number of reconstructed groups as a function of observed richness N. This is shown in Figure 5. Compared to the mock data the number of groups of the 20k group catalog is mostly within the range expected due to sample variance within the 24 mock catalogs. Since for the 20k mock catalogs the number of reconstructed groups traces very well the number of real groups for any richness, there is no need to distinguish between them.

The overall slope of the Ngr(N) function for the actual data, however, is steeper than that for the mock data. In particular, the number of groups with two and three members is about 25%–50% higher than in the mock catalogs, and for N ≳ 18 there is a significant lack of groups in the 20k sample compared with the mock catalogs. Both trends were already noted for the 10k sample in K09 and are now confirmed with the larger 20k sample. The excess of groups with N ≲ 3 cannot be blamed on secondary objects (serendipitous observations in the spectroscopic slits), which could boost the number of small groups, since the fraction of such objects is only about 2%. Interestingly, a significant lack of high-richness groups relative to the Millennium simulation has recently also been reported for the large GAMA FOF group catalog at low redshift (Robotham et al. 2011), which indicates that this lack is not a peculiarity of the COSMOS field.

It should be noted in particular that many of the individual mock catalogs contain groups that are much larger than those in zCOSMOS. While the largest group in the 20k sample has 33 members, there are on average about 3–4 groups with N ⩾ 40 per 20k mock catalog and 1–2 groups with N ⩾ 60. These huge groups are not an artifact of our group-finding algorithm but are present as real groups in the mock catalogs. Since the high-mass end of the halo mass function is very sensitive to the amplitude of the matter power spectrum, σ8, the large number of big groups in the mock catalogs could reflect the fact that σ8 = 0.9 for the Millennium simulation is too large relative to recent measurements of σ8 ≃ 0.82 ± 0.02 (e.g., Komatsu et al. 2011).

A direct measurement of σ8 by means of the group mass function is, however, very difficult, for two main reasons. First, it is the high-mass end of the mass function that is most sensitive to σ8 and where our catalog is most complete (see Figure 2). Due to the relatively small volume of zCOSMOS, we are in the regime of low number statistics for such high masses and thus are affected by cosmic variance, particularly at low redshift. Second, we checked that a mass cut by means of the fudge mass would introduce some mass-dependent systematics into the mass function estimation, so that a robust estimation of σ8 would require improved mass estimates. However, the fact that there is no group in the 20k group catalog with Mfudge > 2 × 10^14 M⊙, while there are ∼3.5 on average in each mock, certainly favors a low σ8. Only 3 out of the 24 mock catalogs (i.e., 12.5%) contain no group with that high fudge mass.

At this point it is interesting to come back to the findings on the group robustness in Section 4.1. We noted that for big groups the group robustness with respect to fragmentation is significantly lower than for the corresponding mock groups (Figure 8, black dashed lines). This points in the same direction as the detected lack of big groups. There are not only fewer big groups in the zCOSMOS group catalog than in the mock catalogs, but the observed groups are also less robust.

4.4. Fraction of Galaxies in Groups

A quantity closely related to the number of groups in a catalog is the fraction of galaxies that are in groups. Since the number of groups traces roughly the number of galaxies in zCOSMOS (cf. Figure 12 of K09), computing fractions of galaxies in groups instead of the absolute number of groups diminishes the effect of large-scale structure and associated cosmic variance. Measurement of this fraction allows further comparison with the mock catalogs and allows us to trace the buildup of the cosmic group environment over time. The analysis in this section will be entirely restricted to the central region of the zCOSMOS survey (see Figure 1).

The fraction of galaxies in groups for the full flux-limited 20k group catalog and the 20k galaxy sample is shown in Figure 9 as a function of redshift for N ⩾ 2 and N ⩾ 5. The overall behavior of the fraction of galaxies in the 20k groups (red line) matches that of the reconstructed (or real) mock groups quite well, at least in the redshift range z ≲ 0.6. At the highest redshifts, the fraction of group galaxies in zCOSMOS is significantly higher than in the mock catalogs. The reason for this is unclear; it may indicate a problem of the semi-analytic models in following the evolution of galaxies. Most of these highest-redshift groups are detected only as pairs, which raises possible concerns about the sampling of objects. However, the red dashed line corresponds to groups that would still be detected even if all secondary objects were discarded, so secondary objects are not the cause of this effect. Furthermore, the excess is also visible for much richer systems (lower panel of Figure 9). It is noticeable, particularly in the lower panel, that the fraction of galaxies in groups is enhanced at redshifts z ∼ 0.35 and z ∼ 0.70, where there are very large-scale structures in the COSMOS field (cf. Figure 1 of K09 and Figure 7 in this paper). At low redshift the total fraction of galaxies in groups is about 40%, which is consistent with the results from the low-redshift GAMA group catalog (Robotham et al. 2011) despite the different limiting fluxes of the surveys, presumably reflecting the weak dependence of the satellite fraction on galaxy mass.
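Computing the group fraction in redshift bins is straightforward; the sketch below uses a hypothetical function name and toy data, and leaves the selection of group members (richness cut, central region) to the caller.

```python
def group_fraction_by_z(z, in_group, edges):
    """Fraction of galaxies that are group members in each redshift bin.

    z:        galaxy redshifts
    in_group: booleans, True if the galaxy belongs to a group passing the
              chosen richness cut (e.g. N >= 2 or N >= 5)
    edges:    increasing bin edges; bins are half-open [lo, hi)
    Returns one fraction per bin, or None for bins without galaxies.
    """
    fractions = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        flags = [flag for zi, flag in zip(z, in_group) if lo <= zi < hi]
        fractions.append(sum(flags) / len(flags) if flags else None)
    return fractions


z = [0.15, 0.20, 0.25, 0.45, 0.50, 0.90]
in_group = [True, False, True, True, False, True]
print(group_fraction_by_z(z, in_group, [0.1, 0.3, 0.6, 0.8]))  # 2/3, 0.5, None
```

The galaxy at z = 0.90 falls outside the last edge and is ignored, and the empty bin is reported as None rather than as a misleading zero.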

Figure 9. Fraction of galaxies in groups as a function of redshift for the whole flux-limited galaxy and group samples. The samples are restricted to the central region and show groups with N  ⩾  2 in the upper panel and N  ⩾  5 in the lower panel. The red solid line shows the fraction of galaxies in zCOSMOS 20k groups and the red dashed line the corresponding fraction if only groups are considered which are detectable without the existence of secondary objects. The black line shows the mean fraction of galaxies in real 20k mock groups and the green line the mean fraction of galaxies in reconstructed 20k mock groups. The error bars indicate the standard deviation among the 24 mock catalogs. The mock catalogs are in fair agreement with the actual data for z  ≲  0.6, but contain significantly too few groups for z ≳ 0.6.

In order to get a clearer view of the buildup of the group environment over cosmic time, it is better to work with volume-limited samples of galaxies and groups, or the closest approximations to them that can be constructed. We can approximate a volume-limited sample of galaxies by applying a cut in absolute magnitude, chosen to evolve with redshift to account, at least roughly, for the luminosity evolution of individual galaxies. We apply the cut as

Equation (6)

for different absolute magnitude limits MB, lim. We performed the analysis with three magnitude limits, MB, lim = −19.75, −20.25, and −20.75. The resulting galaxy populations are complete at least up to z ∼ 0.8.
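Because Equation (6) is not reproduced in this text, the following sketch uses a hypothetical linear evolution term, M_B ≤ MB, lim − Q z with Q = 1 mag per unit redshift, purely as a placeholder for the actual redshift correction; `bright_mask` and `evolution` are hypothetical names.

```python
def bright_mask(abs_mag_b, z, m_lim, evolution=1.0):
    """Flag galaxies brighter than a redshift-evolving B-band limit.

    Hypothetical stand-in for Equation (6): M_B <= m_lim - evolution * z,
    i.e. a linear luminosity-evolution term of `evolution` mag per unit
    redshift. Brighter means more negative absolute magnitude.
    """
    return [m <= m_lim - evolution * zi for m, zi in zip(abs_mag_b, z)]


# MB, lim = -19.75 (one of the three limits used in the text); toy galaxies.
print(bright_mask([-19.9, -20.6, -20.5], [0.2, 0.5, 0.8], -19.75))  # [False, True, False]
```

With this placeholder, the effective limit at z = 0.8 is −20.55, so the third galaxy at −20.5 just fails the cut even though it would pass the unevolved limit.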

To construct a volume-limited sample of groups, we select all groups with at least two members brighter than MB, lim(z), the redshift-dependent limit of Equation (6). We use the observed richness rather than the richness corrected for SSR and RSR to avoid the scatter that is introduced by potentially large completeness corrections. This procedure is not perfect. For instance, two galaxies may be linked at low redshift by others below the absolute magnitude cut to form a "group" that would be undetected at high redshifts, where the absolute magnitude limit is closer to the flux limit of the spectroscopic survey. This could lead to a redshift-dependent c1 and/or p1. However, Figure 4 shows that the redshift dependence of c1 and p1 is negligible over the redshift range considered here.
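The group selection described above reduces to counting bright members. A sketch with a hypothetical function name, where the `bright` flags would come from the evolving absolute-magnitude cut:

```python
def volume_limited_groups(groups, bright):
    """Keep groups with at least two members passing the bright cut.

    groups: dict group ID -> list of galaxy indices (spectroscopic members)
    bright: sequence of booleans indexed by galaxy index, e.g. from the
            evolving absolute-magnitude cut
    Uses the observed number of bright members, not a completeness-corrected
    richness. Returns the bright members of each retained group.
    """
    selected = {}
    for gid, members in groups.items():
        bright_members = [i for i in members if bright[i]]
        if len(bright_members) >= 2:
            selected[gid] = bright_members
    return selected


groups = {0: [0, 1, 2], 1: [3, 4]}
bright = [True, True, False, True, False]
print(volume_limited_groups(groups, bright))  # {0: [0, 1]}
```

Group 1 is dropped even though it has two spectroscopic members, because only one of them passes the magnitude cut.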

To address these and other concerns, Figure 10 shows the number of reconstructed mock groups compared with the number of all groups in the mock catalogs that host at least two bright galaxies, regardless of whether these groups are detectable within the 20k mock samples or not. The obtained completeness is therefore lower than that shown in Figure 3, where only the "detectable" groups were considered as the parent sample. The completeness computed in this way is found to be fairly constant in the redshift range 0.1  ≲  z  ≲  0.8 for all three absolute magnitude cuts. This reassures us that there are no strong systematic biases for the absolute magnitude selected groups as a function of redshift.

Figure 10. Completeness of mock groups containing at least two members brighter than MB, lim(z). The lines correspond to the mean of the 24 mock catalogs for different absolute magnitude limits (blue: MB, lim = −19.75; red: MB, lim = −20.25; green: MB, lim = −20.75) and the error bars to the standard deviation of the mean. For the redshift range 0.1 < z < 0.8, the completeness is fairly constant for all three magnitude limits.

Having established that our "volume-limited" samples should be free of strong biases, we show the fraction of galaxies in groups in Figure 11. For the 20k sample, we again see the signatures of the big structures at redshifts z ∼ 0.35 and z ∼ 0.70, as in Figure 9. This could indicate that the luminosity function of galaxies in groups is environment dependent. Nevertheless, there is a clear overall trend for the fraction of (volume-limited) galaxies in (volume-limited) groups to increase significantly with decreasing redshift, as indicated by the dashed lines. This demonstrates the buildup of the cosmic group environment over a large fraction of the last 7 billion years. It should be noted that this result is insensitive to the precise form of the redshift correction in Equation (6).
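The dashed linear fits and bootstrap error bars of Figure 11 could be obtained along the lines of the sketch below. Whether the published fits were weighted, and what exactly was resampled in the bootstrap, is not stated here; the unweighted least-squares fit and the resampling of galaxies within one redshift bin are assumptions, and the function names are hypothetical.

```python
import random
from statistics import pstdev


def linear_fit(x, y):
    """Unweighted ordinary least-squares intercept and slope of y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def bootstrap_error(in_group, n_boot=1000, seed=0):
    """Bootstrap standard deviation of the group fraction in one redshift bin.

    in_group: booleans for the galaxies in the bin (True = in a group).
    The galaxies are resampled with replacement n_boot times (an assumption).
    """
    rng = random.Random(seed)
    n = len(in_group)
    fractions = [sum(rng.choices(in_group, k=n)) / n for _ in range(n_boot)]
    return pstdev(fractions)


print(linear_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))  # (1.0, 2.0)
print(bootstrap_error([True] * 30 + [False] * 70))   # roughly sqrt(0.3 * 0.7 / 100), i.e. about 0.046
```

Applied to group fraction versus redshift, a negative fitted slope corresponds to the increase of the group fraction toward lower redshift described above.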

Figure 11. Fraction of galaxies in groups for volume-limited galaxy and group samples. The blue, red, and green solid lines show the results for the zCOSMOS 20k data (blue: MB, lim = −19.75; red: MB, lim = −20.25; green: MB, lim = −20.75). The error bars for the actual data are obtained by bootstrapping and the dashed lines exhibit linear fits to the data points. The gray lines are the corresponding mean curves of the 24 FOF mock catalogs, where the luminosity increases for the lower curves. The error bars exhibit the standard deviation of the mean. This figure demonstrates the buildup of the cosmic group environment over the redshift range 0.2 < z < 0.8.

Standard image High-resolution image

Curiously, the observed fraction of (bright) galaxies in the 20k groups is significantly higher than in the mock groups. This finding holds regardless of whether the flux limit of the mock catalogs is adjusted (see Section 2.3). The fraction in the mock catalogs, however, approaches that of the 20k sample as we go to fainter galaxies near the flux limit (in agreement with Figure 9). This suggests that the discrepancy in Figure 11 could be caused by a problem with the magnitudes of bright galaxies in the COSMOS mock light cones.

5. PHOTOMETRIC GROUP MEMBERS

For some applications, it is very useful to have a complete galaxy sample down to a magnitude limit. For example, to study the most massive galaxies in groups, one must ensure that these galaxies are present in the sample. Since even the 20k sample is only about 55% complete, the spectroscopic group catalog is not yet optimized for this kind of study. On the other hand, since zCOSMOS targets the COSMOS field, which has been observed in many wavelength bands, we would like to use all the available data to improve the group catalog, including the high-quality photo-z catalogs for all galaxies in the COSMOS field down to IAB = 22.5. In this section, we present our method of populating the spectroscopic groups discussed in the previous sections with photo-z galaxies on a probabilistic basis.

Although there are in principle ways to detect groups in photometric galaxy samples (e.g., Li & Yee 2008; Gillis & Hudson 2011), we will only use the groups detected from the spectroscopic galaxies; we will not use photo-z galaxies to detect new groups. Thus, our resulting group sample will miss all groups in the sky that do not have more than one spectroscopic member. Figure 2 indicates the fraction of groups that are missed for this reason, since it plots the fraction of detectable halos (i.e., those with two or more galaxies above the zCOSMOS flux limit) that actually had two or more galaxies observed spectroscopically after the incomplete spatial sampling and redshift success rates are applied.

5.1. Assigning Probabilities to Photo-z Galaxies

Although the photo-z errors of ∼0.01(1 + z) are impressively small by normal standards, we cannot incorporate these galaxies into the group-finding scheme directly, or even assign them to groups in a unique and reliable way. Some group galaxies may appear at large distances from the group center in redshift space, and some galaxies could be candidates for several groups. However, we can attempt to quantify the probability that a galaxy is associated with a given group. This probability will depend on the distance from the group center both in the plane of the sky and in the redshift dimension. We can again use the mock catalogs to determine these probabilities, just as we used them to fine-tune the group-finding algorithm. The association probability may also depend on the luminosity or stellar mass of the galaxy in question. However, since such a dependence may be sensitive to the galaxy evolution prescription in the COSMOS mock light cones, and since one of our scientific goals is to use the group catalog to test such relations, we decided not to use this additional information in estimating association probabilities.

Suppose we have a group at (αgr, δgr, zgr) in redshift space and a nearby galaxy at (α, δ, z) with a redshift error of δz. We will parameterize the distance of the galaxy from the group by the scaled, dimensionless offsets perpendicular and parallel to the line of sight

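$$\sigma_r = \frac{r(\alpha, \delta, \alpha_{\rm gr}, \delta_{\rm gr}, z_{\rm gr})}{r_{\rm gr}}, \qquad \sigma_z = \frac{|z - z_{\rm gr}|}{\delta z}, \qquad (7)$$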

where r(α, δ, αgr, δgr, zgr) is the physical distance of the galaxy from the group center perpendicular to the line of sight and rgr is a measure of the projected physical extension of the group. A suitable group extension parameter rgr should ideally scale with the virial radius of the group, and (αgr, δgr) should approach the center of the underlying DM halo. Since no unique estimators satisfy these requirements, we consider several possibilities and discuss their relative strengths using the mock catalogs.
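A minimal sketch of the scaled offsets, assuming they take the form σr = r/rgr and σz = |z − zgr|/δz, which is our reading of the definitions above rather than code from the paper. The function name `scaled_offsets` is hypothetical, and r and rgr must be given in the same physical units.

```python
def scaled_offsets(r_perp, r_gr, z, z_gr, dz):
    """Return (sigma_r, sigma_z) for one galaxy relative to one group.

    Assumed form: sigma_r = r / r_gr (projected physical offset in units of
    the group extension) and sigma_z = |z - z_gr| / dz (redshift offset in
    units of the galaxy's redshift error).
    """
    if r_gr <= 0 or dz <= 0:
        raise ValueError("r_gr and dz must be positive")
    sigma_r = r_perp / r_gr
    sigma_z = abs(z - z_gr) / dz
    return sigma_r, sigma_z


# Example: a photo-z galaxy 0.3 Mpc from a group with r_gr = 0.5 Mpc,
# at z = 0.52 for a group at z = 0.50, with dz = 0.01 * (1 + z).
z = 0.52
print(scaled_offsets(0.3, 0.5, z, 0.50, 0.01 * (1 + z)))
```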

Regarding the group extension rgr, a natural estimator would be the root-mean-square (rms) extension of the spectroscopic members within the group, that is,

Equation (8)

with

Equation (9)

and Δαi = αi − αgr and Δδi = δi − δgr, where (αi, δi) is the position of the ith galaxy in the group and D(zgr) is the comoving distance to redshift zgr. Note that this estimator still depends on the choice of the group centers (αgr, δgr). The main drawback of this choice is its low correlation with the virial radius of the group in the mock catalogs; in fact, it proved to be very hard to estimate the virial radius from the distribution of galaxies. A second problem is that, particularly for groups with low richness N, rrms can become unrealistically small because of chance orientation effects. An alternative choice for rgr is the fudge radius rfudge, which has the advantage of avoiding both drawbacks of rrms.
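A sketch of the rms extension follows. The precise forms of Equations (8) and (9) are assumed here, not taken from the paper: the apparent extension is taken as the rms angular offset of the members from the center, with right-ascension offsets multiplied by cos δgr, and the physical extension as D(zgr)/(1 + zgr) times that angle (flat cosmology). The function name and argument layout are hypothetical.

```python
import math


def rms_extension(ra_deg, dec_deg, ra_gr_deg, dec_gr_deg, z_gr, d_comoving_mpc):
    """Return (r_tilde_rms in radians, r_rms in physical Mpc) for one group.

    Assumptions (illustrative only): RA offsets are scaled by cos(dec_gr) to
    turn them into on-sky offsets, and the physical size is the comoving
    distance divided by (1 + z_gr) times the angular size.
    """
    n = len(ra_deg)
    if n == 0 or n != len(dec_deg):
        raise ValueError("need equal, non-empty coordinate lists")
    cos_dec = math.cos(math.radians(dec_gr_deg))
    sq_sum = 0.0
    for ra, dec in zip(ra_deg, dec_deg):
        d_ra = math.radians(ra - ra_gr_deg) * cos_dec  # Delta alpha_i on the sky
        d_dec = math.radians(dec - dec_gr_deg)         # Delta delta_i
        sq_sum += d_ra ** 2 + d_dec ** 2
    r_tilde = math.sqrt(sq_sum / n)
    return r_tilde, d_comoving_mpc / (1.0 + z_gr) * r_tilde
```

The center (αgr, δgr) is an input, which mirrors the remark above that rrms depends on the choice of group centers.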

The estimators for the group centers are discussed in detail in Section 6.3. Some of the discussed estimators also use the photo-z information. As a benchmark for comparison we will often simply use the average over the positions of the spectroscopic group members which will be termed "standard centers." On the other hand, for the final computation of association probabilities, we have used "improved centers" (defined in Section 6.3), which are themselves based on association probabilities of photo-z galaxies. So the final probabilities are obtained by an iterative procedure which, however, already converges after one iteration.

Taking all reconstructed mock groups with a 2WM to a real group, we then compute the fraction f(σr, σz, N) of photo-z galaxies that are members of the corresponding real group as a function of σr, σz, and N. To obtain large enough group samples for the computation of f, we restrict the richness dependence to just four richness classes: N = 2, 3  ⩽  N  ⩽  4, 5  ⩽  N  ⩽  9, and N  ⩾  10. For each galaxy and each group, the function f(σr, σz, N) is then evaluated and interpreted as the probability that this galaxy is a member of this group. Since f was estimated using only the reconstructed groups with a 2WM to a real group, it does not include the effects of deficiencies in the original detection of the spectroscopic groups (cf. Figure 3). In other words, f is the probability that a galaxy is a member of an apparent group, defined as a certain location in (α, δ, z) space, and it should be multiplied by the probability that the apparent group is actually real.
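The calibration step above amounts to a binned fraction of true members per richness class. The sketch below shows one way to tabulate and evaluate such an f; the input tuple format, the bin width of 0.25, and the hard cutoff at σ = 3 are hypothetical choices for illustration (the cutoff is motivated only by f being essentially zero there in Figure 12).

```python
from collections import defaultdict


def richness_class(n):
    """Map observed spectroscopic richness N to the four classes used for f."""
    if n < 2:
        raise ValueError("groups have at least two spectroscopic members")
    if n == 2:
        return "N=2"
    if n <= 4:
        return "3<=N<=4"
    if n <= 9:
        return "5<=N<=9"
    return "N>=10"


def estimate_f(samples, bin_width=0.25, max_sigma=3.0):
    """Empirical f(sigma_r, sigma_z, N) on a regular grid.

    samples: iterable of (sigma_r, sigma_z, n_spec, is_member) for photo-z
    galaxies around 2WM mock groups (hypothetical input format).
    Returns {(class, i_r, i_z): fraction of true members in that cell}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for s_r, s_z, n_spec, is_member in samples:
        if s_r >= max_sigma or s_z >= max_sigma:
            continue  # hypothetical cutoff: f treated as zero beyond max_sigma
        key = (richness_class(n_spec), int(s_r // bin_width), int(s_z // bin_width))
        totals[key] += 1
        hits[key] += int(is_member)
    return {key: hits[key] / totals[key] for key in totals}


def lookup_f(table, s_r, s_z, n_spec, bin_width=0.25, max_sigma=3.0):
    """Evaluate the binned f for one galaxy-group pair (0 outside the grid)."""
    if s_r >= max_sigma or s_z >= max_sigma:
        return 0.0
    key = (richness_class(n_spec), int(s_r // bin_width), int(s_z // bin_width))
    return table.get(key, 0.0)
```

In practice one would probably smooth the table (the surfaces in Figure 12 are smooth), but plain binning keeps the idea visible.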

The functions f(σr, σz, N) are shown in Figure 12 for the four richness classes. They are all very similar and are smooth, strongly decreasing functions of σr and σz. Not surprisingly, the probability of a galaxy being a member of the group is usually much larger than the formal integral of the photo-z probability distribution for that galaxy over the very small redshift interval associated with the group, which is of course the motivation for this approach.

Figure 12.

Figure 12. Fraction f(σr, σz, N) of photo-z galaxies to be associated with groups, where σr is based on the fudge radius and the improved centers. The surface is the mean of f of the 24 mock catalogs and the error of the mean is indicated by the black bars. The function f was empirically computed using the reconstructed groups exhibiting 2WM to real groups (i.e., it does not include the effect of group detection failures) and is based on rfudge and the improved centers (see the text). For σr ∼ 3 or σz ∼ 3 the fraction f is basically zero.

Standard image High-resolution image

In this scheme, a galaxy can be associated with more than one group if it lies close enough to several of them, i.e., if f(σr, σz, N) is non-zero for more than one group. Indeed, the probabilities as computed in the previous paragraph may even sum to more than unity. We therefore introduce a slight modification of the assigned probabilities. If a galaxy is associated with n groups with probabilities pi, i = 1, ..., n, we first compute the probability that it is not a member of any group

$$p_{\rm nongr} = \prod_{i=1}^{n} (1 - p_i). \qquad (10)$$

Then the probability of the galaxy to be in any group is taken to be 1 − pnongr instead of $p_{\rm tot} = \sum_{i=1}^{n} p_i$. Finally, we scale the probabilities by the ratio of these two, i.e.,

$$\tilde{p}_i = p_i \, \frac{1 - p_{\rm nongr}}{p_{\rm tot}}. \qquad (11)$$

For ease of notation, we will simply write pi instead of $\tilde{p}_i$ in the following and refer to these quantities as "association probabilities."
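The renormalization described in this subsection (non-membership probability as a product of 1 − pi, then rescaling so the probabilities sum to 1 − pnongr) can be sketched as follows. The function name is ours, and treating the memberships as independent when forming the product is the assumption implied by that product form.

```python
import math


def renormalize(probs):
    """Rescale the raw association probabilities p_i of one galaxy to n groups.

    p_nongr = prod(1 - p_i) is the probability of belonging to no group;
    the p_i are then scaled by (1 - p_nongr) / p_tot with p_tot = sum(p_i),
    so that they sum to 1 - p_nongr (never more than unity).
    """
    p_tot = sum(probs)
    if p_tot == 0:
        return [0.0 for _ in probs]
    p_nongr = math.prod(1.0 - p for p in probs)
    scale = (1.0 - p_nongr) / p_tot
    return [p * scale for p in probs]


# Raw values 0.7 and 0.6 sum to 1.3 > 1; after rescaling they sum to
# 1 - 0.3 * 0.4 = 0.88 while keeping their 7:6 ratio.
print(renormalize([0.7, 0.6]))
```

For a galaxy near a single group the scale factor is exactly one, so the rescaling only matters where groups overlap.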

5.2. Properties of the Association Probabilities

In the following, we will study the properties of the association probabilities introduced in the previous section in terms of fidelity and completeness for different group subsamples and different choices of the group extension rgr and group centers (αgr, δgr), and we will compare the distribution of probabilities in the mock catalogs to that in the actual data.

To investigate the fidelity of the association probability, we define a photo-z galaxy to be "successfully associated" with a reconstructed mock group that has a 2WM to a real group ("2WM group") if the photo-z galaxy is a member of that real group. For reconstructed mock groups with no 2WM to a real group ("non-2WM groups"), i.e., the ∼30% of reconstructed groups that are not in the bottom layer of Figure 6, the definition of a successful association is more subtle. If a non-2WM group is fragmented, an association is successful if the photo-z galaxy is a member of the real group with which our reconstructed group is associated. In the case of an overmerged reconstructed group, more than one real group is associated with the reconstructed group; here a photo-z galaxy is successfully associated if it is a member of the real group that contains the largest fraction of the members of our reconstructed group. For spurious groups, there is no corresponding real group and every association is regarded as a failure.
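The success rules above can be encoded compactly by always comparing with the real group that holds the largest share of the reconstructed group's members, and by treating a group with no real counterpart as spurious. The sketch below is a simplification under that assumption. For a 2WM group, the matched real group holds more than half of the members and is therefore picked automatically. Ties are broken arbitrarily by `Counter.most_common`. The input format (one real-group id per member, `None` for field galaxies) is hypothetical.

```python
from collections import Counter


def counterpart(recon_member_real_ids):
    """Real group holding the largest share of a reconstructed group's members.

    recon_member_real_ids: real-group id of each member of the reconstructed
    group (None for field galaxies). Returns None for a spurious group.
    """
    counts = Counter(g for g in recon_member_real_ids if g is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def is_success(photoz_real_id, recon_member_real_ids):
    """True if the photo-z galaxy belongs to the counterpart real group."""
    target = counterpart(recon_member_real_ids)
    return target is not None and photoz_real_id == target
```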

Figure 13 shows the fraction of successful associations as a function of probability p. The red line shows the success of associations for 2WM groups, and it should be a diagonal line because the probabilities were calibrated using these groups. The green line shows the result for those galaxies in 2WM groups that have non-zero association probabilities to more than one group; it also looks satisfactory. The net result for the non-2WM groups is shown in blue. These curves lie below the other curves because of the problems with group identification. The solid lines correspond to estimates of p based on the fudge radius and the improved centers, and the dashed lines correspond to estimates based on rrms and the standard centers. While the choice of the group extension rgr seems to have a negligible effect for photo-z galaxies associated with 2WM groups (red and green lines), the fudge radius rfudge seems to work better for the non-2WM ("failed") groups. The reason is that such groups sometimes have irregular shapes, so that rrms is far too large, which results in more (wrongly) associated galaxies than if rfudge were used. The fudge radius rfudge, in contrast, depends only on the richness and is thus unaffected by the shape of the group.

Figure 13.

Figure 13. Fraction of correct associations as a function of association probability p for different richness classes. The lines show the mean of the fractions computed for each of the 24 mock catalogs and the error bars indicate the standard deviation of the mean. The solid lines correspond to estimates of p based on rfudge and the improved centers, and the dashed lines correspond to estimates based on rrms and the standard centers. The red lines correspond to galaxies associated with reconstructed groups exhibiting a 2WM to real groups. Ideally, these lines should lie on the dotted line. Statistically significant deviations are caused by galaxies that are associated with more than one group. The probabilities for such galaxies are given by the green lines. In contrast, the blue lines are for galaxies associated with groups that do not exhibit a 2WM to real groups.

Standard image High-resolution image

The completeness of the group membership for all photo-z galaxies above a given threshold in p is shown in Figure 14. The blue line is for probabilities p based on rrms, the green line for p based on rfudge and the standard centers, and the red line for p based on rfudge and the improved centers. The biggest difference between the blue curve (using rrms) and the other lines is at low p, where, particularly for small groups, the completeness is significantly lower than for the curves based on rfudge. For small groups, rrms can be an underestimate, so that too few photo-z galaxies are associated with such groups. This is the most significant advantage of using rfudge instead of rrms. The difference between the choices of group centers is most obvious at high p and for large groups, where the improved centers perform slightly better. The choice of the group extension is, however, more important than the choice of the centers.

Figure 14.

Figure 14. Completeness of the photo-z group population with probabilities >p. The lines show the median of the 24 mock catalogs and the error bars show the upper and lower quartiles. Only mock groups with 2WM to real groups are considered. The blue line corresponds to probabilities p based on rrms, the green line to p based on rfudge and standard group centers, and the red line to p based on rfudge and the improved centers.

Standard image High-resolution image

The fraction of photo-z with an association probability >p is shown in Figure 15 for the actual data and for the mock catalogs. To allow for a meaningful comparison between galaxies with p > 0 and galaxies with p = 0 we constrain the redshift range to 0.1 < z < 0.8, where most of the groups are. About 60% of the photo-z galaxies have zero probability to be associated with any of the spectroscopic groups, while 40% have a non-zero probability of membership of one or more groups. This fraction of possible group members drops quite fast as the p threshold is increased. The slight excess of low-probability members in the actual data is due to the larger number of small groups in the 20k sample (cf. Figure 5).

Figure 15.

Figure 15. Fraction of photo-z galaxies with an association probability >p in the redshift range 0.1 < z < 0.8. The histogram shows the actual data and the solid line the corresponding fraction within the mock catalogs (error bars are the standard deviation among the 24 mock catalogs). The fraction of galaxies that are not associated with any spectroscopic group (i.e., those with p = 0) is shown on the left. These comprise 60% of the photo-z objects. Note that the p = 0 and p > 0 fractions do not sum to unity because some galaxies have multiple p values owing to possible membership in different groups. The slight excess of low-probability members in the actual data is due to the larger number of small groups in the 20k sample.

Standard image High-resolution image

The completeness and interloper fraction for the flux-limited mock group population that is obtained by including in the groups all potential members with a minimal association probability p are summarized in Figure 16. We show the mean completeness Sp (blue region) and mean interloper fraction Ip (red region) of the 24 mock catalogs, where in each mock catalog all reconstructed groups were considered (left panel). We regard only those group members that are members of the corresponding real group as successes. The point p = 1 corresponds to the purely spectroscopic group membership. Note that these statistics are worse than the galaxy success rate Sgal and interloper fraction fI shown in Figure 3 because previously we were only concerned with whether the galaxy was a member of any group. Furthermore, here we refer to the entire flux-limited population and not only to the spectroscopic sample.

Figure 16.

Figure 16. Average statistics for the total flux-limited galaxy sample. The left panel considers all groups in the mock catalogs and the right panel only those groups with a 2WM to real groups. The blue region shows the galaxy success rate Sp, the red region the interloper fraction Ip, and the green region the fraction SpM of correctly assigned most massive galaxies obtained by picking the galaxy with the highest pM, where in each case the galaxy sample includes all galaxies with an assignment probability >p. The point p = 1 corresponds to the spectroscopic sample. The solid lines indicate the statistics for groups with richness N  ⩾  10 and the dashed lines those for pairs. Note that a galaxy membership is regarded as a success here only if the galaxy is a member of the corresponding specific real group, which makes these statistics rather restrictive. The poorer performance in the left-hand panel is due to the issues of group detection (i.e., overmerging, etc.).

Standard image High-resolution image

The interpretation of Figure 16 is as follows: looking at the claimed membership of a given reconstructed mock group, i.e., summing the spectroscopic members and all those photo-z galaxies above a minimal probability threshold p, the new galaxy success rate Sp is the number of these that are actually members of the corresponding real group divided by the total membership of this corresponding real group. This is given by the blue region of the left-hand panel, which is bounded by the lines for N = 2 and N  ⩾  10, where N refers to the observed spectroscopic richness. The fraction of claimed galaxies that are not members of this particular group, which is the interloper fraction Ip among the claimed members, is given by the corresponding red region. As an illustration, group members of an overmerged reconstructed mock group that belong to the second real group (which is not regarded as the proper real counterpart) are regarded as failures and will increase the Ip statistics (while they were not necessarily regarded as failures in the earlier fI statistics). If we, however, know (for reasons beyond our group catalog) that the group we are interested in is properly detected (i.e., has a 2WM to a real group), the statistics would improve to the regions in the right panel. Particularly for small groups (dashed lines), the difference will be significant owing to the uncertainties in the group detection.
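For a single group, the two statistics just described reduce to simple set operations. This sketch follows the definitions in the text: Sp is the number of correct claimed members divided by the full membership of the corresponding real group, and Ip is the number of wrong claimed members divided by the number of claimed members. The set-of-ids input format and the helper names are hypothetical.

```python
def claimed_members(spec_ids, photoz_probs, p_min):
    """Spectroscopic members plus photo-z galaxies with p > p_min."""
    return set(spec_ids) | {gid for gid, p in photoz_probs.items() if p > p_min}


def success_and_interlopers(claimed, real_members):
    """Galaxy success rate S_p and interloper fraction I_p for one group.

    claimed: ids claimed for the reconstructed group.
    real_members: ids of the corresponding real group down to the flux limit.
    """
    correct = len(claimed & real_members)
    s_p = correct / len(real_members) if real_members else 0.0
    i_p = (len(claimed) - correct) / len(claimed) if claimed else 0.0
    return s_p, i_p


claimed = claimed_members({1, 2}, {3: 0.8, 4: 0.2, 5: 0.6}, p_min=0.5)
print(success_and_interlopers(claimed, {1, 2, 3, 6}))
```

Lowering `p_min` admits more photo-z candidates, which raises Sp at the cost of a higher Ip; this is the trade-off traced out along the p axis of Figure 16.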

6. APPLICATIONS USING ADDED PHOTO-z MEMBERS

In this section, we perform four straightforward applications considering the potential members on the basis of their photo-z. We look in turn at the corrected richness, the identification of the most (stellar) massive galaxy in the group, the location of the spatial center of the group, defined as the minimum of the potential well, and finally an approach to identifying the galaxy at that center, which we define to be the central galaxy, all other group members being satellites.

Motivated by the obvious variation of the galaxy success rate and interloper fraction of the spec-z group population with group-centric distance (cf. Figure 10 of K09), we also introduced an association probability p for the spectroscopic galaxies. This prevents spec-z galaxies at the outskirts of the groups from being given too large a weight compared with the photo-z group population. We assigned these probabilities in the same way as for the photo-z galaxies, except that we assigned probabilities only to spectroscopic galaxies that were already group members, and we set σz to zero, i.e., the association probability was determined only by the projected distance from the group center. For pairs, the assigned probabilities were simply set to one.

6.1. Corrected Richness

A straightforward application of the association probability p is to estimate the corrected richness Ncorr of the groups above the flux limit, i.e., the total richness the groups would have if we knew all their real members down to the flux limit of the survey, by summing up all probabilities of the group members (spec-z and photo-z). Not surprisingly, the estimated corrected richness is on average unbiased with respect to the real corrected richness for all observed spectroscopic richness classes N, because this was used in establishing the probabilities. It exhibits a scatter of about 30%, weakly depending on N.
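The corrected-richness estimate is a plain sum over all candidate members. The sketch below also applies the convention stated earlier that spectroscopic probabilities of pairs are set to one. The function name and the use of two spectroscopic members as the definition of a pair are our illustrative choices.

```python
def corrected_richness(spec_probs, photoz_probs):
    """Estimate N_corr by summing association probabilities of all members.

    spec_probs: distance-based probabilities of the spectroscopic members
        (computed with sigma_z = 0); for pairs they are set to one.
    photoz_probs: association probabilities of the photo-z candidates.
    """
    if len(spec_probs) == 2:
        spec_probs = [1.0, 1.0]  # pairs: spectroscopic probabilities fixed at 1
    return sum(spec_probs) + sum(photoz_probs)


print(corrected_richness([0.9, 0.8, 0.7], [0.5, 0.25]))
```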

The corrected richness could also be estimated from the SSR and RSR (see Section 2.1) at the positions of the spec-z group members. However, the resulting corrected richnesses are biased high by about 40% for groups with N  ≲  4, and their scatter is also larger, about 50%. The reason for this bias for small groups is a selection effect. Since the observed richness N is the result of a Poisson sampling process when assigning the slits to the targets, it has an intrinsic scatter for a given SSR and RSR. If, however, N drops below 2, the group cannot be observed and is lost, whereas there is no such limit for scatter toward high N.
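The selection effect can be illustrated with a toy Monte Carlo. In this simplified sketch, each member of a group with true richness `true_n` is observed independently with probability `rate`, which stands in for SSR × RSR (binomial sampling rather than the full zCOSMOS slit assignment). The naive estimate N/rate is then averaged over the groups that survive the N ≥ 2 cut. For a 4-member group sampled at 50%, the surviving groups have E[N | N ≥ 2] = 28/11, so the estimate averages 56/11 ≈ 5.1 instead of 4. For rich groups the cut almost never bites, and the bias disappears.

```python
import random


def mean_sampling_estimate(true_n, rate, trials=20_000, seed=1):
    """Average N_obs / rate over simulated groups that pass N_obs >= 2.

    Each of the true_n members is observed independently with probability
    `rate` (a simplified stand-in for the combined SSR x RSR).
    """
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(trials):
        n_obs = sum(rng.random() < rate for _ in range(true_n))
        if n_obs >= 2:  # groups with fewer than 2 spec-z members are never found
            total += n_obs / rate
            kept += 1
    return total / kept


print(mean_sampling_estimate(4, 0.5))   # biased high relative to 4
print(mean_sampling_estimate(40, 0.5))  # close to 40
```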

We conclude that the photo-z are useful for obtaining unbiased estimates of the corrected richness for all groups. We can, of course, also estimate the corrected richness Ncorr(MB, lim) with respect to a given absolute magnitude limit MB, lim (cf. Section 4.2 of K09).

6.2. Identifying the Most Massive Galaxy of the Group

We introduce the probability pM that a galaxy is the most massive (in terms of stellar mass) galaxy of a given group. To compute it, we sort all the members, spectroscopic as well as photometric, in descending order of mass such that $M_{i-1} \geq M_i$ for $i \in \{2, \ldots, N_{\rm tot}\}$, where Ntot is the number of spectroscopic and photometric members. The probability [pM]i that a given galaxy is the most massive galaxy in the group depends on both its own probability of membership, pi, and the probabilities of non-membership of the higher-ranked galaxies; i.e., for the first-ranked galaxy, [pM]1 = p1, and for the remainder,

$$[p_M]_i = p_i \prod_{j=1}^{i-1} (1 - p_j), \qquad i = 2, \ldots, N_{\rm tot}. \qquad (12)$$
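A direct transcription of this ranking rule is sketched below. The members are sorted by stellar mass, and each galaxy receives its membership probability times the probability that none of the heavier candidates is a member. The function name and tuple layout are hypothetical. For members b (heaviest, p = 0.5), a (p = 0.4), and c (lightest, p = 1.0), the results are 0.5, 0.4 × 0.5 = 0.2, and 1.0 × 0.5 × 0.6 = 0.3.

```python
def most_massive_probs(members):
    """[p_M]_i = p_i * prod_{j<i} (1 - p_j) after sorting by stellar mass.

    members: list of (galaxy_id, stellar_mass, p) for the spec-z and photo-z
    members of one group. Returns {galaxy_id: p_M}.
    """
    ranked = sorted(members, key=lambda m: m[1], reverse=True)
    result = {}
    p_none_above = 1.0  # probability that no higher-ranked galaxy is a member
    for gid, _mass, p in ranked:
        result[gid] = p * p_none_above
        p_none_above *= 1.0 - p
    return result


print(most_massive_probs([("a", 10.8, 0.4), ("b", 11.2, 0.5), ("c", 10.5, 1.0)]))
```

Because pM only depends on the mass ordering, log masses work as well as linear ones, but any error in the stellar masses that reshuffles the ranking propagates directly into pM.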

Figure 17 compares pM with the empirical fraction of correctly identified most massive galaxies within the mock catalogs. Ideally, this would be the dotted diagonal line, in that galaxies with a given value of pM should be the real most massive galaxies in a fraction pM of cases. The red curve uses association probabilities p based on rfudge and the blue curve those based on rrms. The black curve is based on rfudge but does not include observational errors in the stellar mass determination, which are included at the level of 0.2 dex in the red and blue curves. The conclusion is that the basic scheme works (as would be expected) but that mass estimation uncertainties will be significant. While it makes no substantial difference whether the association probabilities p used to compute pM are based on rrms or rfudge, the uncertainty of 0.2 dex in stellar mass causes pM to overestimate the true probability at large pM. Nevertheless, there is a strong correlation between pM and the fraction of cases in which the galaxy under consideration is the real most massive galaxy of the group. For a cut of, for instance, pM > 0.7, the true probability is still higher than 50%, i.e., such a galaxy has a better chance of being the most massive galaxy than all the other candidates in its group put together. For a proper interpretation of pM, it is, however, important to keep this effect of the mass uncertainty in mind.

Figure 17.

Figure 17. Fraction of correct most massive galaxies within reconstructed 2WM mock groups as a function of pM. Each panel is for a different richness class as indicated. The lines show the mean of the fractions computed for each of the 24 mock catalogs and the error bars indicate the error of the mean. The red curve corresponds to pM based on rfudge and the improved group centers, and the blue curve corresponds to pM based on rrms and the standard centers. The black line corresponds to the former case, but does not include observational errors in stellar mass of 0.2 dex.

Standard image High-resolution image

In assessing the usefulness of this scheme, this figure should be combined with Figure 18, which shows the distribution of pM as a function of richness class, for both the mock catalogs and for the actual 20k data. Note that for each richness class, the distribution of pM of those galaxies with the highest pM in their groups is a steep function of pM. This tells us that most groups have a clear candidate for being the most massive galaxy. The actual 20k sample (red histogram) follows fairly well the histogram for the mock catalogs (black solid) except maybe for the largest pM. Despite the uncertainty in stellar mass, pM is a very useful concept and works reasonably well for the actual data.

Figure 18.

Figure 18. Distribution of the probabilities pM of those galaxies with the highest pM in their groups. These galaxies can be either spectroscopic or photometric. Each panel corresponds to a different richness class. The black histogram corresponds to the mean distribution of the 24 mock catalogs and the error bars indicate the standard deviation. The dotted histogram considers only the mock groups having a 2WM to real groups. The red histogram shows the histogram for the actual zCOSMOS 20k groups. It is obvious that most groups have a clearly identified candidate for being the most massive galaxy.

Standard image High-resolution image

It should be noted that in Figure 17 the pM for the unperturbed stellar masses slightly underestimates the true probability for pM ≳ 0.5 as measured by the fraction of such galaxies that really are the most massive members of their groups, i.e., the black line lies slightly above the dotted diagonal line. This is due to the fact that the association probabilities p were derived irrespective of the mass or luminosity of the galaxies. If massive galaxies are more likely to be in groups, then this process will have underestimated the pi and thus pM for these more massive galaxies, producing the small offset observed in Figure 17.

The average success rate SpM of identifying the most massive galaxy within the reconstructed mock groups as a function of the probability threshold p is shown in Figure 16. For high-richness groups, this success rate increases by about 10%–20% when we include photo-z galaxies, while it is relatively constant for spec-z pairs. So the inclusion of the photo-z galaxies has a rather small effect on SpM. In fact, for 87% of all groups the galaxy with the largest pM has a spectroscopic redshift (for the mock catalogs this number is 84% ± 2%). One might expect this ratio to equal the average spatial completeness of the survey, since this determines the chance that a given galaxy is observed spectroscopically. This will be the case for very rich groups, which would be recognized regardless of the statistical fluctuations of spatial sampling. For poorer groups, however, there is a selection effect in that those with higher spectroscopic sampling are more likely to be recognized as groups. Indeed, for "real" pairs, both members must have been observed spectroscopically for the group to be recognized, and the most massive galaxy will therefore always be a spectroscopic galaxy.

Does this relatively modest gain mean that it is not worth bothering with the photo-z? The answer is no. First, we show in the next section that there are significant gains in finding the spatial center of the group. Second, the inclusion of photo-z objects dramatically reduces the number of galaxies that are incorrectly identified as the most massive in the richer groups. These may be among the most interesting from a galaxy evolution point of view. It should also be noted that for these statistics the identification of the most massive galaxy is only regarded as a success if it is the most massive galaxy of the specific group we think it is a member of. Selecting a galaxy that is the real most massive galaxy of another group (even one that has been detected) is considered here as a failure. This is a rather restrictive perspective and depending on the application it may be sufficient to just know whether a certain galaxy is the most massive of any group (cf. Section 6.4).

6.3. Locating the Spatial Group Center

Another immediate application of the association probabilities p is to estimate the centers of the groups. By group center we mean the center of the corresponding DM halo which is defined by the position of the deepest point in the gravitational potential well. In the Millennium simulations, this position is given by the most bound particle within a halo and is also, by construction, the position of the "central galaxy" in the halo.

With the aid of the mock catalogs, we can test several estimators E for the group center and compare their relative accuracy. Some of these estimators are based on the areas of the Voronoi cells of the projected group galaxy positions (see also Presotto et al. 2012). To compute these areas, we project a group onto the plane perpendicular to the line of sight and perform a two-dimensional Voronoi tessellation of the group galaxies (either only spec-z or both spec-z and photo-z) together with the spectroscopic field galaxies surrounding the group; including the field galaxies prevents the Voronoi cells at the outskirts of the group from becoming infinite. We expect the Voronoi areas to be smaller on average toward the centers of the groups.

In the following, using the mock catalogs we will test 10 different estimators, E1 to E10, to identify the group centers. The estimators E1 to E4 depend on the spectroscopic information only:

  • E1: mean of the positions of the spectroscopic members;
  • E2: stellar mass weighted mean of the positions of the spectroscopic members;
  • E3: inverse Voronoi area weighted mean of the positions of the spectroscopic members;
  • E4: stellar mass and inverse projected Voronoi area weighted mean of the positions of the spectroscopic members.

The estimators E5 to E8 also include the information from the photometric galaxies. They are essentially identical to the former estimators, except that each galaxy, spec-z as well as photo-z, is additionally weighted by its association probability p:

  • E5: probability weighted mean of the positions of all group members;
  • E6: probability and stellar mass weighted mean of the positions of all group members;
  • E7: probability and inverse Voronoi area weighted mean of the positions of all group members;
  • E8: probability, stellar mass, and inverse Voronoi area weighted mean of the positions of all group members.

The estimators E9 and E10 are no longer based on the (weighted) mean of the positions of the group members, but attempt to identify the central galaxies of the groups directly. They are defined by selecting for each group the galaxy with the largest ratio R, as follows:

  • E9: location of the galaxy with the largest R = p/A;
  • E10: location of the galaxy with the largest R = p × M*/A.

Here, M* is the stellar mass and A the Voronoi area of a galaxy. Note that the estimators based on the Voronoi area A are not defined for every group, since A may be infinite for the members of small isolated groups near the border of the survey.
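The two families of estimators listed above, weighted means (E1 to E8) and "pick the galaxy with the largest R" (E9, E10), can be sketched in a few lines. This illustration assumes projected (x, y) positions, linear stellar masses, and precomputed Voronoi areas (the tessellation itself is not shown). It raises an error when an area is infinite, mirroring the remark that such estimators are then undefined. All function names are hypothetical.

```python
import math


def weighted_center(positions, weights):
    """Weighted mean of projected (x, y) positions (E1 uses unit weights)."""
    w_sum = sum(weights)
    if w_sum <= 0:
        raise ValueError("weights must have a positive sum")
    x = sum(w * px for (px, _), w in zip(positions, weights)) / w_sum
    y = sum(w * py for (_, py), w in zip(positions, weights)) / w_sum
    return x, y


def ratios(p, mass, area):
    """R = p * M* / A for each galaxy; undefined if any Voronoi area is infinite."""
    if any(not math.isfinite(a) for a in area):
        raise ValueError("infinite Voronoi area: estimator undefined for this group")
    return [pi * mi / ai for pi, mi, ai in zip(p, mass, area)]


def center_e8(positions, p, mass, area):
    """E8-style center: mean position weighted by p * M* / A."""
    return weighted_center(positions, ratios(p, mass, area))


def center_e10(positions, p, mass, area):
    """E10-style center: position of the galaxy with the largest R = p * M* / A."""
    r = ratios(p, mass, area)
    best = max(range(len(r)), key=r.__getitem__)
    return positions[best]
```

The E10-style choice returns an actual galaxy position, which is why it can yield an offset of exactly zero when the central galaxy is picked correctly.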

The average physical offsets between the estimated mock group centers and the real group centers are shown in Figure 19 for different richness classes and for different apparent group extensions $\tilde{r}_{\rm rms}$ (see Equation (9)).

Figure 19.

Figure 19. Projected physical offset between the estimated group centers and the true group centers in the mock catalogs. The lines show the median offsets of all reconstructed 2WM groups within the 24 mock catalogs, and the error bars indicate the upper and lower quartiles. The x-axis plots the four quartiles of the apparent group extension $\tilde{r}_{\rm rms}$ (see Table 8). The estimators are indicated for each row in the leftmost panel, and the richness class increases toward the right. Blue lines contain only spec-z information, and red and green lines contain spec-z and photo-z information. For comparison, E1 is shown in all panels as a dotted line.


The dependence on $\tilde{r}_{\rm rms}$ is taken into account by dividing the group population of each richness class into the four quartiles of its $\tilde{r}_{\rm rms}$ distribution. The boundaries between the quartiles are given in Table 8. The performance of the estimators E1 to E10 depends strongly on both the extension and the richness of the group. Several statements can be made:

  • 1.  
    The smaller the group extension, that is, the smaller the group appears projected on the sky, the smaller the offset from the true group center.
  • 2.  
    The larger the observed richness of the group, the more effective is weighting the galaxies by stellar mass or the inverse of the Voronoi area.
  • 3.  
    For groups with N  ≲  5, weighting by stellar mass is more effective than weighting by the inverse of the Voronoi area and the reverse is true for N ≳ 5 groups.
  • 4.  
    Weighting by both stellar mass and the inverse of the Voronoi area is superior to weighting by either of these alone for all richness classes except for pairs.
  • 5.  
    The larger N and the more extended the group, the more effective is the consideration of the photo-z galaxies. For N  ≲  5 there is hardly any gain from using the photo-z information, whereas for N ≳ 5 there is a clear gain for all estimators particularly for extended groups.
  • 6.  
    The size of the error bars suggests that the most robust estimator for pairs is simply the geometric midpoint of the two galaxies.
  • 7.  
    For all groups except pairs, by far the best estimator is E10. For groups with N  ⩾  5, the central galaxy (not necessarily the most massive; see below) is correctly identified (i.e., the offset is zero) for at least half of the groups of any extension, and for three-quarters of the groups the offset from the true group center is less than about 20 kpc. Compared with the typical extension of a group, of the order of half an Mpc, this is an extremely good result.
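The statements above rest on the median, the quartiles, and simple hit fractions of the per-group offsets. Below is a minimal sketch of how these summary numbers could be computed from an array of offsets in kpc, such as one might obtain from a mock comparison. The function name and input are hypothetical.

```python
import numpy as np

def offset_summary(offsets_kpc, threshold_kpc=20.0):
    """Median, lower/upper quartiles, and hit fractions of center offsets."""
    off = np.asarray(offsets_kpc, dtype=float)
    q25, med, q75 = np.percentile(off, [25, 50, 75])
    return {
        "median": float(med),
        "q25": float(q25),
        "q75": float(q75),
        # fraction of groups whose central galaxy was picked exactly
        "frac_exact": float(np.mean(off == 0.0)),
        # fraction of groups with offsets below the threshold (e.g., 20 kpc)
        "frac_below": float(np.mean(off < threshold_kpc)),
    }
```

In this language, statement 7 corresponds to a median of zero and a `q75` of about 20 kpc for E10 in groups with N ⩾ 5.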

Table 8. Borders Dividing the Four Quartiles of the Distribution of $\tilde{r}_{\rm rms}$ for Group Populations of Different Richness Classes

Richness class 1/2-quartile 2/3-quartile 3/4-quartile
N = 2 0.0011 0.0034 0.0055
3  ⩽  N  ⩽  4 0.0042 0.0064 0.0089
5  ⩽  N  ⩽  9 0.0079 0.0109 0.0150
N  ⩾  10 0.0140 0.0193 0.0273

Note. The values are given in degrees.
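Assigning a group to one of the four extension bins only needs its richness class and the Table 8 borders. This is a minimal sketch. Treating a value exactly on a border as belonging to the upper quartile is an assumption, and the helper names are hypothetical.

```python
import bisect

# Quartile borders of r_rms in degrees, copied from Table 8.
TABLE8_BORDERS = {
    "N=2": (0.0011, 0.0034, 0.0055),
    "3<=N<=4": (0.0042, 0.0064, 0.0089),
    "5<=N<=9": (0.0079, 0.0109, 0.0150),
    "N>=10": (0.0140, 0.0193, 0.0273),
}

def richness_class(n):
    """Map the observed spectroscopic richness N to a Table 8 row."""
    if n < 2:
        raise ValueError("groups have at least two members")
    if n == 2:
        return "N=2"
    if n <= 4:
        return "3<=N<=4"
    if n <= 9:
        return "5<=N<=9"
    return "N>=10"

def extension_quartile(n, r_rms_deg):
    """Return 1..4, the r_rms quartile of a group with N members."""
    borders = TABLE8_BORDERS[richness_class(n)]
    # bisect_right puts a value equal to a border into the upper quartile
    return bisect.bisect_right(borders, r_rms_deg) + 1
```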


However, regarding E10 we have to be careful, since the mock catalogs by definition have a massive galaxy at the centers of their groups. This "central galaxy paradigm" is still under investigation (see, e.g., Skibba et al. 2011). Note also that in the mock catalogs the central galaxy of a group is not always the most massive: in about 20%–25% of all real mock groups (depending weakly on N), there is a more massive galaxy within the magnitude-limited group population.

Note also that 50%–60% of the galaxies selected by E10 are also the galaxies with the highest probability pM within their group. Although the two concepts are similar, they are not identical. In particular, pM can be assigned to each galaxy in a group and also yields a quantitative measure of fidelity, rather than just selecting a particular galaxy without giving any information on how good this selection is. Moreover, pM is entirely independent of any assumption about where massive galaxies preferentially reside within a group.

Given the results in Figure 19, we have chosen the "improved group centers," in contrast to the "standard group centers" given by E1, as follows. For groups with N = 2 we kept E1 as the center; for 3  ⩽  N  ⩽  4 we took E3 if available (93% of cases) and otherwise E1; and for N  ⩾  5 we took E7 (always available). Thus, we have used photo-z information only for N  ⩾  5, and in no case have we used stellar-mass information. Inspection of Figure 19 shows that the improved group centers should exhibit offsets ≲ 100 kpc for essentially all richness classes and group extensions.
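The choice of the improved center is a simple rule on N with a fallback when E3 is undefined. The sketch below assumes that the three estimates have already been computed, for example as (RA, Dec) tuples, and that an undefined E3 is passed as None. The function name is hypothetical.

```python
def improved_center(n, e1, e3, e7):
    """Return the 'improved' group center following the rule in the text.

    n  : observed spectroscopic richness N of the group
    e1 : standard center (E1)
    e3 : inverse-Voronoi-area weighted center (E3), or None if undefined
    e7 : probability and inverse-Voronoi-area weighted center (E7)
    """
    if n < 2:
        raise ValueError("groups have at least two members")
    if n == 2:
        return e1                              # pairs keep the standard center
    if n <= 4:
        return e3 if e3 is not None else e1    # E3 when available, else E1
    return e7                                  # N >= 5: E7 is always defined
```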

If we use association probabilities p based on the improved centers instead of the standard centers, the effect on the group center estimates E1 to E10 is strongest for E5, especially for rich and extended groups. The differences in the offsets are, however, never larger than ∼30%, and for E7 they are essentially negligible. This shows that the iterative process for deriving the association probabilities indeed converges after one iteration.

6.4. Separating Central and Satellite Galaxies

As mentioned in the previous section, the central galaxies, which we defined to be those galaxies located at the minimum of the gravitational potential, are not necessarily the most massive galaxies within the halos. However, in terms of evolutionary processes, it is likely that it is the location in the potential well that is most relevant, and so in the following we will discuss how well we can differentiate between the central galaxies and the remaining so-called satellite galaxies. For both centrals and satellites we will differentiate between simply knowing whether a galaxy is a central or a satellite, and the more stringent case of additionally knowing which group or halo it is the central or satellite of.

Can we add spatial information to pM? Motivated by the performance of the group center estimator E10 (see Figure 19), we introduce another probability pMA, which is computed similarly to pM, but instead of ranking the group galaxies by their stellar mass M* we rank them by M*/A, where A is the area of the projected Voronoi cell of the galaxy; pMA thus directly includes information on the local galaxy density. It should, however, be noted that pM already includes some positional information because of the radial dependence of the group membership probability p. In the following we discuss how the fractions of centrals fc and of satellites fsat vary across different galaxy samples selected in terms of p, pM, pMA, and N (see footnote 26). The results are summarized in Figures 20 and 21 and Table 9. To allow a sensible comparison between the numbers of centrals and satellites in the group and non-group galaxy populations, we restrict the redshift range to 0.1 < z < 0.8.
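The only difference between pM and pMA is the ranking key. The exact definition of pM is given earlier in the paper and is not reproduced here. The sketch below is therefore one plausible Monte Carlo reading, offered as an assumption and not as the authors' procedure. In each realization, galaxy i is included with probability p_i, log M* is perturbed by Gaussian errors, and the top-ranked included galaxy is recorded. Ranking by M* gives a pM-like quantity, and ranking by M*/A gives a pMA-like one. The function name and `sigma_logm` parameter are hypothetical.

```python
import numpy as np

def prob_top_ranked(p, log_mstar, sigma_logm, area=None, n_draws=20000, seed=0):
    """Monte Carlo probability that each galaxy tops the group ranking.

    Hypothetical sketch: membership ~ Bernoulli(p), log M* ~ N(log_mstar,
    sigma_logm). With area=None the key is M* (pM-like); otherwise the key
    is M*/A (pMA-like).
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    logm = np.asarray(log_mstar, dtype=float)
    n = p.size
    member = rng.random((n_draws, n)) < p               # membership draws
    key = logm + rng.normal(0.0, sigma_logm, size=(n_draws, n))
    if area is not None:
        key = key - np.log10(np.asarray(area, dtype=float))  # rank by M*/A
    key = np.where(member, key, -np.inf)                # non-members cannot win
    has_member = member.any(axis=1)                     # skip empty draws
    winners = np.argmax(key[has_member], axis=1)
    return np.bincount(winners, minlength=n) / n_draws
```

Switching from pM to pMA only changes the key, which mirrors the statement that pMA is computed similarly to pM.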

Figure 20.

Figure 20. Fraction of centrals fc for different mock galaxy samples in the redshift range 0.1 < z < 0.8. Upper panels: the left panel shows the fraction of centrals for the total flux-limited sample, and the second panel from the left the fraction for those galaxies (spec-z and photo-z) that are not associated with any group (i.e., p = 0). The third and fourth panels from the left show the fraction of centrals within our mock group population for two different richness classes, where the selection probability psel is pMA for the blue lines, pM for the red lines, and the intersection of the two for the green lines. The solid lines correspond to the fraction of galaxies that are the centrals of any group, while the dashed lines correspond to the fraction of correctly identified centrals of the corresponding specific real group. Lower panels: the number of mock galaxies in the selected samples. In all panels, the error bars indicate the standard deviation among the 24 mock catalogs. For the points in the left two panels, the error bars are smaller than the points. Note that only galaxies with psel > 0.1 are shown; the fractions and numbers for galaxies with 0 < psel < 0.1 deviate strongly from the relatively constant curves shown here (cf. Figure 21). The difference between the solid and dashed lines for groups with 2  ⩽  N  ⩽  5 comes mainly from the uncertainty in the detection of pairs, and the numbers of centrals in the actual data are similar to those in the mock catalogs (cf. Table 9).

Figure 21.

Figure 21. Fraction of satellites fsat in the mock group population selected by pM < 0.1 and pMA < 0.1, as a function of the lower limit imposed on the association probability p. The two columns of panels correspond to the richness classes as indicated, and the galaxy samples are restricted to the redshift range 0.1 < z < 0.8. Upper panels: the solid lines refer to the fractions of selected galaxies that are satellites of any group, while the dashed lines correspond to the fractions of selected galaxies that are satellites of the corresponding specific real group. Lower panels: the number of mock galaxies in the selected samples. In all panels, the error bars show the standard deviation among the 24 mock catalogs. Note that the numbers of satellites in the actual data are similar to those in the mock catalogs (cf. Table 9).


Table 9. Fractions of Centrals and Satellites in Different Galaxy Samples in the Redshift Range 0.1 < z < 0.8

Sample  Number(a)  fc Any(b)  fc Spec.(c)  fsat Any(b)  fsat Spec.(c)
All  30231  0.72  ...  0.28  ...
p = 0  19161  0.83  ...  0.17  ...
Group galaxies selected by pM > 0.5 and pMA > 0.5
N = 2  453  0.77  0.50  ...  ...
3  ⩽  N  ⩽  4  183  0.80  0.65  ...  ...
5  ⩽  N  ⩽  9  72  0.78  0.70  ...  ...
N  ⩾  10  24  0.79  0.75  ...  ...
Group galaxies selected by p > 0.5, pM < 0.1, and pMA < 0.1
N = 2  759  ...  ...  0.65  0.61
3  ⩽  N  ⩽  4  794  ...  ...  0.74  0.70
5  ⩽  N  ⩽  9  976  ...  ...  0.79  0.73
N  ⩾  10  956  ...  ...  0.84  0.76

Notes. (a) Number of galaxies in the corresponding actual data samples; for the mock catalogs, see Figures 20 and 21. (b) Statistics for centrals and satellites irrespective of their specific group memberships. (c) Statistics for centrals and satellites residing in the specific groups of which we think they are members.


The fraction of galaxies fc that are the central galaxies of their DM halo is shown in Figure 20 for different samples of galaxies from the mock catalogs. Note that 72% of the galaxies in the overall flux-limited sample, selected irrespective of any group membership, are central galaxies (left panel). If those galaxies (with either spec-z or photo-z) that are associated with groups (i.e., that have p > 0) are excluded, this fraction rises to 83% (second panel from the left). So, if a large sample of central galaxies is needed, irrespective of the halos in which they reside, then simply selecting the non-group galaxies will already produce a rather pure sample, albeit one biased toward lower-mass halos.

However, if we want a sample of centrals extending up into the range of halo masses of our groups, we can still produce fairly pure samples by making a cut in pM, pMA, or both, as shown in the third and fourth panels from the left. Interestingly, pMA actually performs worse than pM, but a simultaneous cut in pM and pMA produces a very pure sample of centrals at the cost of numbers (green curves). For instance, by making the simultaneous cut pM > 0.5 and pMA > 0.5 in groups with N  ⩾  5, we obtain a sample of about 100 centrals that is pure at the 80% level, in the sense that 80% of the galaxies are indeed centrals. However, about 10% of the sample are actually centrals of a different halo than the one identified in the reconstructed group catalog, so the purity defined in terms of being the central of a correctly identified group (i.e., one with a 2WM) drops to about 70% (dashed lines; see Table 9).
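The two purities quoted here differ only in whether a selected central must belong to the halo matched to its reconstructed group. Below is a minimal sketch for mock data, under these assumptions. `is_central` flags true centrals. `true_halo` is each galaxy's true halo ID. `matched_halo` is the ID of the real halo that has a 2WM with the galaxy's reconstructed group, with a sentinel such as -1 when there is none. All names are hypothetical.

```python
import numpy as np

def central_purity(pM, pMA, is_central, true_halo, matched_halo, cut=0.5):
    """Size and purities of a central sample selected by pM > cut and pMA > cut.

    Returns (n_selected, purity_any, purity_specific), where purity_specific
    also requires the galaxy to be the central of the 2WM-matched halo.
    """
    sel = (np.asarray(pM, dtype=float) > cut) & (np.asarray(pMA, dtype=float) > cut)
    n_sel = int(sel.sum())
    if n_sel == 0:
        return 0, float("nan"), float("nan")
    is_central = np.asarray(is_central, dtype=bool)
    same_halo = np.asarray(true_halo) == np.asarray(matched_halo)
    purity_any = float(is_central[sel].mean())            # solid lines
    purity_specific = float((is_central & same_halo)[sel].mean())  # dashed
    return n_sel, purity_any, purity_specific
```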

In Figure 20 we show the central fraction fc only for galaxies with a selection probability psel > 0.1, where psel is pM, pMA, or the intersection of the two. For the remaining galaxies, fc is much lower than for the sample with psel > 0.1, so this remaining sample naturally consists mostly of satellites. The fraction of satellites fsat in this sample is shown in Figure 21 as a function of an additional selection in the association probability p. While the curves in Figure 20 are essentially independent of an additional selection in p, the satellite fractions in Figure 21 are sensitive to a lower limit in p. On the other hand, the choice of psel has a negligible effect on the satellite fraction.

The interpretation of Figure 21 is relatively straightforward: if a galaxy has a very low psel but simultaneously a high association probability p, it should be a satellite. For a probability selection of p > 0.5 for groups with N  ⩾  5, we obtain a sample of ∼2000 galaxies of which about 80% are indeed satellites of a group (solid lines), and 75% are satellites in the specific groups that we think they are in (dashed lines; see Table 9).
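The curves in Figure 21 amount to recomputing fsat for a sequence of lower limits in p, after a fixed low-psel cut. Below is a minimal sketch for mock data. The function name and the boolean `is_satellite` truth flags are hypothetical.

```python
import numpy as np

def satellite_fraction_curve(p, pM, pMA, is_satellite, p_limits, psel_max=0.1):
    """Return (p_limit, n_selected, fsat) for galaxies with pM, pMA < psel_max."""
    p = np.asarray(p, dtype=float)
    low_psel = (np.asarray(pM, dtype=float) < psel_max) & \
               (np.asarray(pMA, dtype=float) < psel_max)
    sat = np.asarray(is_satellite, dtype=bool)
    curve = []
    for lim in p_limits:
        sel = low_psel & (p > lim)          # additional lower limit in p
        n = int(sel.sum())
        fsat = float(sat[sel].mean()) if n else float("nan")
        curve.append((lim, n, fsat))
    return curve
```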

We may also want simply to classify all galaxies as either centrals or satellites. That is, rather than producing high-purity subsamples of centrals or satellites, we divide the flux-limited sample into complementary samples of centrals and satellites that together make up the whole. For any such division we can compute the completeness and the purity for both centrals and satellites, without regard to their specific group membership. Note that the purities of centrals and satellites are simply given by fc and fsat, respectively, for the corresponding sample. For both samples, purity and completeness are anti-correlated, such that a high purity implies a low completeness and vice versa. Additionally, the purity of satellites will be anti-correlated with the completeness of centrals and vice versa. This is similar to optimizing the group-finding parameters to obtain an optimal group catalog (cf. Figure 4 of K09). One has to tune the cuts in p, pM, etc., to find the best compromise between the completeness and purity of either sample. A sensible compromise is to select satellites by p > 0.1, pM < 0.5, and pMA < 0.5, while all non-satellite galaxies constitute the centrals (see Table 10). This yields a completeness and purity of centrals of 89% and 81%, respectively, and a completeness and purity of satellites of 45% and 62%, respectively.
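In the binary scheme, every galaxy that fails the satellite cuts is called a central, so completeness and purity follow from the four cells of a confusion matrix. This is a minimal sketch for mock data. It assumes that every galaxy is truly either a central or a satellite, so a true satellite is simply "not central". The function names are hypothetical.

```python
import numpy as np

def _completeness_purity(pred, truth):
    """Completeness = TP / true count; purity = TP / predicted count."""
    tp = np.sum(pred & truth)
    compl = tp / truth.sum() if truth.any() else float("nan")
    purity = tp / pred.sum() if pred.any() else float("nan")
    return float(compl), float(purity)

def binary_classification_stats(p, pM, pMA, is_central, p_min=0.1, psel_max=0.5):
    """Satellites: p > p_min, pM < psel_max, pMA < psel_max; all others central."""
    p, pM, pMA = (np.asarray(x, dtype=float) for x in (p, pM, pMA))
    true_c = np.asarray(is_central, dtype=bool)
    pred_s = (p > p_min) & (pM < psel_max) & (pMA < psel_max)
    pred_c = ~pred_s  # complementary samples add up to the full sample
    return {"centrals": _completeness_purity(pred_c, true_c),
            "satellites": _completeness_purity(pred_s, ~true_c)}
```

Tightening the satellite cuts moves galaxies from `pred_s` to `pred_c`, which is the trade-off between completeness and purity described above.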

Table 10. Completeness and Purity for Complementary Samples of Centrals and Satellites in the Redshift Range 0.1 < z < 0.8

Sample(a)  Centrals(b)  Satellites(c)
  Compl.  Purity  Compl.  Purity
Spec-z and photo-z  0.89  0.81  0.45  0.62
Spec-z only  0.93  0.84  0.54  0.74

Notes. (a) All galaxies are subjected to a binary central-satellite classification. (b) Centrals are given by all non-satellite galaxies. (c) Satellites are selected by p > 0.1, pM < 0.5, and pMA < 0.5.


Since the spectroscopic 20k sample is essentially an unbiased subsample of the total flux-limited sample, we can restrict our study of centrals and satellites to the spectroscopic galaxies (once we have used the photo-z objects to help classify them). In this case, the completeness and purity are 93% and 84%, respectively, for centrals and 54% and 74%, respectively, for satellites. The satellite statistics in particular improve because group membership is much better constrained for the spec-z sample.

The conclusion of this section is that by applying a selection of galaxies in p, pM, pMA, and N we can produce samples of centrals and satellites of varying purity and size. As expected, the size of the sample decreases with increasing demands on purity. Different levels of purity can also be obtained at the cost of biases in halo mass. For instance, a very large set of highly pure centrals (83% pure) is obtained by excluding all galaxies that can possibly be associated with any detected group, but this obviously then excludes all centrals in the more massive halos we have detected. Dividing all objects in the flux-limited sample into centrals and satellites yields a set of centrals that is 81% pure and a set of satellites that is 62% pure. As with most other aspects of identifying groups at high redshift, the actual construction of samples must be carefully considered in the light of the scientific requirements.

7. DISCUSSION

In this section we summarize the main properties of our 20k group catalog and comment on the general difficulties of producing high-quality group catalogs.

The catalog presented in this paper contains almost 1500 groups, of which ∼570 host three or more spectroscopic galaxies. Based on detailed analyses using realistic simulated mock group catalogs, about 75% of the groups with three or more members should be real in the sense that they exhibit a one-to-one correspondence (i.e., a 2WM) to real groups. The remainder are either fragmented, overmerged, or entirely spurious. The overall purities and completenesses for these groups (even relative to only "detectable groups" and if we do not care about the nature of the group, i.e., 1WM) are about 83%. For groups that host only two spectroscopic galaxies, the statistics are somewhat worse. Fortunately, for groups with more than two spec-z members, these statistics are essentially independent of the observed spectroscopic richness N over a broad range of redshift, so the number of groups as a function of N should be an unbiased tracer of the number of real groups.

Given the work involved, this overall result might appear disappointing. Even the relatively simple task of differentiating centrals from satellites is quite difficult, especially if one wants to classify all galaxies. In the latter case, we obtain at best a completeness and purity of 93% and 84%, respectively, for centrals and 54% and 74%, respectively, for satellites. Many problems originate in issues with the basic group catalog (e.g., overmerging, fragmentation). However, these statistics are very good compared with those of other group catalogs at high redshift in the literature (e.g., Gerke et al. 2005, 2012; Cucciati et al. 2010). So it is presumably just an unpleasant fact of life that the construction of high-quality group catalogs (at least when using only spatial galaxy information) is very difficult and subject to several limitations. The reason is that groups, in contrast to massive clusters and single galaxies, by nature exhibit a rather low density contrast against the general field, which makes them difficult to detect and prone to problems that can hardly be cured (e.g., overlapping groups in redshift space, interlopers in redshift space; see K09 for a discussion of the difficulties in detecting groups).

Can we do better? In this paper we discussed the exploitation of high-quality photo-z to compensate for the incompleteness of spec-z galaxies in our sample. It is very unlikely that one could use these photo-z to detect new groups that were not already detected with the spec-z, since even these high-quality photo-z have an uncertainty of δz ∼ 0.01(1 + z), which amounts to several times the extension of a group along the line of sight. However, the photo-z are quite useful for characterizing groups that have already been detected. Large groups in particular benefit from the photo-z information, which significantly improves the estimates of the group centers (≲ 100 kpc offsets) and prevents mistakes in identifying the most massive galaxies of groups. The inclusion of photo-z also allows unbiased estimates of the corrected richness Ncorr to an accuracy of ∼30%.
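The claim about the photo-z uncertainty can be checked with one line of arithmetic. A redshift difference Δz corresponds to a rest-frame line-of-sight velocity Δv = c Δz/(1 + z). So δz = 0.01(1 + z) translates into δv = 0.01c ≈ 3000 km/s at any redshift. The comparison value below is an assumption, not a number from this paper: a velocity dispersion of a few hundred km/s is used as an illustrative, typical value for groups.

```python
C_KM_S = 299792.458  # speed of light in km/s

def photoz_velocity_error(k=0.01):
    """Rest-frame velocity equivalent of a photo-z error dz = k (1 + z).

    dv = c * dz / (1 + z) = k * c, independent of redshift.
    """
    return k * C_KM_S

def photoz_to_group_ratio(sigma_group_km_s, k=0.01):
    """Photo-z velocity error in units of an assumed group velocity dispersion."""
    return photoz_velocity_error(k) / sigma_group_km_s
```

With an assumed dispersion of 400 km/s, `photoz_to_group_ratio(400.0)` is about 7.5, consistent with "several times the extension of a group along the line of sight".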

Would a 100% sampled spectroscopic survey with the same flux limit produce a "better" group catalog? While a higher sampling rate will find more groups of lower average mass, the figures of merit such as g1, g2, etc. (see Section 3.1), which are defined relative to the groups that should have been detectable in the survey, will not improve substantially. This is seen in the small statistical differences between the 10k and the 20k group catalogs (see Figure 3) and also in the differences between the full and the central region of the 20k sample (see Table 3). We also performed tests with complete flux-limited mock samples, which likewise suggest that these statistics would improve by only a couple of percent. We would find more groups at a given observed richness N, but at any richness N the basic problems in detecting groups (e.g., the overlapping of groups in redshift space, low-density groups, interlopers in redshift space) would remain. This expectation is also supported by a comparison with the FOF group catalog from the highly complete GAMA survey, whose statistics were also obtained by comparison with the Millennium simulation (Robotham et al. 2011) and are broadly comparable, as far as we can determine from their paper. For example, their reported fraction of 77% for 2WM reconstructed mock groups is not substantially better than our value of ∼75% in Figure 6. The construction of high-quality group catalogs is presumably subject to limitations that are intrinsic to the nature of groups and not very sensitive to the details of the spectroscopic survey.

8. CONCLUSION

In the first part of this paper, we have presented the construction and properties of the zCOSMOS 20k group catalog. The basic catalog was derived by applying an FOF multi-run algorithm, whose parameters were tuned using realistic simulated mock galaxy catalogs, to the ∼16,500 high-quality spectroscopic redshifts of the 20k zCOSMOS sample.

The catalog contains 1498 groups in total and 192 groups with more than five spectroscopic members. If pairs are excluded, its one-way completeness is as high as 83%, and its one-way purity is 82%, relative to all groups that are in principle observable within the 20k sample. About 75% of these groups exhibit a 2WM (i.e., a one-to-one correspondence) to real groups. These statistics are robust over essentially the whole range of richness from three members upward, and across the whole redshift range. The fraction of spectroscopic galaxies that can be associated with a group decreases from about 35% to 10% over the redshift range from z ∼ 0 to z ∼ 1. A prominent feature of the catalog is that the number of reconstructed groups very accurately traces the number of real groups for all richnesses.

Comparisons of the 20k group catalog with the 24 mock catalogs obtained from the DM Millennium simulation show some similarities, but also some differences. The number of groups in the 20k catalog is well within the error bars of the number of reconstructed mock groups over a broad range of observed richness. However, there are too many small groups with N = 2–3 and too few large groups with N ≳ 20 in the zCOSMOS group catalog compared with the mock catalogs. This could indicate that the σ8 of the Millennium simulation is in fact too large compared with the actual universe, in agreement with the latest cosmological measurements. The fraction of galaxies in groups for the total catalog shows fair agreement with the mock catalogs, except for z ≳ 0.7, where the fraction is significantly higher for the actual data. Moreover, particularly at high redshift and for volume-limited samples, there are apparently more galaxies in groups than expected from the mock catalogs.

We do detect clear evidence for the growth of cosmic structure over the last seven billion years, in the sense that the fraction of galaxies found in groups (in volume-limited samples) decreases significantly toward higher redshift.

In the second part of this paper, we have developed a scheme for complementing the group population by those galaxies which have no reliable spectroscopic redshift, but only a photometric one. This was achieved by assigning to all photo-z galaxies a mock-calibrated association probability p for being a member of a given group. With the aid of the mock catalogs we studied the fidelity, distribution, and completeness of photo-z galaxies associated with groups and found that the concept works comparably well for the actual data.

Using the flux-limited group population and the membership probabilities, we introduced a probability pM for each galaxy to be the most massive galaxy (in stellar mass) of a group. We found that, for the actual data as well as for the mock data, most groups of any richness have a clear, well-defined candidate for their most massive galaxy. The fidelity of pM, however, depends sensitively on the measurement errors of the stellar mass. Despite this problem, selecting galaxies with pM  ⩾  0.7 yields a success rate of more than 50% in finding the real most massive galaxies.

As another application of the membership probability, using the mock catalogs, we studied ten estimators for locating the spatial centers of the groups, of which four are based only on spec-z information and six on a combination of spec-z and photo-z. We found that all estimators depend on both the spectroscopic group richness N and the projected apparent extension of the group. Typically, the higher N and the more extended a group, the more effective is the inclusion of photo-z information. Weighting the galaxy positions by the inverse of their projected Voronoi areas is also more effective in high-N groups. We found that weighting galaxies by both their inverse Voronoi areas and their stellar masses is superior to using either of these weighting schemes alone. We define "improved centers" by a combination of these estimators (without using stellar-mass information), which should yield offsets ≲ 100 kpc from the deepest point of the potential well for most groups of any richness class and group extension. According to the mock catalogs, even smaller offsets are achievable by including stellar mass.

The best of the 10 estimators achieves the successful selection of the galaxy at the potential minimum (not to be confused with the most massive galaxy) for at least half of all mock groups with N  ⩾  5, and for 75% of all groups it yields offsets of less than 20 kpc from the real group center.

Finally, we investigated how well we can define samples of central and satellite galaxies, where the centrals are defined to be the galaxies lying at the minimum of the gravitational potential. In addition to pM, we introduced another probability, pMA, which includes, besides the stellar mass, information on the local density at the position of the galaxy. While pM and pMA work comparably well for picking the central galaxy of a group, they are most powerful when taken in combination. We found that by applying suitable cuts in p, pM, and pMA, we are able to construct fairly pure samples of either centrals or satellites (typically about 60%–80% purity, depending on the richness of the group).

If we want to classify all galaxies in a binary way as either centrals or satellites, we can compute both the completeness and the purity for centrals and for satellites. We defined a division such that, for the total flux-limited sample, the completeness and purity are 89% and 81%, respectively, for centrals and 45% and 62%, respectively, for satellites. If this division is restricted to the spectroscopic sample, the completeness and purity are 93% and 84%, respectively, for centrals and 54% and 74%, respectively, for satellites.

This research was supported by the Swiss National Science Foundation, and it is based on observations undertaken at the European Southern Observatory (ESO) Very Large Telescope (VLT) under the Large Program 175.A-0839.

Footnotes

  • European Southern Observatory (ESO), Large Program 175.A-0839.

  • 23 

    Since the groupfinder is calibrated using the mock catalogs, the definition of a DM halo used in this paper corresponds to the operational definition of a DM halo in the Millennium simulation. That is, a DM halo is a friends-of-friends group of DM particles with a linking length of b = 0.2. These groups ideally correspond to structures with a mean overdensity of roughly 200.

  • 24 

    It is unlikely that such a single optimal figure of merit exists. For instance, the optimal catalog with respect to $\tilde{g}_1$ over the whole range of group sizes is not necessarily also the optimal catalog with respect to the produced number of reconstructed groups Nrec, since we found that almost equally good catalogs with respect to $\tilde{g}_1$ can exhibit substantial differences in Nrec.

  • 25 

    As an aside, the GRPi catalogs are similar but not identical to what we called the iWM, i = {1, 2}, subcatalogs in K09. The iWM catalogs contained not only a subsample of the groups of the basic FOF catalog, but also a subsample of the members of each group, so that the richness of a group in the iWM catalog was in general not the same as that of the corresponding FOF group. For the groups of the GRPi catalogs, the richness is always the same. In this paper, we never use the term iWM in the sense of subcatalogs as in K09, but only to indicate the relation between reconstructed and real groups.

  • 26 

    Owing to the multiple group associations of some photo-z galaxies, a selection by p, pM, and pMA does not, in general, lead to a sample of galaxies with unique group membership. If needed, we resolve this degeneracy by taking for each galaxy that has multiple associations to groups the association with the highest p.
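The tie-breaking rule in footnote 26 is a per-galaxy argmax over association probabilities. Below is a minimal sketch, assuming the associations are available as (galaxy ID, group ID, p) triples. If two groups have exactly equal p, the first one encountered is kept, which is an arbitrary choice not specified in the text. The function name is hypothetical.

```python
def unique_associations(assocs):
    """Keep, for each galaxy, the group association with the highest p.

    assocs : iterable of (galaxy_id, group_id, p) triples
    Returns a dict galaxy_id -> group_id.
    """
    best = {}
    for gal, grp, p in assocs:
        # replace only on strictly larger p, so ties keep the first entry
        if gal not in best or p > best[gal][1]:
            best[gal] = (grp, p)
    return {gal: grp for gal, (grp, _) in best.items()}
```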

10.1088/0004-637X/753/2/121