TERRESTRIAL, HABITABLE-ZONE EXOPLANET FREQUENCY FROM KEPLER

Wesley A. Traub

doi:10.1088/0004-637X/745/1/20

1. INTRODUCTION

The frequencies of exoplanet types, categorized by radius, period, and host-star spectral type, offer clues to the origin and evolution of exoplanet systems. Data from the initial 136 days of the Kepler mission are particularly valuable because they form a large and relatively complete sample, even at this early phase of operation. This paper examines the Kepler database to estimate the frequencies of each planet category and extrapolates the data for an estimate of η_⊕, the frequency of terrestrial planets in the habitable zones (HZs) of their host stars.

Since a major goal of the Kepler mission is to estimate η_⊕, it is worthwhile early in the mission to analyze the data for this quantity. In addition, a careful study of the data gives hints about how the mission is performing. All this information will be useful in guiding future decisions on data analysis methods and mission operations.

This paper discusses the sample database (Section 2), bias estimation (Section 3), the radius and period distributions in the sample (Sections 4–6), the radius and period distributions in the population (Sections 7–9), and the HZ and estimated η_⊕ (Sections 10 and 11).

2. THE SAMPLE DATABASE

In Borucki et al. (2011b), hereafter "B2011," the database lists planetary candidates discovered during the first 136 days of observation by the Kepler mission. In particular, Table 1 in B2011 lists the host-star characteristics and Table 2 lists the planetary candidates with their characteristics. Hereafter the terms "planets" or "exoplanets" will be a shorthand for the more conservative term "planetary candidates" used by the Kepler team.

Table 1. Radius versus K_p in Sample

K_p Range (mag)	r = all-r	r = 0.6–1.4	r = 1.5–2.0	r = 2.1–3.0	r = 3.1–8.0	r = 8.1–39.7	N_s	n(mid-r)/N_s
10.0, 12.999	136	25	32	33	38	8	21,822	0.47 ± 0.05%
13.0, 13.999	213	33	47	64	47	22	34,903	0.45 ± 0.04%
14.0, 14.999	315	25	65	110	78	37	39,285	0.64 ± 0.04%
15.0, 15.999	288	13	57	89	78	51	55,952	0.40 ± 0.03%

Notes. For each magnitude interval (Column 1), the table lists the number of Kepler transits around FGK stars in the sample, for all planet radii (Column 2), and for five sub-ranges of radii (Columns 3–7). The number of basis stars N_s is in Column 8, and the fraction of stars with mid-size planets (r = 1.5–8.0) detected, in each magnitude bin, is in Column 9. All periods P < 42 days are included.


Table 2. Radius Distribution in Sample

log (r) (Range)	n_p (All K_p)	n_p (Bright)	n_p (Faint)
−0.30, −0.15	8	6	2
−0.15, 0.00	13	9	4
0.00, 0.15	76	44	32
0.15, 0.30	161	59	102
0.30, 0.45	302	107	195
0.45, 0.60	146	50	96
0.60, 0.75	74	32	42
0.75, 0.90	57	18	39
0.90, 1.05	53	15	38
1.05, 1.20	47	9	38
1.20, 1.65	22	6	16
Totals	959	355	604

Notes. For each radius interval (Column 1), the table lists the number of Kepler planets around FGK stars in the sample, for all Kepler magnitudes (Column 2), and for each of the bright (Column 3) and faint (Column 4) ranges. The last bin in r is wider than the others. All periods P < 42 days are included.


The combined database has 1235 planets. For this paper, the following planets are removed: 16 labeled as single transits, 20 with host stars hotter than 6500 K, and 240 with hosts cooler than 5000 K. For present purposes the stars are defined by the following temperature ranges: "K" (5000–5499 K), "G" (5500–5999 K), and "F" (6000–6499 K). The number of planets in each sub-sample is F (159), G (475), and K (325), for an FGK total of 959. Using Table B1 from Gray (2008), these ranges correspond to the standard spectral types as follows: "F" ≈ F5−F9, "G" ≈ G0−G7, and "K" ≈ G8−K2. Hereafter, the quotation marks are dropped.

The number of target stars is estimated from the archive at http://archive.stsci.edu using the following search qualifiers and values: cadence (long cadence, 29.4 minutes), star radius (<10 solar radii), and quarter (second). The resulting number of target stars is 153,196, which agrees exactly with the number stated in B2011. The FGK subset has 113,644 stars, about 74% of the original sample. The breakdown by spectral class is F (20,406), G (55,595), and K (37,643). For perspective, the overall frequency of detection of all planets around FGK stars is then 959/153,196 ≃ 0.63%.

The star masses and surface gravities in the sample have ranges significantly larger than the narrow limits of textbook main-sequence dwarf stars, but are close enough to luminosity class V to be labeled as such. Overall the 959-planet sample seems to be a good approximation of the target class often called "Sun-like FGK stars."

Hereafter r refers to planet radius in units of r_⊕ = 6378 km and listed in B2011 to the nearest 0.1. The planet orbital period is P, in days, listed in B2011 to many significant figures.

Following standard statistical practice, the numbers of planets in the observed sample are denoted by lowercase n, and the estimated numbers of planets in the parent population are denoted by uppercase N. Logarithms denoted by ln are base e.

"Terrestrial, habitable-zone" planets are defined here in terms of radius and surface temperature. Terrestrial planets are taken to be those with 0.5 ⩽ r ⩽ 2.0, corresponding to roughly 0.1–10 Earth masses (Lunine et al. 2008). For reference, the average radius of Uranus and Neptune is 3.9, Saturn is 9.4, and Jupiter is 11.18 (equatorial radius). For convenience the dividing line between ice and gas giants is taken to be r = 8, with planets at r ⩽ 8 counted as ice giants. Thus three radius ranges are defined: small (terrestrial), medium (ice giant), and large (gas giant).
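As a minimal sketch, the three radius ranges can be written as a simple classifier. The function name `radius_class` and the treatment of r < 0.5 (returned as `None`) are illustrative assumptions, not something defined in B2011 or in this paper.

```python
from typing import Optional


def radius_class(r: float) -> Optional[str]:
    """Classify a planet by radius r (in Earth radii) using the ranges in the text."""
    if r < 0.5:
        # Below the terrestrial lower limit; left unclassified (assumption).
        return None
    if r <= 2.0:
        return "small (terrestrial)"
    if r <= 8.0:
        return "medium (ice giant)"
    return "large (gas giant)"


# Reference radii quoted in the text (Earth, Uranus/Neptune average, Saturn).
for r in (1.0, 3.9, 9.4):
    print(r, radius_class(r))
```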

The HZ is the region around a star where liquid water could exist on the surface of a planet. This paper adopts three ranges of star–planet separation that have been proposed as HZ limits, as discussed in Section 10.

References to bright or faint stars in this paper are a shorthand for apparent brightness or faintness, not absolute.

3. BIAS ESTIMATION

It is important to understand the biases that may exist in the database. In this context, a bias is defined as a difference in character between the observed sample and the actual parent population. As noted in Section 2, the sample here is the set of transits and candidate planets in the Kepler database. The parent population is the set of actual planets in orbits around Kepler's target stars.

In order to draw valid statistical conclusions, one must either explicitly compensate for a known bias or make an assumption about the importance of a potential bias. The bias analysis in this paper is based solely on the explanations in B2011 and on the initial assumption that all the data in the database are valid. No corrections are made for either overcounting (e.g., false positives) or undercounting (e.g., missing events owing to poor detection in low signal-to-noise cases). Also, no attempt is made to go outside the database itself, by using a priori estimates of signal and noise, for example, in order to estimate the completeness of the period or radius data.

To be explicit, here is a list of some known or potential biases, and the corresponding assumptions to ignore them, or actions to mitigate them, as taken in this paper. See the Kepler Data Release Notes (Christiansen et al. 2011) and Kepler Input Catalog (Brown et al. 2011) for explanatory details and extensive discussion of these and related points.

Field-of-view bias. Are the Kepler stars and planets representative of the solar neighborhood, where we expect to find and characterize planets someday, or is there a bias owing to the different galactic location? The Kepler field of view (FOV) subtends a very small part (0.3%) of the sky, with a median target-star distance on the order of a kpc. Therefore the Kepler target stars are certainly not in the immediate neighborhood of the Sun, so they may not be representative of the solar neighborhood. However they are approximately at the Sun's distance from the galactic center. Thus in terms of whether the Kepler population is representative of the solar neighborhood, there may be a bias, but it is usually assumed to be zero.

Magnitude-limit bias. The Kepler sample is magnitude limited, not volume limited, so high-luminosity stars in the sample will tend to be farther away than low-luminosity ones. However under the assumption that distant stars have the same statistical properties as nearby ones, per the FOV bias discussion above, and under the assumption that the Kepler stars are all at distances well within the galaxy, this too is assumed here to produce a zero bias.

Active-star bias. Active stars have random brightness fluctuations on timescales that include transit times, adding noise to the photometric signal, reducing the likelihood of detecting a planet, especially a small one. The Kepler team finds that giant stars have significantly more noise than dwarf stars (Christiansen et al. 2011) and that even dwarf stars have about 30% greater photometric noise than expected on the basis of previous observations of the Sun (E. W. Dunham et al. 2011, private communication; Gilliland et al. 2011). The result is that small planets are less likely to be found around active stars. The effect of this noise is included in the transit detection algorithm because there is a signal-to-noise threshold requirement, but the bias against active stars is not compensated. An a posteriori correction for this effect could be attempted after the Kepler mission is completed and more statistical information is in hand. For the present paper, a zero bias is assumed.

Star-spot bias. Noise from star spots is similar to active-star photometric noise, but at lower temporal frequency, so the active-star discussion applies here as well. A zero bias is assumed in this paper.

Stellar-parameter bias. The estimated planet radius depends directly on the assumed stellar radius, so any bias in the latter propagates to the former. In the sense that a class of stars might tend to have a bias in stellar radius, the planets around those stars will be similarly biased. Likewise the estimated planet semi-major axis depends directly on the assumed stellar mass, so similar considerations apply. Further, the assumed stellar luminosity affects the derived location of the HZ. Limb darkening also will affect the derived planet radius, especially for near-grazing transits. In this paper, a zero bias is assumed for all stellar parameters.

Spectral-class bias. A combination of other biases in this list against individual properties which define the spectral class of a star could result in an erroneous conclusion regarding, for example, the prevalence of planets around a given spectral type. In the absence of evidence to the contrary, this paper assumes a zero bias.

Impact-parameter bias. The probability of a transit depends on the assumed stellar radius and on the assumption that all transits across the disk are equally detectable. However, a grazing transit will generate a smaller photometric signature than an equatorial transit, so the effective stellar radius for calculating the probability of a transit will be less than the actual radius, and this ratio will depend on the signal and noise in a way that could be estimated. For the present paper, the maximum impact parameter is taken to be the stellar radius, so a zero bias is assumed.

False-positive bias. The elimination of false-positive detections from background eclipsing binary stars is a major consideration in the Kepler data pipeline. This bias may tend to increase for fainter stars, owing to the low signal to noise in these cases. It is possible that this type of bias exists in the present database, where an excess of giant planets is suspected around the fainter stars, as discussed in Section 4.1. The present paper avoids this bias by considering only the brighter stars and looking mainly for small-radius planets.

Planet-radius bias. The existence of a finite noise level directly affects the detection threshold and therefore limits detection of the smallest planets. It is likely that this effect exists in the present database, where a paucity of small planets is suspected around faint stars as compared to brighter ones (Section 4.1). This is a type of bias that could be modeled in the future, when more is known about the actual noise in the data and the detection algorithm. The present paper avoids this bias by basing its final results on the bright stars alone. However there still could remain a bias against small planets around those stars as well, so in that sense the actual number of planets in the population might be larger than estimated here.

Period-completeness bias. The Kepler team uses the rule that a minimum of three transits is required for a candidate planet detection. If only one or two are seen, there is great uncertainty about the planet. Likewise, if the detection algorithm does not perfectly adapt itself to the separate quarters of observations, between which the Kepler spacecraft rolls a quarter turn and after which the stars fall on different detectors, then the likelihood of finding three consecutive transits is reduced. This is the situation for the present database, although it probably will be remedied in future releases. The present database does contain transits with periods greater than one-third of the relevant mission length. However these transits were discovered in an ad hoc manner, so there is no guarantee of completeness for these longer periods (B2011). For this paper, only periods of less than 42 days are considered, although for completeness all periods are shown in some plots. The restriction to periods less than 42 days is an important feature of this paper.

Distribution-function bias. It is sometimes assumed that the frequency distribution of planets in the population can be modeled in terms of separable functions of spectral type, period, and radius (or mass). This is a mathematical convenience that is allowable only because currently there is no strong theoretical or observational evidence to the contrary. However as more data are accumulated from Kepler, radial velocity, and exoplanet microlensing observations, this convention will be tested and possibly replaced. Nevertheless at present the bias introduced by this assumption is unavoidable and it is so noted in this paper.

Mission-length bias. The three-transit rule means that the Kepler mission length must be at least three times the length of the period of planets in the outer parts of the HZs, in order to fully sample those zones. A shorter mission means that short-period data will need to be extrapolated to longer periods in order to estimate η_⊕, for example. Although this paper does carry out such an extrapolation (Section 11), there is uncertainty in doing so. The bias incurred by extrapolation is entirely unknown, so in the present paper we merely note this uncertainty but do not attempt to make any corrections.

4. RADIUS DISTRIBUTION IN THE SAMPLE

4.1. Radius Bias: Magnitude Dependence

As mentioned in B2011, there is a possible bias in the database owing to the fact that the signal-to-noise ratio (S/N) decreases as the Kepler magnitude K_p increases. To search for a sign of this bias, the 959-planet sample is subdivided into four bins of target-star magnitude, with 136 stars in the K_p range (10, 12.999), 213 stars in the range (13.0, 13.999), 315 in the range (14.0, 14.999), and 288 in the range (15.0, 15.999). The 6 stars brighter than 10.0 and the 1 star fainter than 16 are ignored. In each magnitude range, planets in five radius ranges are counted. The bins in magnitude and radius are chosen to give roughly similar numbers in each category, to aid statistical comparison. The data are listed in Table 1, and the fraction $n(\Delta r)/n(\hbox{all-\textit {r}})$ in each radius group Δr is plotted in Figure 1.

**Figure 1.** Fractions of stars in the sample with magnitude range from 10 to 16, and in the five radius groups (see inset values) from Table 1, are plotted along with Poisson error bars. Each dotted line is an average from the bright (*K_p* < 14) stars in the radius group. If there is a bias in the faint-star regime, it would be revealed by a disagreement between this line and the faint-star points. Thus, around faint stars, there appears to be a paucity of small planets and an excess of large ones.

In the plots for the middle three groups (radii 1.5–2.0, 2.1–3.0, and 3.1–8.0), the fraction of planets in the sample is approximately constant in going from Kepler magnitude 10 to 16, as judged by the overlap or near-overlap of the error bars in each sub-group. However for the smallest and largest planets the case is different.

For the smallest planets (radii 0.6–1.4), there is a highly significant drop in planets detected around the faintest stars (K_p = 14–16) compared to the numbers found around brighter stars. This is a clear sign of the radius bias mentioned in B2011. Quantitatively, from this figure it appears that the break point is close to K_p = 14. For convenience in this paper, "bright" Kepler stars are defined as those with K_p < 14.0, and "faint" stars are defined as those with K_p ⩾ 14.0. The bright-star sample may still be incomplete in terms of the smallest planets, but in this paper the sample is assumed to be complete. The faint sample is not complete and therefore will be ignored for the purpose of estimating numbers of planets in the population (Sections 7–11).

In each panel of Figure 1, the average fraction for the bright group is indicated by a horizontal dotted line. If there is no bias, then the faint groups should lie within about 1σ of the dotted line. The middle three radius groups are seen to be consistent with this average (see also Section 4.3), but the smallest-radius planets around faint stars are seen to be 5σ–10σ below that line, a highly significant deficit.
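The size of the small-planet deficit can be checked roughly from the counts in Table 1. The sketch below assumes that the bright-star reference is the pooled fraction over the two bright bins and that only the Poisson uncertainty of each faint point matters; the paper does not spell out either choice, so the numbers are indicative only. Under these assumptions the two faint bins fall between 5σ and 10σ below the bright-star average.

```python
import math

# Planet counts from Table 1: (n with r = 0.6-1.4, n with all r) per K_p bin.
bright = [(25, 136), (33, 213)]                       # K_p 10-13 and 13-14
faint = {"14-15": (25, 315), "15-16": (13, 288)}

# Assumption: the bright-star reference is the pooled fraction over both bright bins.
reference = sum(n for n, _ in bright) / sum(total for _, total in bright)


def deficit_sigma(n_small: int, n_all: int, ref: float) -> float:
    """Number of Poisson sigma by which a bin's fraction lies below the reference."""
    fraction = n_small / n_all
    error = math.sqrt(n_small) / n_all                # Poisson uncertainty on the fraction
    return (ref - fraction) / error


deviations = {k: deficit_sigma(n, total, reference) for k, (n, total) in faint.items()}
for k, d in deviations.items():
    print(f"K_p {k}: {d:.1f} sigma below the bright-star average")
```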

For the largest planets (radii 8.1–39.7), there are significantly more planets detected around faint stars than bright ones. There is no obvious astrophysical reason for this effect, although one explanation might be that false positives are being picked up from background eclipsing binaries. It could be more difficult to differentiate eclipsing binaries from planetary transits when the target star is relatively faint. If this apparent excess is indeed the case, then the detection of about 41 events out of 959 suggests that the rate of unrecognized false positives is around 4%; this is much lower than the value of about 50% mentioned in the original data release (Borucki et al. 2011a), but more in line with what B2011 says is the "substantially smaller" rate expected in the current database.

4.2. Radius Bias: Mid-size Planets

This section extends the analysis in Section 4.1 of the absolute numbers of mid-size planet transits as a function of star magnitude to ask if the relative numbers in the sample have any dependence on host-star brightness. Column 8 in Table 1 shows the basis number of stars N_s observed by Kepler in each magnitude range from 10 to 16. Periods are limited to 42 days. The mid-size planets, those with radii in the range 1.5–8.0, should be free of the apparent bias at the small and large ends of the radius scale. The ratio of the number of these well-measured planets to the basis number of stars is listed in the last column of Table 1, along with the Poisson uncertainty.

Three of the four magnitude groups have excellent agreement on the number of detected planets per star, consistent with an average of (0.44 ± 0.04)%, within the uncertainties. In these three groups there does not appear to be any trend, and certainly not a significant trend toward fewer detections at faint magnitudes, as one might expect. However in the remaining group, for magnitude-14 stars, the number of planets jumps up to (0.64 ± 0.04)%, well above the average of the other groups. Averaging the two faint bins together gives a ratio of (0.50 ± 0.02)%, which is just within 1σ of the bright group average of (0.46 ± 0.03)%, so it appears that there is no evidence for a bias against detection of the mid-range of planet radii, 1.5–8.0 Earths, when comparing bright and faint target stars. The overall frequency of mid-size planet detection is (0.49 ± 0.02)% for all FGK bright and faint targets combined. For comparison, this is smaller than the frequency of detection of all planet sizes in the sample, 0.63%, from Section 2, the difference being that the smallest and largest planets are not included.
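The frequencies quoted above follow directly from the counts in Table 1. The sketch below recomputes them, taking the uncertainty as √n/N_s (simple Poisson counting); the variable and function names are illustrative.

```python
import math

# Table 1: mid-size (r = 1.5-8.0) planet counts and basis stars N_s per K_p bin.
bins = {
    "10-13": (32 + 33 + 38, 21822),
    "13-14": (47 + 64 + 47, 34903),
    "14-15": (65 + 110 + 78, 39285),
    "15-16": (57 + 89 + 78, 55952),
}


def frequency_percent(n: int, n_stars: int) -> tuple[float, float]:
    """Return (frequency, Poisson uncertainty), both in percent."""
    return 100.0 * n / n_stars, 100.0 * math.sqrt(n) / n_stars


def pooled(keys) -> tuple[float, float]:
    """Frequency for several magnitude bins combined."""
    n = sum(bins[k][0] for k in keys)
    n_stars = sum(bins[k][1] for k in keys)
    return frequency_percent(n, n_stars)


per_bin = {k: frequency_percent(*v) for k, v in bins.items()}
bright = pooled(["10-13", "13-14"])
faint = pooled(["14-15", "15-16"])
overall = pooled(bins)

for label, (f, e) in {**per_bin, "bright": bright, "faint": faint, "all": overall}.items():
    print(f"{label:>6}: ({f:.2f} +/- {e:.2f})%")
```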

4.3. Radius Bias: Bright versus Faint Stars

Since the analysis in Section 4.1 showed that there is a fairly well-defined transition at K_p ≃ 14.0 beyond which small planets appear to be incompletely sampled, it is worthwhile to look at the overall radius distribution in the sample and to see how it depends on the bright and faint regimes.

To do this, the 959 planets in the FGK database, with periods less than 42 days, are sorted into bins of equal size in log (r) space, in steps of Δlog (r) = 0.15, anchored at r = 1, and listed in Column 2 of Table 2. The breakdown into 355 bright and 604 faint entries is shown in Columns 3 and 4. The radius data are visualized in Figure 2, where the upper panel shows the total number (bright plus faint) of planets in each radius bin.
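The binning scheme can be sketched as follows. The function name is hypothetical, and the sketch uses uniform bins only; it does not reproduce the wider final bin (1.20–1.65) of Table 2.

```python
import math

BIN_WIDTH = 0.15  # bin width in log10(r), anchored at r = 1 (log r = 0)


def log_radius_bin(r: float) -> tuple[float, float]:
    """Return the (lower, upper) edges in log10(r) of the bin containing radius r."""
    index = math.floor(math.log10(r) / BIN_WIDTH)
    # Round the edges to two decimals to match the table's labels.
    return round(index * BIN_WIDTH, 2), round((index + 1) * BIN_WIDTH, 2)


for r in (0.6, 1.0, 1.4, 1.5, 2.0):
    print(r, log_radius_bin(r))
```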

To see if there is any bias in the sample, in going from bright to faint targets, the lower panel in Figure 2 shows the ratio faint/bright in each radius bin, normalized to the total number in each range, along with Poisson error bars. For reference, note that across the mid-radius ice-giant group, and for one bin on either side, the ratio is essentially flat within the noise; this shows that Kepler is detecting planets around bright and faint stars equally well, across this range of planet radii. This is in agreement with Section 4.2. However, there are exceptions at the small- and large-planet ends of the distribution, as discussed next.

As was seen in Section 4.1, many more small planets (r < 10^0.15 = 1.41) are detected around (apparently) bright stars than faint ones. There is no astrophysical reason for this to happen, unless somehow there is a spectral-type bias in the detections, which cannot be discerned from the current data alone. The most likely reason for this difference is that small planets around faint stars are being missed by the data analysis algorithm. In this range, a total of 38 small planets around faint stars are detected, whereas about 100 ± 13 should have been seen, based on the bright-star numbers. This suggests that the detection efficiency for small planets around faint stars is only about (38 ± 6)% of the efficiency around bright stars.
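The numbers in this paragraph can be recovered from Table 2, as sketched below. The error propagation is an assumption: the uncertainty on the expected count is taken from the Poisson noise of the bright-star count alone, and the uncertainty on the efficiency from the Poisson noise of the observed faint-star count alone.

```python
import math

# Table 2, summed over the three bins with log r < 0.15 (r < 1.41).
small_bright = 6 + 9 + 44       # 59 small planets around bright stars
small_faint = 2 + 4 + 32        # 38 small planets around faint stars
n_bright, n_faint = 355, 604    # total planets in the bright and faint samples

# Scale the bright-star small-planet count by the faint/bright ratio of sample sizes.
expected_faint = small_bright * n_faint / n_bright
# Assumption: uncertainty dominated by Poisson noise on the bright-star count.
expected_err = expected_faint / math.sqrt(small_bright)

efficiency = small_faint / expected_faint
# Assumption: Poisson noise on the observed faint-star count only.
efficiency_err = math.sqrt(small_faint) / expected_faint

print(f"expected: {expected_faint:.0f} +/- {expected_err:.0f}")
print(f"relative efficiency: ({100 * efficiency:.0f} +/- {100 * efficiency_err:.0f})%")
```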

To conclude this section, the data show that the database is biased against small planets (r < 1.4) around faint stars (K_p ⩾ 14.0), so for the remainder of this paper the sample basis will be the bright star (K_p < 14.0) subset. There are 35,896 FGK target stars in this bright sample, subdivided by spectral type as F (11,819), G (14,997), and K (9080); these are the basis numbers of target stars in the bright-star population.

5. PERIOD DISTRIBUTION IN THE SAMPLE

The numbers of planets in each interval of log (P), where P is the planet orbital period in days, are listed in Table 3, in bins of size Δlog (P) = 0.25. As in the radius discussion, the numbers for all Kepler magnitudes are listed along with the breakdown into bright and faint stars. The total number is 958, one less than for the radius listing, because one very long period planet is dropped. The same data are plotted in the upper part of Figure 3, where three period regimes are indicated.

**Figure 3.** Upper: the numbers of planets detected in each period bin in the sample are shown, with Poisson uncertainties. For reference, the nominal period ranges are indicated: for P < 3 days, the sample is complete, so the apparent drop-off is astrophysical in origin; for 3 days < P < 42 days, the sample is also complete; for P > 42 days, the sample is not complete, and may be biased, so the drop-off is likely an artifact. Lower: the ratios of numbers for faint/bright host stars are shown, normalized to an average of unity. Within the completely sampled range (P < 42 days), there does not appear to be any bias from faint targets compared to bright ones. However the apparently systematic trend toward a relatively smaller number of long-period planets around faint targets, compared to bright ones, is a possible bias at the 1σ level.

Table 3. Period Distribution in Sample

log (P) (Range)	n_p (All K_p)	n_p (Bright)	n_p (Faint)
−0.50, 0.00	11	5	6
0.00, 0.25	23	5	18
0.25, 0.50	61	16	45
0.50, 0.75	161	58	103
0.75, 1.00	192	64	128
1.00, 1.25	182	71	111
1.25, 1.50	154	62	92
1.50, 1.75	100	40	60
1.75, 2.00	32	15	17
2.00, 2.25	29	13	16
2.25, 2.75	13	6	7
Totals	958	355	603

Notes. Table lists the number of Kepler planets around FGK stars in the sample, in each period range (Column 1), for all Kepler magnitudes (Column 2), for each of the bright (Column 3) and faint (Column 4) ranges, and for periods P < 42 days. The first and last bins are wider than the others.


For short periods, P < 3 days, there is a sharp drop-off, which almost certainly is an astrophysical effect, since these planets would have had many transits in the database and would be relatively easy to detect. There is a mild potential bias against short-period detections in the sense that individual transits get shorter as the period decreases; however, this is compensated by the fact that there are more of them to count. The net effect varies slowly with period (cf. Section 6), certainly much more slowly than the abrupt drop-off seen in Figure 3.

For periods in the range from 3 to 42 days, the current database is expected to be statistically complete, since at least three transits (a Kepler requirement) should have been detected in the database's 136 day window.

For longer periods the efficiency of detection in the current database is expected to drop, because B2011 note that periods greater than 42 days were not searched for in a systematic fashion. Therefore the fall-off for long periods should be no surprise, since there is certainly a selection effect here, with no implied astrophysical meaning.

The normalized ratio of detections in the sample, n(faint)/n(bright), is plotted in the lower part of Figure 3, similar to the plot for the radius distribution. For short periods, the numbers are small, so the error bar is large, and there is no obvious interpretation. In the range where the data are complete and abundant, 3–42 days, the faint and bright data sets are identical in relative numbers of detections, within the counting statistics. Indeed, there is no obvious reason why there should be any kind of instrumental or astrophysical bias here. The slow downward drift of the ratio, as the period increases, is slightly puzzling; this may indicate a difficulty in detecting long-period transits in the fainter stars, which would not be a surprising instrumental bias, but the significance is low, and more data will be needed to see how real this is.

6. PERIOD–RADIUS SCATTER DIAGRAM

The possibility of a correlation between period and radius can be investigated by plotting the 355 Kepler planets around bright stars in a (period, radius) scatter diagram, as shown in Figure 4. To guide the eye, a vertical line at P = 3 days isolates the short-period range where planets are apparently not frequent (Section 5). Another vertical line at P = 42 days indicates the cutoff point, above which the database is not complete, and which is now ignored. A pair of horizontal lines at r = 2 and 8 Earth radii arbitrarily divides the diagram vertically into small-, medium-, and large-radius planets.

**Figure 4.** Period and radius of *Kepler* planets in the sample, around bright stars, are plotted. The lower right corner is relatively empty, probably owing to low S/N there, not because small planets are absent from long periods. The upper left corner is relatively sparse, in spite of an expected high S/N there, implying a deficit of large planets on short-period orbits. The left side of the diagram is relatively empty owing to an apparent paucity of planets of all sizes at periods less than 3 days. The right side of the diagram is not completely sampled in the current database, so should be ignored here.

The diagonal lines are a crude guide to the regions of relatively easy versus relatively hard planet detection as a function of (P, r). The simple assumption here is that the number of planets will depend on the S/N of the transits. The signal for a single transit is proportional to the transit time and the ratio of planet to star area, hence $P^{1/3} r^2$. The noise is proportional to the square root of the transit time, hence $P^{1/6}$, so the S/N of a single transit scales as $P^{1/6} r^2$. The S/N for multiple transits is proportional to the square root of the number of transits, or $P^{-1/2}$. The net S/N for a given mission length is S/N ∼ S/N₀ · $P^{-1/3} r^2$, where S/N₀ is a factor that depends on the star flux, etc., but not P or r. Thus lines of constant S/N can be drawn as $r = \sqrt{({\rm S/N})/({\rm S/N})_0} \cdot P^{1/6}$. These lines are drawn for the cases of 1, 10, and 100 times S/N₀, where S/N₀ is arbitrary. Several features of the scatter diagram are immediately explained by this simple S/N argument, as follows.
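The net scaling and the resulting lines of constant S/N can be sketched in a few lines. The function names are illustrative, and the normalization is arbitrary, as in the text, so only the relative positions of the lines are meaningful.

```python
import math


def relative_snr(period_days: float, radius: float) -> float:
    """S/N relative to S/N_0 for a fixed mission length: P^(-1/3) * r^2."""
    return period_days ** (-1.0 / 3.0) * radius ** 2


def radius_on_snr_line(period_days: float, snr_ratio: float) -> float:
    """Radius at which the relative S/N equals snr_ratio: sqrt(ratio) * P^(1/6)."""
    return math.sqrt(snr_ratio) * period_days ** (1.0 / 6.0)


# Evaluate the three lines (1, 10, and 100 times S/N_0) at the 3 and 42 day limits.
for ratio in (1, 10, 100):
    radii = [radius_on_snr_line(p, ratio) for p in (3.0, 42.0)]
    print(ratio, [f"{r:.2f}" for r in radii])
```

Because the radius on a line grows only as P^(1/6), each line is shallow in a log-log plot: a factor of 14 in period (3 to 42 days) raises the radius by only about 55%.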

First, the area below the S/N₀ line, to the lower right, appears to be relatively empty of planets, and this is to be expected; it should not be concluded that there are fewer planets with small radius at large period, for example, because this region is simply the one where detection is the most difficult.

Second, the area toward the upper left is one where detection should be very easy, with many transits of large-radius planets; however, the region is relatively empty. This indicates that there truly are very few planets in this region, i.e., large planets on short periods are rare. This is the opposite of the early indications from radial velocity, where there appeared to be a pileup of large planets on short-period orbits; that pileup was seen even then as a possible bias of the technique, and the bias is shown clearly here.

Third, within the central vertical strip, between the 3 and 42 day lines, the relative density of points appears to be approximately uniform in log (P); however, to the right of the 42 day line the density of points drops off rapidly. This simply illustrates that the sampling is not necessarily complete for this long-period region; there is not necessarily any astrophysical meaning to this drop-off. This region will be better sampled as the Kepler mission progresses in time, and more completely sampled period data are released.

Fourth, the trend of data points in the center 3–42 day region appears to be slightly upward in slope, roughly parallel to the S/N lines. In fact, a fit of the median radius in six equally spaced bins of log P (not shown) reveals that the apparent median radius varies as P^γ where γ ≃ 0.11 ± 0.05, which is consistent at about the 1σ level with a slope of 1/6 ≃ 0.17. The similarity of slopes suggests that the trend is purely an artifact of the detection method and not likely to be of astrophysical relevance.

7. PERIOD DISTRIBUTION IN THE POPULATION

For every transiting planet, there are many more non-transiting planets. It is well known that the probability of a transit is simply p_t = R(star)/a(orbit). Before the Kepler mission was launched, a massive effort was invested in characterizing the target stars (Brown et al. 2011), one benefit of which is that the Kepler database now contains a priori estimates of the host star mass and radius and of course semi-major axis a(orbit) from the period and star mass.

In a statistical sense, for every transit there are a total of 1/p_t planets in transiting plus non-transiting orbits. Thus it is easy to estimate N_p, the total number of planets in the population orbiting the observed stars, simply by counting the observed planets n_p with a weight factor of 1/p_t, giving

$\begin{equation} N_p = \Sigma _{i=1}^{n({\rm obs})} (p_t(i))^{-1}. \end{equation} \tag{ 1 }$

This value, the number of planets per bin in the population, is listed in Table 4 as a function of period, for the planets around bright FGK target stars.
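The weighting in Equation (1) can be sketched in a few lines. The following is a minimal illustration, assuming circular orbits and Kepler's third law in solar units; the three sample systems are hypothetical and are not taken from the Kepler tables, and the function names are illustrative only.

```python
# Sketch of Equation (1): weight each observed planet by 1/p_t.
R_SUN_AU = 0.00465047  # solar radius expressed in AU


def semi_major_axis_au(period_days, star_mass_msun):
    """Kepler's third law, a^3 = M * P^2, with a in AU, P in years, M in solar masses."""
    period_years = period_days / 365.25
    return (star_mass_msun * period_years ** 2) ** (1.0 / 3.0)


def transit_probability(star_radius_rsun, star_mass_msun, period_days):
    """Geometric transit probability p_t = R(star)/a(orbit), assuming a circular orbit."""
    a_au = semi_major_axis_au(period_days, star_mass_msun)
    return star_radius_rsun * R_SUN_AU / a_au


def population_count(planets):
    """Equation (1): N_p is the sum of 1/p_t over the observed planets."""
    return sum(1.0 / transit_probability(r, m, p) for r, m, p in planets)


# Hypothetical sample (NOT Kepler data): (R_star in R_sun, M_star in M_sun, P in days).
sample = [(1.0, 1.0, 3.0), (1.0, 1.0, 10.0), (1.2, 1.1, 40.0)]
n_p = len(sample)
N_p = population_count(sample)
print(f"n_p = {n_p}, N_p = {N_p:.1f}")
```

Because p_t falls as the orbit widens, long-period detections carry much larger weights than short-period ones, which is why the N_p column in Table 4 grows with period even where n_p does not.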

Table 4. Period Distribution in Population

P	n_p	N_p	$\frac{N_p/N_s}{\Delta \ln P}$
(Range)
0.63, 1.00	5	19.8	0.00120
1.00, 1.58	2	7.8	0.00047
1.58, 2.51	13	94.9	0.00574
2.51, 3.98	29	274.1	0.01658
3.98, 6.31	52	613.3	0.03710
6.31, 10.0	46	672.5	0.04068
10.0, 15.8	59	1302.1	0.07877
15.8, 25.1	51	1457.0	0.08814
25.1, 39.8	38	1503.1	0.09093
39.8, 63.1	27	1361.6	0.08237
63.1, 100	13	959.5	0.05804
100, 158	10	877.6	0.05309
158, 251	5	624.3	0.03777
251, 398	4	793.5	0.04800
Total	354

Notes. For bright FGK stars (K_p < 14), in each period range (Column 1), the table lists the number of planets in the sample n_p (Column 2), the inferred number of planets in the population N_p (Column 3), and the corresponding number of planets per star per unit interval in ln P (Column 4).


8. FREQUENCY AND RADIUS VERSUS SPECTRAL TYPE IN THE POPULATION

The original question, "what is the frequency of planets in the target population?," can now be addressed. To minimize the effect of biases in the data sample, only bright Kepler stars are considered, and of those, only ones with planet periods less than 42 days. There are sufficient data to break down the planets by radius into the terrestrial, ice giant, and gas giant groups discussed above. And the spectral types of stars are broken down into F, G, and K groups, also discussed above. For each of these nine sub-groups, the number of planets in the population can be estimated by assigning a projection factor 1/p_t to each observed planet in the sample, and summing over the projected estimates, using Equation (1). The resulting numbers of planets in the sample, n_p, and in the population, N_p, are tabulated in Table 5. The total number of stars with transits n_s in the sample and the number of stars in the population N_s are also listed. The bottom row in the table gives the sums of entries above in each case.

Table 5. Planet and Star Numbers in Sample and Population

SpTy	n_p(terr)	N_p(terr)	n_p(ice)	N_p(ice)	n_p(gas)	N_p(gas)	n_p	N_p	N_s
F	46	1017	53	1697	11	620	110	3334	11819
G	65	1317	96	3586	12	278	173	5181	14997
K	29	739	36	1179	7	138	72	2056	9080
FGK	140	3073	185	6462	30	1036	355	10571	35896

Notes. Column 1 is the spectral type of host star, Columns 2 and 3 are the number of terrestrial planets in the sample and population, Columns 4 and 5 are similar for ice giants, Columns 6 and 7 are similar for gas giants, Columns 8 and 9 are the numbers of planets in the sample and population, and Column 10 is the number of stars in the population (i.e., bright Kepler stars with periods less than 42 days). The bottom row is for the sum of all three spectral types.


As a simple check, note that the N_p entries are roughly a factor of 30 larger than the n_p entries, which is appropriate because the average transit probability is roughly $\bar{p_t} \sim 1/30$. However, the numerical value depends on the exact orbit and star size, so the factor varies from one system to another. As a direct result of this variation, it should be expected that some global quantities will be different depending on whether it is the sample or the population that is being considered. As an example, the relative number of terrestrial planets in the sample is 140/355 ≃ 0.394, whereas this ratio in the population is 3073/10571 ≃ 0.291, which is significantly smaller; the latter value will be needed in Section 11.

The same data are displayed as percentage ratios of planets to stars, e.g., N_p(terr)/N_s, etc., in Table 6. The error bars in this table are derived from the Poisson statistics of the n_p values, i.e., $(N_p/N_s)\cdot \sqrt{n_p}/n_p$. The actual errors will be larger, owing to the fluctuations expected from the projection process as applied to a small sample, as discussed above. Nevertheless, it is of interest to draw some tentative conclusions from Table 6, although these may change as more Kepler data become available.

Table 6. Planet and Star Types in Population

SpTy	$\frac{N_p({\rm terr})}{N_s}$	$\frac{N_p({\rm ice})}{N_s}$	$\frac{N_p({\rm gas})}{N_s}$	$\frac{N_p({\rm all})}{N_s}$
	(%)	(%)	(%)	(%)
F	9 ± 1	14 ± 2	5 ± 2	28 ± 3
G	9 ± 1	24 ± 2	2 ± 1	35 ± 3
K	8 ± 2	13 ± 2	2 ± 1	23 ± 3
FGK	9 ± 1	18 ± 1	3 ± 1	29 ± 2

Notes. Column 1 is the spectral type of host star, Columns 2–4 are the ratios (%) of planets (terrestrial, ice giant, and gas giant) to stars (F, G, K, and FGK) in the population, including uncertainties, and Column 5 is the ratio (%) of all planets to each star type. Data are from bright Kepler stars with periods less than 42 days.
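The entries of Table 6 follow directly from Table 5 and the Poisson prescription above. A minimal sketch, using only the Table 5 numbers (the dictionary layout and function name are illustrative choices, not part of the original analysis):

```python
import math

# (n_p, N_p) per planet class and N_s per spectral type, copied from Table 5.
TABLE5 = {
    "F": {"terr": (46, 1017), "ice": (53, 1697), "gas": (11, 620), "all": (110, 3334), "N_s": 11819},
    "G": {"terr": (65, 1317), "ice": (96, 3586), "gas": (12, 278), "all": (173, 5181), "N_s": 14997},
    "K": {"terr": (29, 739), "ice": (36, 1179), "gas": (7, 138), "all": (72, 2056), "N_s": 9080},
    "FGK": {"terr": (140, 3073), "ice": (185, 6462), "gas": (30, 1036), "all": (355, 10571), "N_s": 35896},
}


def frequency_percent(n_p, N_p, N_s):
    """Return N_p/N_s in percent and its Poisson error (N_p/N_s) * sqrt(n_p)/n_p."""
    freq = 100.0 * N_p / N_s
    err = freq / math.sqrt(n_p)
    return freq, err


for sp_type, row in TABLE5.items():
    cells = []
    for kind in ("terr", "ice", "gas", "all"):
        n_p, N_p = row[kind]
        freq, err = frequency_percent(n_p, N_p, row["N_s"])
        cells.append(f"{kind}: {freq:.1f} +/- {err:.1f}%")
    print(f"{sp_type:>3}  " + "  ".join(cells))
```

Rounded to the nearest percent, these values should match the Table 6 entries; note that the error depends only on the sample count n_p, so the sparsely populated gas-giant columns carry the largest fractional uncertainties.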


One conclusion is that the fraction of stars with terrestrial-radius planets (in short-period orbits, P < 42 days) is approximately the same for F, G, and K stars, at about 9%. On the other hand, the fraction of ice giant planets varies by nearly a factor of two, being about 14% for F and K stars, but 24% for G stars; if this trend holds for longer-period planets, it may be a clue about planetary origin and evolution. Finally, for the gas giants, the fraction of stars with these planets, again in P < 42 day orbits, is a rapidly dropping function of spectral type, going from 5% around F stars to 2% around K stars; since it is conceivable that giant planets may tend not to form around lower-mass stars, this too will be of interest to follow as more data become available.

The last column of Table 6 shows that the number of all planets (in short orbits) per star is roughly constant at about 29%, independent of spectral type, so short-period planets are a relatively common phenomenon.

9. PERIOD DISTRIBUTION MODEL

It is useful to have a parameterized model of the frequency of occurrence of planets as a function of, for example, host-star spectral type and planetary mass, radius, and period. A model can facilitate comparison with theories of the evolution of planetary systems and, as in this paper, allow the frequency of planets to be estimated beyond the current range of measurements.

For the present data set, the lack of correlation between radius and period in Figure 4 suggests that the frequency distribution in terms of radius is independent of the frequency distribution in terms of period. Also, the approximately constant value of planet frequency with respect to spectral type (Section 8) suggests a possible lack of correlation here too. Thus a model in which the distribution function is represented by a product of functions of radius and period, respectively, seems appropriate.

The essentially monotonic increase in the estimated number of planets in the population, with increasing logarithmic period, in Table 4, suggests that a power law in period could be an appropriate model. Using f(P) to denote the ratio of planets to stars, or essentially the average number of planets per star, a power law of the form

$\begin{equation} \frac{df}{d\ln P} = A P^\beta \end{equation} \tag{ 2 }$

or equivalently

$\begin{equation} \frac{df}{dP} = A P^{\beta - 1} \end{equation} \tag{ 3 }$

seems appropriate.

To fit this trial law to the data at hand, the data first need to be cast into an appropriate form, as follows. The data points to be fit are those from a discrete version of df/dln P, written here as Δf/Δln P. Using data from Table 4, the discrete number of planets in the population in the ith bin, ΔN_p(i), divided by the number of target stars, N_s, can be written as

$\begin{equation} \Delta f(i) = \frac{\Delta N_p(i)}{N_s}. \end{equation} \tag{ 4 }$

The basis number of bright target stars is N_s = 35,896, from Section 4.3. The Δln P term can be written as

$\begin{equation} \Delta \ln P(i) = \ln P_{i+1} - \ln P_i = 0.4605, \end{equation} \tag{ 5 }$

which in the present case is constant for all intervals. Thus, the data to be fitted to the model are the values of Δf/Δln P in each period bin; these values are listed in Table 4.
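The conversion from N_p to Δf/Δln P can be checked against the last column of Table 4. The sketch below assumes that the bins are exactly 0.2 dex wide, as the tabulated bin edges suggest, so that Δln P = 0.2 ln 10; the variable names are illustrative.

```python
import math

N_S = 35896                     # bright FGK target stars
DLNP = 0.2 * math.log(10.0)     # assumed bin width: 0.2 dex in period

# Column 3 of Table 4: inferred population counts N_p per period bin.
N_P = [19.8, 7.8, 94.9, 274.1, 613.3, 672.5, 1302.1,
       1457.0, 1503.1, 1361.6, 959.5, 877.6, 624.3, 793.5]
# Column 4 of Table 4: tabulated (N_p/N_s)/Delta ln P.
TABULATED = [0.00120, 0.00047, 0.00574, 0.01658, 0.03710, 0.04068, 0.07877,
             0.08814, 0.09093, 0.08237, 0.05804, 0.05309, 0.03777, 0.04800]


def frequency_per_ln_period(n_pop, n_stars=N_S, dlnp=DLNP):
    """Equation (4) divided by Equation (5): planets per star per unit ln P."""
    return n_pop / n_stars / dlnp


computed = [frequency_per_ln_period(n) for n in N_P]
for n_pop, value, table_value in zip(N_P, computed, TABULATED):
    print(f"N_p = {n_pop:7.1f}   computed = {value:.5f}   tabulated = {table_value:.5f}")
```

Under this assumption the computed values agree with the tabulated column to the precision printed in Table 4.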

The data are fitted by taking the logarithm of both sides of Equation (2) to cast the model in the form of a linear equation y = a + bx, and a weighted least-squares algorithm is used to obtain the coefficients, where the weights are obtained from the uncertainties in the number of planets per bin (n) in the sample, so $\Delta N = N \Delta n / n = N/\sqrt{n}$. The six fitted bins are from rows 4 to 9 in Table 4, i.e., those with periods greater than about 3 days, given that there is an apparent fall-off in numbers below this point (Section 5), and those with periods less than about 42 days, given that the database is not complete above this point (Section 5). The fitted parameters are

$\begin{equation} A = 10^{-1.99 \pm 0.09} \simeq 0.0103 \pm 0.0022 \end{equation} \tag{ 6 }$

and

$\begin{equation} \beta = 0.71 \pm 0.08. \end{equation} \tag{ 7 }$
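The fit can be reproduced approximately with a short weighted least-squares calculation. This is a sketch under stated assumptions rather than the original procedure: each bin is represented by the geometric mean of its edges, the fit is done in log10 units, and the Poisson fractional error 1/√n is propagated to log10 space as 1/(ln 10 · √n). Function and variable names are illustrative.

```python
import math

# Rows 4 to 9 of Table 4: bin edges (days), sample counts n_p, and df/dlnP.
EDGES = [(2.51, 3.98), (3.98, 6.31), (6.31, 10.0),
         (10.0, 15.8), (15.8, 25.1), (25.1, 39.8)]
N_SAMPLE = [29, 52, 46, 59, 51, 38]
DF_DLNP = [0.01658, 0.03710, 0.04068, 0.07877, 0.08814, 0.09093]


def fit_power_law(edges, counts, values):
    """Weighted straight-line fit of log10(df/dlnP) = log10(A) + beta * log10(P)."""
    # Assumption: each bin is placed at the geometric mean of its edges.
    x = [0.5 * (math.log10(lo) + math.log10(hi)) for lo, hi in edges]
    y = [math.log10(v) for v in values]
    # sigma(log10 y) = 1 / (ln10 * sqrt(n)), so the weight 1/sigma^2 is n * (ln10)^2.
    w = [n * math.log(10.0) ** 2 for n in counts]
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    beta = sxy / sxx
    log10_a = ym - beta * xm
    beta_err = math.sqrt(1.0 / sxx)
    log10_a_err = math.sqrt(1.0 / sw + xm ** 2 / sxx)
    chi2 = sum(wi * (yi - log10_a - beta * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return log10_a, log10_a_err, beta, beta_err, chi2


log10_a, log10_a_err, beta, beta_err, chi2 = fit_power_law(EDGES, N_SAMPLE, DF_DLNP)
print(f"log10(A) = {log10_a:.2f} +/- {log10_a_err:.2f}")
print(f"beta     = {beta:.2f} +/- {beta_err:.2f}")
print(f"chi2     = {chi2:.2f} for {len(EDGES) - 2} degrees of freedom")
```

With these assumptions the coefficients should land close to the values quoted in Equations (6) and (7); small differences in the chi-square value are to be expected if the original fit used a different bin-center convention.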

The reduced chi-square value is χ_red² = 10.75/(6 − 2) = 2.7, which suggests that the data have more uncertainty than given by Poisson statistics and/or that the model is not optimum; at this point, only more data will help resolve these issues.

The df/dln P data and model results are plotted in Figure 5, where a thick line indicates the model over the fitted period range, and extensions of the model to shorter and longer periods are shown as thinner dashed lines. Horizontal error bars indicate the widths of the individual bins, and vertical error bars indicate the Poisson uncertainties, mentioned above. It is clear that there is nominal agreement between the data and model and that the degree of complexity of the model (two parameters) as well as its functional form appear to be appropriate for the data at hand. Future data will be absolutely crucial in determining the robustness of the present model.

10. HABITABLE ZONE

There is general agreement that the HZ is defined as the planet–star distance range within which liquid water can exist on a planet's surface. The surface temperature of a planet is a function of stellar luminosity, albedo, greenhouse effect, eccentricity, obliquity, rotation rate, and geologic age. Of these, only the first parameter can be estimated for the Kepler planets. To encompass the effect of the remaining parameters, this paper adopts three ranges that have been proposed to date, summarized in the first three columns of Table 7, and all specified for the case of the Sun.

Table 7. Habitable-zone Periods

HZ Type	Characteristic	a_☉ Range	P(F) Range	P(G) Range	P(K) Range
		(AU)	(days)	(days)	(days)
Case 1	Wide	0.72–2.00	297–1377	267–1238	228–1057
Case 2	Nominal	0.80–1.80	348–1176	313–1057	267–903
Case 3	Narrow	0.95–1.67	451–1050	405–944	346–807

Notes. Columns 1 and 2 list the case number and one-word description of the three types of HZ in this paper, Column 3 gives the Sun–planet separation range for each Case, and Columns 4–6 give the corresponding period ranges for FGK stars.


Case 1, a "wide" HZ, 0.72–2.00 AU, covers a generous range of semi-major axis values, from Venus (0.72 AU) to Mars (1.52 AU) and beyond, since Venus may have had liquid water at one time, before it entered a runaway greenhouse phase, and because Mars almost certainly had liquid water at one time. With a more effective greenhouse, a planet even farther from the Sun, out to 2.0 AU, also may have had liquid water. This is the range recommended to the Kepler team in order "to be sure not to exclude planets that could conceivably be habitable" (J. Kasting 2011, private communication).

Case 2, a "nominal" HZ, 0.80–1.80 AU, is somewhat more restrictive, with an inner edge between Venus and Earth, but with the outer edge still slightly beyond Mars, reflecting less extreme assumptions than the first case. This is the range that was recommended for the TPF-C project (Levine et al. 2006) and is "a 'best bet' estimate for the HZ" (J. Kasting 2011, private communication).

Case 3, a "narrow" HZ, 0.95–1.67 AU, tightens up even more on the previous cases, reflecting a more conservative view. This is the range that will give "a lower limit on η_⊕, so that you are sure to build your TPF telescope big enough" (J. Kasting 2011, private communication).

The corresponding orbital periods are estimated as follows. For circular orbits the HZ distances a_☉(in) and a_☉(out) are scaled with stellar luminosity L as a ∼ L^0.5. For non-solar stars the luminosity is modeled to vary as L ∼ M^3.8, where M is stellar mass. From Kepler's law, P² ∼ a³/M, which, after substituting, gives

$\begin{equation} P = 365.25 \ M^{2.35}\, a_\odot ^{1.5}. \end{equation} \tag{ 8 }$

Here P is in days, M is in solar masses, and a_☉(AU) is the inner or outer edge of the HZ for the three cases listed. The median star masses in the Kepler database are 1.13 (F), 1.08 (G), and 1.01 (K). The resulting period ranges for each case and spectral type are listed in Table 7.
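Equation (8) is simple enough to evaluate directly. The sketch below tabulates the HZ period ranges from the solar-case distances and the median stellar masses quoted above; the dictionary and function names are illustrative.

```python
def hz_period_days(a_sun_au, star_mass_msun):
    """Equation (8): period (days) at the HZ edge scaled from the solar-case distance."""
    return 365.25 * star_mass_msun ** 2.35 * a_sun_au ** 1.5


# Solar-case HZ edges (AU) from Table 7 and median masses (solar units) from the text.
HZ_CASES = {
    "Case 1 (wide)": (0.72, 2.00),
    "Case 2 (nominal)": (0.80, 1.80),
    "Case 3 (narrow)": (0.95, 1.67),
}
MEDIAN_MASS = {"F": 1.13, "G": 1.08, "K": 1.01}

for case, (a_in, a_out) in HZ_CASES.items():
    for sp_type, mass in MEDIAN_MASS.items():
        p_in = hz_period_days(a_in, mass)
        p_out = hz_period_days(a_out, mass)
        print(f"{case:17s} {sp_type}: {p_in:6.0f} to {p_out:6.0f} days")
```

The exponent 2.35 follows from the scalings in the text: a ∼ L^0.5 ∼ M^1.9, so P ∼ a^1.5 M^−0.5 ∼ M^(2.85 − 0.5). The printed ranges should agree with Table 7 to within about a day.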

11. ETA-SUB-EARTH

The average number of planets per star (f₂ − f₁) in a period interval (P₁, P₂), in the power-law model of Equation (2), is obtained by integration, giving

$\begin{equation} f_2 - f_1 = \frac{A}{\beta }\big(P_2^\beta - P_1^\beta \big). \end{equation} \tag{ 9 }$

To specialize to terrestrial planets, this should be multiplied by ρ_⊕, the ratio of terrestrial planets (r = (0.5, 2.0)) to all planets, where

$\begin{equation} \rho _\oplus = \frac{N_p({\rm terr})}{N_p} \simeq 0.291 \end{equation} \tag{ 10 }$

for short-period planets around bright stars, from Section 8 and Table 5. Thus the average number of terrestrial planets per star, in the population, as a function of spectral type and HZ range, is η_⊕(SpTy, HZ), where

$\begin{equation} \eta _\oplus (\hbox{SpTy, HZ}) = \rho _\oplus \cdot (f_2 - f_1) = \rho _\oplus \cdot \frac{A}{\beta } \cdot \big(P_2^\beta - P_1^\beta \big). \end{equation} \tag{ 11 }$

The estimated values are given in Table 8, where the range is from a low of 22% to a high of 47%. To be clear, these estimates are based on projecting the total of all planets around all bright stars in the database, then simply applying the terrestrial fraction for short periods to the longer HZ periods; the individual spectral classes were not fitted, only the sum was fitted.

Table 8. Terrestrial HZ Planets in Population

HZ Type	η_⊕(F)	η_⊕(G)	η_⊕(K)
Case 1	0.47	0.44	0.39
Case 2	0.37	0.34	0.31
Case 3	0.27	0.25	0.22

Notes. Column 1 is the HZ case number as described in the text. Columns 2–4 give the expected number of terrestrial-radius planets, per star, in the HZ, for each spectral type.
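As a cross-check, Equation (11) can be evaluated with the fitted parameters and the period ranges of Table 7. This is a minimal sketch using the central values only (A = 0.0103, β = 0.71, ρ_⊕ = 0.291), with no propagation of the fit uncertainties; names are illustrative.

```python
A = 0.0103          # Equation (6), central value
BETA = 0.71         # Equation (7), central value
RHO_EARTH = 0.291   # Equation (10)

# HZ period ranges in days, copied from Table 7.
HZ_PERIODS = {
    "Case 1": {"F": (297, 1377), "G": (267, 1238), "K": (228, 1057)},
    "Case 2": {"F": (348, 1176), "G": (313, 1057), "K": (267, 903)},
    "Case 3": {"F": (451, 1050), "G": (405, 944), "K": (346, 807)},
}


def planets_per_star(p1, p2, a=A, beta=BETA):
    """Equation (9): integral of A * P**beta over ln P from p1 to p2."""
    return (a / beta) * (p2 ** beta - p1 ** beta)


def eta_earth(p1, p2, rho=RHO_EARTH):
    """Equation (11): terrestrial fraction times planets per star in the HZ."""
    return rho * planets_per_star(p1, p2)


eta_table = {
    case: {sp: eta_earth(*limits) for sp, limits in row.items()}
    for case, row in HZ_PERIODS.items()
}
for case, row in eta_table.items():
    print(case, "  ".join(f"{sp}: {value:.2f}" for sp, value in row.items()))
```

The differences between spectral types arise only through the HZ period limits, since a single power law and a single terrestrial fraction are applied to all three types.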


To obtain a single value for the number of planets per star, averaged over spectral class and HZ size, the entries in Table 8 are averaged to give

$\begin{equation} \eta _\oplus \simeq 0.34 \pm 0.14, \end{equation} \tag{ 12 }$

where the uncertainty is from the combination of the scatter in table entries (0.09) and the projection error in the model (0.11). Thus about one-third of all stars are expected to have a terrestrial-radius planet in the star's HZ.
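The arithmetic behind Equation (12) can be sketched as follows, assuming that the quoted scatter is the sample standard deviation of the nine Table 8 entries and that the two error terms are combined in quadrature; the projection error of 0.11 is taken from the text as given.

```python
import math
import statistics

# The nine entries of Table 8 (three HZ cases times three spectral types).
TABLE8 = [0.47, 0.44, 0.39, 0.37, 0.34, 0.31, 0.27, 0.25, 0.22]
PROJECTION_ERROR = 0.11  # quoted in the text, not derived here

mean_eta = statistics.mean(TABLE8)
scatter = statistics.stdev(TABLE8)  # assumed estimator: sample standard deviation
total_error = math.hypot(scatter, PROJECTION_ERROR)  # quadrature sum

print(f"mean = {mean_eta:.2f}, scatter = {scatter:.2f}, total = {total_error:.2f}")
```

Under these assumptions the mean, scatter, and combined uncertainty round to the values quoted above.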

12. DISCUSSION

The projected power law is shown in Figure 5 as the dashed line labeled "a," based on the best information currently available from Kepler. On the other hand, if the advice of B2011 is ignored, and the implied populations with periods greater than 42 days are taken seriously, then the dashed line "b" would be relevant instead. Since line "b" lies about a factor of 30 below line "a," the corresponding value of η_⊕ would drop from 34% to about 1.1%.

Recently Catanzarite & Shao (2011) estimated η_⊕ ≃ (1–3)%, using the same B2011 database, but making the fundamentally different assumption that periods greater than 42 days are as valid as shorter ones. Also, they did not compensate for the bias against small planets around faint stars. Their assumptions stand in marked contrast to those in the present paper. Their assumptions also disagree with the statements in B2011 that the data are not complete beyond 42 days, since those longer periods were looked for in the range beyond the first 136 days in an ad hoc, i.e., not a systematic and complete, fashion. The large difference, (1–3)% versus (34 ± 14)%, illustrates why it would be valuable if the Kepler mission could be extended in time, so as to be able to make measurements in the HZ range of periods, bypassing the current need to extrapolate to these periods.

Another point of comparison might be to ask for the value of df/dln P for terrestrial planets in the HZ in the solar system. Taking P₁ = 224 days for Venus, and P₂ = 686 days for Mars, and assuming that the encompassed three planets effectively have two planets between these limits, and assuming that every star has such a planet system, a frequency value of

$\begin{equation} \eta _\oplus ({\rm SS}) \simeq (df/d\ln P)_{{\rm SS}} \approx (3-1)/\ln (686/224) \simeq 1.8 \end{equation} \tag{ 13 }$

is found. Thus the Kepler value, which is about five times smaller, is not too surprising, especially considering that, for short-period planets, there is only about one planet for every three or four stars (i.e., 1/0.29 ≃ 3.4) in the population (Section 8). This comparison also suggests that the projected line "a" in Figure 5 is consistent with a density of planets per individual star that does not exceed dynamical limits, given that the inner solar system is believed to be dynamically stable.
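The solar system comparison in Equation (13) amounts to the following arithmetic, shown here as a brief sketch with the Kepler-based value taken from Equation (12); variable names are illustrative.

```python
import math

P_VENUS = 224.0   # days
P_MARS = 686.0    # days
ETA_KEPLER = 0.34 # Equation (12), central value

# Three planets spanning the interval are counted as two planets within it.
eta_solar_system = (3 - 1) / math.log(P_MARS / P_VENUS)
ratio = eta_solar_system / ETA_KEPLER

print(f"eta(SS) = {eta_solar_system:.2f}, ratio to Kepler value = {ratio:.1f}")
```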

13. CONCLUSIONS

In the current Kepler database (B2011), transits with periods less than 42 days for bright, "Sun-like" FGK target stars are analyzed in order to estimate the frequency of terrestrial, HZ planets in the target population, giving η_⊕ ≃ (34 ± 14)%. The quoted uncertainty is the formal error in projecting the numbers of short-period planets. The true uncertainty will remain unknown until Kepler observations of orbital periods in the 1000 day range become available.

I thank the Kepler team for providing such abundant and precise data and for helpful comments on this paper. I thank the staff at the Computation Facility of the Harvard-Smithsonian Center for Astrophysics. Finally, I thank the referees, Jim Kasting and anonymous, who made especially useful comments, and who therefore had a key influence on the final version of this paper. Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.
