Abstract

The limiting distribution of the size of a binary interval tree is investigated. Our approach is based on the contraction method and differs substantially from the one used for one-sided binary interval trees. First, we build a distributional recursive equation for the size. Then, we derive the expectation, the variance, and some higher-order moments. Finally, we show that the size, suitably standardized, converges to a standard normal random variable in the Zolotarev metric.

1. Introduction

Random trees arise naturally in combinatorics and also in the analysis of algorithms in computer science. There are many kinds of random trees with different structures, such as recursive trees, search trees, binary trees, and interval trees. The asymptotic behavior of random variables defined on random trees has attracted increasing attention and has become an active research area. Drmota [1] introduced several labelled and unlabelled random trees in his book. Devroye and Janson [2] studied the protected nodes in several random trees. Feng and Hu [3] studied the phase changes of scale-free trees. The limiting laws for the height, size, and subtrees of binary search trees were also considered (see [4–6]). Other researchers investigated the Zagreb index and the nodes of random recursive trees (see [7–9]).

The binary interval tree is a random structure that underlies the process of random division of a line interval and parking problems. It has recently become a popular subject. Sibuya and Itoh [10] showed that the number of internal and external nodes at different levels of a binary interval tree is asymptotically normal, but the asymptotic normality of the size of the tree cannot be deduced directly from this result. Prodinger [11] looked into various parameters of the incomplete trie, a one-sided version of a random tree with a digital flavor. Fill et al. [12] followed with a study of the nonexistence of a limit distribution for the height of the incomplete trie. Itoh and Mahmoud [13] considered five incomplete one-sided variants of binary interval trees and proved that their sizes are all asymptotically normal. Janson [14] obtained the same result for a larger class of one-sided interval trees by renewal theory, and one kind of fragmentation tree was discussed by Janson and Neininger [15]. Javanian et al. [16] investigated the paths in m-ary interval trees. Su et al. [17] studied complete binary interval trees and obtained a law of large numbers. In addition, Pan et al. [18] considered a construction algorithm for binary interval trees.

The binary interval tree is a tree associated with repeated divisions of a line interval of a given length. The process of division is as follows. If the length is less than 1, no division takes place and the associated interval tree consists of a single terminal node. If the length is at least 1, we begin with the whole interval and divide it into two subintervals by choosing a point uniformly distributed over it. This yields a left and a right subinterval. Each of the two subintervals is further divided at a point chosen uniformly over it, producing two smaller subintervals as before, unless its length is less than 1, in which case it is not divided. The process is repeated until the length of every interval (or subinterval) is less than 1.
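The division process described above can be simulated directly. The following is a minimal sketch in Python, not the authors' code; the function name `interval_tree_size` and the argument names `x` (the length of the initial interval) and `rng` are illustrative assumptions. It counts one node per interval and divides only intervals of length at least 1.

```python
import random


def interval_tree_size(x, rng):
    """Return the number of nodes of one random binary interval tree.

    x   -- length of the initial interval (assumed name)
    rng -- a random.Random instance used for the uniform cut points
    """
    size = 0
    stack = [x]  # lengths of the intervals that still have to be examined
    while stack:
        length = stack.pop()
        size += 1  # every interval, divided or not, is one node of the tree
        if length >= 1:  # intervals shorter than 1 are terminal nodes
            cut = rng.uniform(0.0, length)  # uniform division point
            stack.append(cut)               # left subinterval
            stack.append(length - cut)      # right subinterval
    return size


rng = random.Random(2024)
sizes = [interval_tree_size(5.0, rng) for _ in range(5)]
print(sizes)
```

Since every divided interval has exactly two children, the simulated size is always an odd number, and it equals 1 whenever the initial length is less than 1.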

We take , for instance. Figures 1(a) and 1(b) show how the above random division process of interval generates a binary interval tree.

If additional conditions are imposed and the intervals satisfying them are not allowed to be divided (see [13, 14]), we obtain various incomplete interval trees. In particular, if only one subinterval of every interval is divided, the resulting tree is the so-called one-sided interval tree (see [13]).

An interval tree clearly reflects many properties of random division, and so it raises many questions of probabilistic interest. For example, the height of the interval tree is the greatest level among all subintervals produced by the divisions, and the total number of nodes of an interval tree is the total number of intervals produced by the random division process. The size of a binary interval tree is its total number of nodes. Our aim is to investigate this random variable.

In this paper, we investigate the central limit theorem for the size of binary interval trees. Because the moment generating function of the size is difficult to calculate, the method we use is completely different from that used for one-sided interval trees. In Section 2, we build a distributional recursive equation for the size and give its expectation, its variance, and some higher-order moments. In Section 3, via the contraction method, the law of the standardized size is shown to approach the unique solution of a fixed-point distributional equation in the Zolotarev metric space. Finally, we demonstrate that the size, with suitable standardization, converges to a normal limiting random variable as the length of the interval tends to infinity.

2. The Moments of the Size

Compared with one-sided interval trees, binary interval trees have much more complex properties. In particular, it is difficult to obtain the moment generating function of the size, so the method used for one-sided interval trees (see [13]) is no longer applicable. Instead, we build a distributional recursive equation for the size, from which the expectation and the variance can be calculated. Furthermore, we determine the order of the fourth central moment of the size as the length of the interval goes to infinity.

From the definition of the binary interval tree, it is easy to see that and , for . For our purpose to investigate the case of , let denote the point chosen uniformly from interval ; hence, . For any fixed real number , if , we denote to be the size of the left subtree associated with the interval . Correspondingly, denotes the size of the right subtree associated with the interval . According to the rule of division, we can see that and are mutually independent. Thus, we have This formula implies that if is given, has the same distribution as . Obviously, we can rewrite the above formula as Define It is easy to see that

From the distributional recursive equation (2) and the above boundary conditions, Su et al. [17] calculated the expectation and the variance , for any .

Lemma 1. Let be the size of a binary interval tree. Then

Lemma 2. Let be the size of a binary interval tree. Then
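As an illustration of how a first-moment formula follows from the distributional recursion, the following worked computation may be helpful. It is a sketch in assumed notation, not a reproduction of the displays of Lemma 1: $S_x$ denotes the size for an interval of length $x$, $U$ is the first division point (uniform over $(0,x)$), $S'$ is an independent copy of $S$, and $m(x) = \mathbb{E} S_x$. The second line is obtained by taking expectations, and the third by multiplying by $x$ and differentiating.

```latex
% Assumed notation: S_x = size, U ~ Uniform(0, x), S' an independent copy, m(x) = E[S_x].
\begin{align*}
S_x &\overset{d}{=} 1 + S_U + S'_{x-U} \quad (x \ge 1), \qquad S_x = 1 \quad (0 \le x < 1),\\
m(x) &= 1 + \frac{2}{x}\int_0^x m(u)\,du, \qquad x \ge 1,\\
x\,m'(x) &= m(x) + 1 \quad\Longrightarrow\quad m(x) = Cx - 1, \qquad x > 1,\\
m(1) &= 1 + 2\int_0^1 1\,du = 3 \quad\Longrightarrow\quad C = 4, \qquad m(x) = 4x - 1 \quad (x \ge 1).
\end{align*}
```

Under these assumptions the mean size grows linearly in the length of the interval, with a jump from 1 to 3 at length 1, where the first division occurs.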

To prove that the asymptotic distribution of the size is normal, we also need the order of its fourth central moment as the length of the interval tends to infinity. The following proposition gives the fourth central moment of the size.

Proposition 3. Let be the size of a binary interval tree. Then

Proof. See the appendix.
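The moments discussed in this section can be checked numerically. The following Python sketch is an illustration under stated assumptions, not part of the proof: the names `interval_tree_size` and `sample_moments` are hypothetical, and the reference value $4x - 1$ for the mean is what a direct calculation from the recursion gives for length $x \ge 1$; it is used here only as a sanity check. The script prints the sample mean, the sample variance divided by $x$, and the sample fourth central moment divided by $x^2$.

```python
import random
import statistics


def interval_tree_size(x, rng):
    """Number of nodes of one random binary interval tree of initial length x."""
    size = 0
    stack = [x]
    while stack:
        length = stack.pop()
        size += 1
        if length >= 1:  # only intervals of length at least 1 are divided
            cut = rng.uniform(0.0, length)
            stack.append(cut)
            stack.append(length - cut)
    return size


def sample_moments(x, n, seed=0):
    """Return (mean, variance, fourth central moment) of n simulated sizes."""
    rng = random.Random(seed)
    sizes = [interval_tree_size(x, rng) for _ in range(n)]
    mean = statistics.fmean(sizes)
    centered = [s - mean for s in sizes]
    var = sum(c ** 2 for c in centered) / n
    fourth = sum(c ** 4 for c in centered) / n
    return mean, var, fourth


for x in (10.0, 20.0, 40.0):
    mean, var, fourth = sample_moments(x, n=4000)
    # Columns: x, sample mean, 4x - 1, variance / x, fourth central moment / x^2
    print(x, round(mean, 2), 4 * x - 1, round(var / x, 3), round(fourth / x ** 2, 3))
```

If the variance grows linearly and the fourth central moment grows like the square of the length, the last two columns should remain roughly stable as $x$ increases.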

3. The CLT for the Size

In this section, we prove the asymptotic normality of the size as the length of the interval tends to infinity. The main tool is the contraction method, which requires suitable probability metrics, in particular the Zolotarev metrics (see [19]).

First we introduce the Zolotarev metrics. Denote the distribution of the random variable by . Let be the set of the distributions of all real random variables, and define

It can be verified that random variable with satisfies the following formula. For any , and more generally, we have the following lemma.

Lemma 4. If and are standard normal random variables, is uniformly distributed over interval , and are mutually independent and then one has

Proof. In fact, for any , we have Therefore, But, we can find that, in the set , there is only one distribution, the standard normal , satisfying (10).
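The conditioning argument behind a fixed-point property of this kind can be written out explicitly. The computation below is a sketch in assumed notation, since the symbols are introduced here only for illustration: $N$ and $N'$ are standard normal, $U$ is uniform over $(0,1)$, the three are mutually independent, and $X = \sqrt{U}\,N + \sqrt{1-U}\,N'$ is assumed to be the combination in question. Conditioning on $U$ and using the characteristic function of the normal law gives the standard normal characteristic function for $X$.

```latex
% Assumed notation: N, N' standard normal, U ~ Uniform(0,1), mutually independent,
% X = \sqrt{U} N + \sqrt{1-U} N'.
\begin{align*}
\mathbb{E}\bigl[e^{itX} \mid U\bigr]
  &= \mathbb{E}\bigl[e^{it\sqrt{U}\,N} \mid U\bigr]\,
     \mathbb{E}\bigl[e^{it\sqrt{1-U}\,N'} \mid U\bigr]
   = e^{-Ut^{2}/2}\, e^{-(1-U)t^{2}/2} = e^{-t^{2}/2},\\
\mathbb{E}\bigl[e^{itX}\bigr]
  &= \mathbb{E}\Bigl[\mathbb{E}\bigl[e^{itX} \mid U\bigr]\Bigr] = e^{-t^{2}/2},
  \qquad t \in \mathbb{R}.
\end{align*}
```

The conditional variances $U$ and $1-U$ always add up to 1, which is why the mixture over $U$ does not change the law.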

Suppose that is a nonnegative integer. Denote by the set of all real functions that are times continuously differentiable, defined on the real line. Let where is a fixed real number. Let and and then is the Zolotarev metric of order on the set . According to the properties of the Zolotarev metric, we know Therefore, we can choose as the metric we need on the subset (see [20, 21]); that is, , . This is due to the fact that, for any and , we have , but if , then .

The metric has the following properties (see [20]): (1) for any constant , (2) if random variables and are mutually independent, then (3) for random variables and ,

Now, we begin to prove the main result in this paper.

Theorem 5. Let be the size of a binary interval tree. Then, as ,

Proof. Denote Then from Lemmas 1 and 2, we know that So, we have for and for .
According to the corresponding inequality in [21], for any , where is the gamma function. Assume that the distribution of random variable is . It follows from Proposition 3 that Therefore, there exists a constant such that Denote where is the standard normal distribution and is a standard normal random variable; then we can see that Now, we only need to prove that ; then the theorem follows.
Suppose that ; by (A.1) and (21), we have where is the first point chosen from interval and is an independent copy of .
If we denote , then and we can rewrite the above formula as According to the definition of and , it can be seen that for and . If we define , then we can also see that . Furthermore, for some positive constant by conditioning on and using a calculation similar to that in the appendix. Hence, for some positive constant .
As we had pointed out before, the standard normal distribution is the only distribution satisfying (10) in the set . From (25), (14), and Lemma 4, for , we have Given , let be small enough such that . For any fixed , when is sufficiently large, then Thus, where is the constant as before and is sufficiently large. It implies that when is sufficiently large. Therefore, From this equation and the arbitrariness of , we can conclude and immediately. By (18), the theorem holds.
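The statement of the theorem can be illustrated by simulation. The following Python sketch is only a numerical illustration under stated assumptions: the function names are hypothetical, and the simulated sizes are standardized with their own sample mean and sample standard deviation rather than with the centering and scaling constants used in the proof. It compares the empirical distribution function of the standardized sizes with the standard normal distribution function at a few points.

```python
import random
import statistics


def interval_tree_size(x, rng):
    """Number of nodes of one random binary interval tree of initial length x."""
    size = 0
    stack = [x]
    while stack:
        length = stack.pop()
        size += 1
        if length >= 1:  # only intervals of length at least 1 are divided
            cut = rng.uniform(0.0, length)
            stack.append(cut)
            stack.append(length - cut)
    return size


def standardized_sample(x, n, seed=0):
    """Simulate n sizes and standardize them by sample mean and sample deviation."""
    rng = random.Random(seed)
    sizes = [interval_tree_size(x, rng) for _ in range(n)]
    mean = statistics.fmean(sizes)
    sd = statistics.pstdev(sizes)
    return [(s - mean) / sd for s in sizes]


z = standardized_sample(50.0, 3000)
normal = statistics.NormalDist()  # standard normal, mean 0 and deviation 1
for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    empirical = sum(v <= t for v in z) / len(z)
    # Columns: t, empirical distribution function, normal distribution function
    print(t, round(empirical, 3), round(normal.cdf(t), 3))
```

Because the size takes only odd integer values, the empirical distribution function is a step function, so agreement with the normal curve is only approximate for moderate lengths.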

Appendix

Proof of Proposition 3

From the process of generating the binary interval trees, it is obvious that, for given , , where is the first point chosen from interval . For , if we denote then we have We need to calculate first before we get . For , we have In view of the independence between and and that holds for any , we have It is easy to see that and when , for the part , we have Therefore, That is, Via differentiation with respect to , we get the differential equation: The solution to this differential equation is where is a constant real number.

Similarly, for , when , we have Because is independent of , and holds for any , we get In particular, for the part , we have When , for the part , we have where is a constant.

When , for the part , we have where is the same as that in (A.11).

When , for the part , we have Noting that and (6), we can see that where Therefore, That is, Via differentiation with respect to , we get the differential equation: The solution to this differential equation is where is a constant and the constants are real numbers as defined before. From this equation, Proposition 3 follows.

Informed consent was obtained from all individual participants included in the study.

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

The authors are most grateful to the referee and the editor for their very thorough reading of the paper and valuable suggestions, which greatly improved the original results and presentation of this paper. Jie Liu’s work was supported by the National Natural Science Foundation of China (nos. 11101394, 71471168, and 71520107002), China Postdoctoral Science Foundation Funded Project (nos. 201104312 and 20100480688), Fund for the Doctoral Program of Higher Education Foundation (no. 20113402120005), and the Fundamental Research Funds for the Central Universities of China (no. WK2040160008). Yang Yang’s work was supported by the National Natural Science Foundation of China (no. 71471090), the Humanities and Social Sciences Foundation of the Ministry of Education of China (no. 14YJCZH182), China Postdoctoral Science Foundation (nos. 2014T70449 and 2012M520964), Natural Science Foundation of Jiangsu Province of China (no. BK20131339), the Major Research Plan of Natural Science Foundation of the Jiangsu Higher Education Institutions of China (no. 15KJA110001), Qing Lan Project, PAPD, Program of Excellent Science and Technology Innovation Team of the Jiangsu Higher Education Institutions of China, Project of Construction for Superior Subjects of Statistics of Jiangsu Higher Education Institutions, and Project of the Key Lab of Financial Engineering of Jiangsu Province.