BMMI-tree: A Peer-to-Peer m-ary tree using 1-m node splitting for an efficient multidimensional complex query search

https://doi.org/10.1016/j.jpdc.2018.09.018

Highlights

  • BMMI-tree extends the m-ary tree overlay framework BATON* with support for MI techniques.

  • All data indexing operations of the application domain are performed at O(log_m N) cost, with m > 2.

  • Two approaches, quadratic split and m-combination split, are proposed for 1-m node splitting.

  • Only a single index is constructed for an object with multiple dimensions, as opposed to BATON*.

  • Detailed analysis of range query search with varying range query window sizes.

Abstract

Peer-to-Peer (P2P) applications such as content distribution and sharing (files, audio, video) and multiuser communication (games, desktop sharing, e-learning) have emerged as a new paradigm over the last decade. However, scalability requirements remain a major concern, and hence the distribution and effective search of multidimensional data have become major challenges for P2P computing. Most existing P2P overlays either do not support Multidimensional Indexing (MI), are inefficient for complex query search, or are limited to binary trees with search complexity O(log_2 N). In contrast, traditional MI based on m-ary trees supports complex query search bounded by log_m N using a higher fanout, m > 2. Based on these observations, in this paper we propose the BMMI-tree (Balanced Multiway Multidimensional Indexing-tree), which uses an m-ary P2P tree overlay network and also supports MI tree indexing methods such as the R-tree or SS-tree. The paper also analyzes complex query search algorithms that run in O(log_m N) steps, with experimental results. In addition, constructing the P2P tree network requires splitting an existing node and its data objects into m new child nodes (during node join) and vice versa (during node leave). To the best of our knowledge, none of the existing node splitting algorithms for multiway multidimensional trees offers 1-m node splitting. Hence, we also propose two approaches to split an MI tree node into m nodes (m-ary split) that can be used effectively to create a dynamic tree overlay. Lastly, we show how the BMMI-tree can be applied to service provisioning in cloud computing in a decentralized and distributed manner.

Introduction

Peer-to-Peer (P2P) systems have become popular for the massively distributed applications that are increasingly prominent today. High scalability, dynamicity, self-organization, fault tolerance, symmetric communication (each peer/node is identical and independent), multi-purpose resource sharing (each peer acts as a server, client and router) and data locality make P2P systems essential for numerous applications [1], [3], [23], [28], [47]. P2P has since been extended to applications such as intelligent resource discovery [31], streaming media and telephony [29], mobile P2P networks [32], multiplayer interactive gaming [13], video content management [24] and service sharing in cloud computing [26], [49]. Researchers have recently focused on P2P overlay networks (logical networks built on top of an existing network, generally at the application layer) that support Multidimensional Indexing (MI) in order to handle multidimensional queries and applications efficiently and effectively [19], [47].

P2P overlay networks are generally classified as unstructured or structured [23]. Nodes in unstructured overlays have no prior knowledge of the topology and are therefore loosely coupled with each other [6], [7], [14], [21]. However, they suffer from poor routing efficiency because they rely on flooding, random walks or data replication. Structured overlay networks were therefore developed, in which nodes are tightly coupled in a predefined network topology. Structured overlays typically build on techniques such as Distributed Hash Tables (DHTs), skip graphs or hierarchical tree structures. In addition, the working of a P2P overlay system is generally divided into two parts: the system domain, which organizes the peers, establishes links between them and maintains the network topology; and the application domain, which maps and partitions the data space onto the identifier space using global space partitioning strategies, for efficient data insertion, deletion and query execution [8].

The system domain of structured P2P overlays based on DHTs [9], [11], [27], [30], [34], [48] handles exact-match queries efficiently. However, it may not be well suited for complex queries such as range, similarity or kNN queries, because hashing does not preserve the ordering of keys and therefore destroys data locality [16]. Structured P2P overlays extending skip graphs [12], [33], [45] have search complexities bounded by O(log_2 N), which can be improved by increasing the logarithmic base with a multiway structure. To overcome these limitations, structured P2P overlay networks based on hierarchical tree structures have been proposed, such as BATON [16], P-tree [5], NR-tree [22], DP-tree [20], DHR-tree [43], BATON* [15], VBI-tree [17], Distributed Quadtree [39], CAN-QTree [50], EZSearch [40], m-LIGHT [38], SDI-tree [46], P2PM-tree [41], AVL MI-tree [18] and HD-tree [8]. However, our literature survey shows that current tree indexing approaches in P2P overlay networks suffer from several issues, discussed in the next subsection, that lead to inefficient searching in tree overlays.
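To make the locality argument concrete, the toy sketch below (an illustration only, not part of BMMI-tree) places the integer keys of one range query into 16 hypothetical buckets in two ways: by hashing, as a DHT would (SHA-1 is an assumed choice of hash function), and by order-preserving range partitioning. The bucket count, key space and function names are assumptions.

```python
import hashlib

NUM_BUCKETS = 16   # hypothetical number of peers / identifier buckets
KEY_SPACE = 100    # keys are assumed to be integers in [0, KEY_SPACE)

def hashed_bucket(key: int) -> int:
    """DHT-style placement: SHA-1 of the key, reduced modulo the bucket count."""
    digest = hashlib.sha1(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def range_bucket(key: int) -> int:
    """Order-preserving placement: the key space is cut into contiguous intervals."""
    return key * NUM_BUCKETS // KEY_SPACE

query = range(10, 21)  # a range query over keys 10..20 inclusive
print(sorted({hashed_bucket(k) for k in query}))  # buckets scattered by the hash
print(sorted({range_bucket(k) for k in query}))   # [1, 2, 3]: one contiguous run
```

Under hashing, the range query must contact whichever buckets the hash happens to produce; under range partitioning, it only visits one contiguous run of buckets, which is the ordering property tree overlays preserve.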

Distributing and indexing multidimensional objects in a highly scalable, decentralized and dynamic P2P network is a major challenge in P2P computing. As a result, P2P overlay networks face the following problems in processing the complex queries of real-world applications:

  • The application domain handles only one-dimensional objects, similar to a B-tree [5], [16], [20], or uses replicas of one-dimensional overlays whose search performance depends on the frequency of attributes in the queries [15]; hence, efficient MI support is needed to handle multidimensional P2P applications.

  • Where both the system and application domains support MI, search complexity remains bounded only by O(log_2 N) using binary trees [17], [18], [39], [40], [46], or by O(N^(1/2)) [50].

  • The search cost, in terms of the number of routing hops or contacted nodes, increases with dimensionality [22], [41].

  • The tree topology defined in system domain may not be height balanced [38], [50].

  • In addition, when a new node joins a tree-structured network, either (i) it is assigned data objects directly from its parent/sibling node [15], [16], which may lead to imbalanced load distribution, or (ii) two new child nodes are created along with the new node, and they receive data objects obtained by splitting the data of an existing node (1–2 node split) [17], [46]. However, a 1–2 split cannot be used to split data among m new nodes in m-ary trees.

From these observations, we believe there is a need for a P2P multiway (fanout m > 2) tree overlay network whose insertion, deletion and search operations are bounded by log_m N. In addition, such an overlay can use traditional MI methods based on the space containment relationship, such as the R-tree [10] and SS-tree [44], to support complex query processing that is independent of the frequency of the attributes or dimensions in the search queries.
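As a rough illustration of why the logarithm base matters, the sketch below computes the height of the shortest complete tree with fanout m that holds N nodes; a root-to-leaf route crosses at most that many levels. The function name is hypothetical, and real hop counts in a tree overlay also depend on sideways routing links, so the numbers are indicative only.

```python
def levels_needed(n_nodes: int, fanout: int) -> int:
    """Height of the shortest complete tree with the given fanout holding n_nodes nodes."""
    if fanout < 2 or n_nodes < 1:
        raise ValueError("need fanout >= 2 and at least one node")
    height, capacity, level_size = 0, 1, 1  # the root alone: height 0, capacity 1
    while capacity < n_nodes:
        height += 1
        level_size *= fanout   # nodes on the new level
        capacity += level_size
    return height

for m in (2, 4, 8, 16):
    print(f"m={m:2d}: {levels_needed(10_000, m)} levels")
```

For N = 10,000 this gives 13 levels for m = 2, 7 for m = 4, 5 for m = 8 and 4 for m = 16, i.e. roughly log_2 N versus log_m N.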

Another challenge arises when a new node joins the network: the node being replaced and its data objects must be split to create m new nodes. To the best of our knowledge, none of the existing node splitting algorithms for MI trees offers 1-m node splitting (m-ary split).
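The paper's quadratic split and m-combination split are not reproduced in this section. The sketch below is only a hypothetical illustration of what a 1-m split can look like: it takes Guttman's quadratic-split seed criterion (the pair of entries whose combined MBR wastes the most area), assumes a greedy extension to m seeds, and then assigns every remaining entry to the group whose MBR grows least. The name `m_ary_split` is illustrative, and unlike Guttman's PickNext step, entries are simply assigned in input order.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)

def area(b: Box) -> float:
    result = 1.0
    for lo, hi in zip(b[0], b[1]):
        result *= hi - lo
    return result

def union(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box enclosing both a and b."""
    return tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1]))

def waste(a: Box, b: Box) -> float:
    """Guttman's quadratic-split criterion: dead space if a and b share one MBR."""
    return area(union(a, b)) - area(a) - area(b)

def m_ary_split(entries: List[Box], m: int) -> List[List[Box]]:
    """Hypothetical 1-m split: choose m seeds, then grow m groups around them."""
    if m < 2 or len(entries) < m:
        raise ValueError("need m >= 2 and at least m entries")
    # Seeds 1 and 2: the pair that would waste the most space together.
    first, second = max(combinations(range(len(entries)), 2),
                        key=lambda p: waste(entries[p[0]], entries[p[1]]))
    seeds = [first, second]
    # Seeds 3..m (assumed extension): the entry whose smallest waste
    # against any already chosen seed is largest.
    while len(seeds) < m:
        candidates = [k for k in range(len(entries)) if k not in seeds]
        seeds.append(max(candidates,
                         key=lambda k: min(waste(entries[k], entries[s]) for s in seeds)))
    groups = [[entries[s]] for s in seeds]
    mbrs = [entries[s] for s in seeds]
    # Assign each remaining entry to the group whose MBR needs the least
    # enlargement, breaking ties by the smaller current MBR area.
    for k, box in enumerate(entries):
        if k in seeds:
            continue
        best = min(range(m), key=lambda g: (area(union(mbrs[g], box)) - area(mbrs[g]),
                                            area(mbrs[g])))
        groups[best].append(box)
        mbrs[best] = union(mbrs[best], box)
    return groups
```

For example, with four well-separated pairs of unit boxes and m = 4, each pair ends up in its own group, so every one of the m new nodes receives a spatially coherent share of the data.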

In this paper, we propose the BMMI-tree (Balanced Multiway Multidimensional Indexing-tree) with two different approaches for m-ary split. The BMMI-tree improves both the application and the system domains of existing approaches, with the following characteristics:

  • BMMI-tree extends the m-ary tree overlay framework BATON* [15] with support for MI techniques based on space containment relationships, such as R-tree [10] and SS-tree [44]. This strengthens the application domain: unlike in BATON*, MI is independent of the frequency of the attributes (dimensions) in the search queries.

  • All data indexing operations of the application domain (including complex query search) can be performed in O(log_m N) routing hops, where the fanout m > 2. Thus, the storage and search operations of the application domain are independent of the dimensionality of the multidimensional objects.

  • The BMMI-tree overlay network stores routing information in routing nodes and data indexing information in data nodes. Separating the two roles lets each type of node serve its own purpose, which enhances the overall performance of the P2P tree network.

  • We also propose two approaches, quadratic split and m-combination split, for 1-m node splitting (m-ary node splitting) in the system domain. They differ in how the m initial seeds are selected. Both maintain the height-balancing property of the tree topology, such that each internal node has exactly m child nodes.

  • Because BMMI-tree supports a higher fanout, its fault tolerance is considerably better than that of binary tree overlays.

  • Lastly, a detailed analysis of range query search is carried out for different dimensionalities, with user churn, and with range query window sizes varying from 10% to 100%. The experimental results show that the BMMI-tree outperforms existing binary trees for multidimensional objects.

The BMMI-tree is useful for sharing and efficiently searching multimedia content through social networking, where very large amounts of high-dimensional data must be searched over a distributed network. The BMMI-tree can also be applied in cloud computing for efficient, decentralized service provisioning: a required service can be discovered in O(log_m N) messages, where N is the number of services with multiple attributes in the cloud computing system. Table 1 summarizes the advantages of the BMMI-tree overlay over BATON*.

The rest of the paper is organized as follows. Section 2 discusses related work and its limitations. Section 3 proposes the overlay network structure of the BMMI-tree. Section 4 introduces the system domain of the BMMI-tree, with the modified node join algorithm and the new 1-m node splitting approaches. Section 5 proposes and analyzes the application domain, with data indexing and search operations bounded by O(log_m N) cost. Section 6 discusses the tuning of parameters that affect the network topology. Section 7 presents the experimental results, and Section 8 concludes with future research directions.


Related work

Research on P2P hierarchical tree structures explores various traditional binary and m-ary trees [5], [8], [15], [16], [17], [18], [20], [22], [38], [39], [40], [41], [43], [46], [50]. Most P2P tree networks use a binary tree structure. Distributed Quadtree [39] uses an MX-CIF quadtree to partition the space so that every spatial object is marked with several control points. A spatial object or a spatial query can be indexed or searched by peers (in the manner of Chord) using these

The BMMI-tree overlay network

The structure of the BMMI-tree changes the information stored in the tree nodes compared with BATON*. In BATON*, every peer (node) is responsible both for maintaining routing information and for storing data in linear order. In the BMMI-tree, these two functions are instead divided between two types of nodes: data nodes and routing nodes. The rationale for separating the functions into two different nodes is to give support to the

The system domain of BMMI-tree

The system domain of the BMMI-tree connects the routing nodes of the overlay network structure using various links, as shown in Fig. 1. Each routing node maintains: (1) a link to its parent node, (2) links to its m child nodes, (3) links to its adjacent nodes, (4) links to selected neighbor nodes in the sideways routing tables (left routing table and right routing table), (5) an Upside table (UT) and (6) a Minimum Bounding Rectangle (MBR) [36]. The sideways routing tables store the links of the selected
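One way to picture the routing-node state listed above is a plain record with one field per item (1) to (6). In the Python sketch below, the field types, the `number` position field, the contents of the UT and the rule that a node's MBR encloses its children's MBRs are all assumptions made for illustration, not the paper's definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from functools import reduce
from typing import Any, List, Optional, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)

def union(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box enclosing both a and b."""
    return tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1]))

@dataclass(eq=False)  # nodes compare by identity; parent/child links are cyclic
class RoutingNode:
    level: int
    number: int  # position of the node within its level (assumed numbering)
    parent: Optional[RoutingNode] = None                                  # (1)
    children: List[RoutingNode] = field(default_factory=list)             # (2) at most m
    left_adjacent: Optional[RoutingNode] = None                           # (3)
    right_adjacent: Optional[RoutingNode] = None
    left_routing_table: List[Optional[RoutingNode]] = field(default_factory=list)   # (4)
    right_routing_table: List[Optional[RoutingNode]] = field(default_factory=list)
    upside_table: List[Any] = field(default_factory=list)                 # (5) UT, format assumed
    mbr: Optional[Box] = None                                             # (6)

    def recompute_mbr(self) -> Optional[Box]:
        """Assumed rule: a node's MBR encloses the MBRs of all of its children."""
        boxes = [child.mbr for child in self.children if child.mbr is not None]
        if boxes:
            self.mbr = reduce(union, boxes)
        return self.mbr

root = RoutingNode(level=0, number=1)
root.children = [
    RoutingNode(level=1, number=1, parent=root, mbr=((0, 0), (1, 1))),
    RoutingNode(level=1, number=2, parent=root, mbr=((2, 3), (4, 5))),
]
print(root.recompute_mbr())  # ((0, 0), (4, 5))
```

Keeping the MBR on the routing node is what lets a search decide, from routing state alone, whether a subtree can contain matching data objects.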

The application domain of BMMI-tree

The application domain of the BMMI-tree follows the centralized MI scheme based on the R-tree; however, it can be built with any other MI method based on the space containment relationship. Fig. 2 shows an example of the BMMI-tree representation for the R-tree [10] and the SS-tree [44]. The indexes of the data objects are stored in the data nodes, and the MI uses the routing nodes to find the appropriate data node for inserting or searching data objects. In centralized MI methods, the point or range
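The sketch below illustrates, under simplifying assumptions, the containment-based pruning that a centralized R-tree range search relies on: a subtree is skipped whenever its MBR does not intersect the query window, and data-node entries are checked individually. It runs in a single process; in the overlay, each step into a child would be a routing hop between peers, and the BMMI-tree's own search algorithm (Section 5) may differ. `Node` and `range_query` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)

def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap in every dimension."""
    return all(alo <= bhi and blo <= ahi
               for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1]))

@dataclass
class Node:
    mbr: Box
    children: List["Node"] = field(default_factory=list)          # routing links
    entries: List[Tuple[str, Box]] = field(default_factory=list)  # data-node contents

def range_query(node: Node, window: Box, hits: Optional[List[str]] = None) -> List[str]:
    """Collect ids of objects whose boxes intersect the window, pruning by MBR."""
    if hits is None:
        hits = []
    if not intersects(node.mbr, window):
        return hits  # nothing under this node can match: skip the whole subtree
    for obj_id, box in node.entries:
        if intersects(box, window):
            hits.append(obj_id)
    for child in node.children:
        range_query(child, window, hits)
    return hits

leaf1 = Node(((0, 0), (5, 5)), entries=[("a", ((1, 1), (1, 1))), ("b", ((4, 4), (4, 4)))])
leaf2 = Node(((6, 6), (10, 10)), entries=[("c", ((7, 7), (7, 7)))])
root = Node(((0, 0), (10, 10)), children=[leaf1, leaf2])
print(range_query(root, ((3, 3), (8, 8))))  # ['b', 'c']
print(range_query(root, ((0, 0), (2, 2))))  # ['a']; leaf2 is pruned
```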

Size of nodes and tuning of affecting parameters

The search performance and update cost of the BMMI-tree depend on two vital parameters: the fanout of the tree (m) and the dimensionality of the data objects (d) [36]. Variations in these parameters also directly affect the space complexity of the routing and data nodes. The number of entries in the left and right routing tables and in the UT differs with the level of a node. For the maximum level, left and right routing table sizes are bound

Experimental results

We implement the BMMI-tree using the R-tree [10] in the PeerSim simulator [25]. The dynamic tree is constructed with 1000 to 10,000 nodes in the network, and correspondingly 5000 to 50,000 data objects are inserted into the multidimensional space. After that, 1000 point queries and 1000 range queries are executed on each network setup. Next, a dynamic network is constructed with up to 10,000 node joins. To simulate user churn, it is assumed that 10% of the existing nodes leave the
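A workload resembling this setup can be generated as sketched below. This is not the authors' PeerSim configuration: the uniform distribution of objects in [0, 1)^d, the reading of a range query "window size" as the fraction of each dimension covered by the window, and the helper names are all assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)

def random_points(n: int, d: int, rng: random.Random) -> List[Point]:
    """n data objects drawn uniformly from the unit hypercube (assumed distribution)."""
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]

def range_window(d: int, fraction: float, rng: random.Random) -> Box:
    """Window whose side covers `fraction` of [0, 1] in every dimension (assumed meaning)."""
    lo = tuple(rng.uniform(0.0, 1.0 - fraction) for _ in range(d))
    return lo, tuple(x + fraction for x in lo)

def churn(node_ids: List[int], leave_ratio: float, rng: random.Random) -> List[int]:
    """Pick the nodes that leave the network, e.g. leave_ratio=0.10 for 10% churn."""
    return rng.sample(node_ids, int(len(node_ids) * leave_ratio))

rng = random.Random(42)                       # fixed seed for repeatability
n_nodes = 1000
objects = random_points(5 * n_nodes, 2, rng)  # 5 objects per node, as in 1000 -> 5000
windows = [range_window(2, f / 10, rng) for f in range(1, 11)]  # 10% ... 100%
leaving = churn(list(range(n_nodes)), 0.10, rng)
print(len(objects), len(windows), len(leaving))  # 5000 10 100
```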

Conclusion

The BMMI-tree combines a P2P height-balanced m-ary tree overlay network with MI methods based on space containment relationships to achieve complex query search of multidimensional objects bounded by O(log_m N) in decentralized and dynamic P2P systems. As our experimental results show, the average number of routing hops for the modified node algorithms at various network sizes is better in the BMMI-tree than in binary trees such as VBI-tree. We propose two m


References (50)

  • N. Beckmann et al., The R*-tree: an efficient and robust access method for points and rectangles
  • P. Ciaccia, M. Patella, P. Zezula, M-tree: An efficient access method for similarity search in metric spaces, in: VLDB...
  • A. Crainiceanu, P. Linga, J. Gehrke, J. Shanmugasundaram, Querying peer-to-peer networks using p-trees, in: WebDB ’04:...
  • Freenet, Website, URL...
  • Gnutella, Website, URL...
  • A. Gupta et al., Meghdoot: Content-based publish/subscribe over P2P networks
  • A. Guttman, R-trees: a dynamic index structure for spatial searching
  • M. Harren et al., Complex queries in DHT-based peer-to-peer networks
  • N.J. Harvey et al., SkipNet: A Scalable Overlay Network with Practical Locality Properties, Tech. Rep. MSR-TR-2002-92 (2002)
  • Incentives build robustness in BitTorrent, Website, URL...
  • H.V. Jagadish et al., Speeding up search in peer-to-peer networks with a multi-way tree structure
  • H.V. Jagadish, B.C. Ooi, Q.H. Vu, BATON: a balanced tree structure for peer-to-peer networks, in: VLDB ’05: Proceedings...
  • H. Jagadish, B.C. Ooi, Q.H. Vu, R. Zhang, A. Zhou, VBI-tree: A peer-to-peer framework for supporting multi-dimensional...
  • L. Jin-ling, Z. Hong, Study of the AVL-tree index range query based on p2p networks, in: ICEE ’10: Proceedings of the...
  • R.A. King et al., Query routing and processing in peer-to-peer data sharing systems, Int. J. Database Manage. Syst. (2010)
Shivangi Surati is an Assistant Professor in the Department of Computer Engineering at LDRP Institute of Technology and Research, Gandhinagar, Gujarat, India. She received her M.E. (Computer) degree from Birla Vishwakarma Mahavidyalaya, Anand, Gujarat, India in 2006 and completed her Ph.D. in 2017 in the Department of Computer Engineering at the Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat, India. Her research interests include peer-to-peer networks, data mining and advanced algorithms. She is a lifetime member of CSI and ISTE.

Prof. Devesh C. Jinwala has been working as a Professor in Computer Engineering at the Department of Computer Engineering, S V National Institute of Technology, Surat, India since 1991. His principal research interests are broadly Security, Cryptography, Algorithms and Software Engineering. Specifically, his work focuses on security and privacy issues in resource-constrained environments (Wireless Sensor Networks) and in Data Mining, Attribute-based Encryption techniques, Requirements Specification, and Ontologies in Software Engineering. He has been or is the Principal Investigator of several sponsored research projects funded by ISRO, GUJCOST, Govt. of Gujarat and DiETY-MCIT-Govt. of India.

Sanjay Garg is a Professor in the Computer Engineering Department at the Institute of Technology, Nirma University, Ahmedabad, Gujarat, India. He received his B.E., M.E. and Ph.D. degrees in Computer Science and Engineering in 1991, 2001 and 2008, respectively. He has published more than 30 research papers in international/national journals and conferences, and is currently supervising 6 doctoral theses. His areas of interest are Data Mining, Database Systems and Formal Systems. He is a Senior Member of IEEE and a Member of ACM.