M-Grid: a distributed framework for multidimensional indexing and querying of location based data

Distributed and Parallel Databases

Abstract

The widespread use of mobile devices and the real-time availability of user-location information are facilitating the development of new personalized location-based applications and services (LBSs). Such applications require multi-attribute query processing, the scalability to support millions of users, real-time querying capability and the ability to analyze large volumes of data. Cloud computing has given rise to a new generation of distributed databases commonly known as key-value stores. Key-value stores were designed to extract values from very large volumes of data while being highly available, fault-tolerant and scalable, and hence provide much-needed infrastructure to support LBSs. However, they cannot process complex queries over multidimensional data efficiently, because they provide no means to access multiple attributes. In this paper, we present M-Grid, a unified indexing and data distribution framework that enables key-value stores to support multidimensional queries. We organize a set of nodes in a modified P-Grid overlay network that provides efficient data distribution, fault tolerance and query processing over multidimensional data. For indexing, we use a linearization technique based on the Hilbert space-filling curve, which preserves data locality, to manage multidimensional data efficiently in a key-value store. We propose algorithms to dynamically process range and k-nearest-neighbor (kNN) queries on the linearized values, which removes the overhead of maintaining a separate index table. Our approach is completely independent of the underlying storage layer and can be implemented on any cloud infrastructure. Our experiments on Amazon EC2 show that M-Grid achieves a performance improvement of three orders of magnitude over MapReduce and a fourfold improvement over the MD-HBase scheme.
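
The following Python sketch illustrates the linearization step described above. It maps the cells of a 2^k × 2^k integer grid to Hilbert indices. It then turns a rectangular range query into a small set of contiguous index intervals, and answers each interval with one scan over keys sorted by Hilbert index. This is a minimal sketch under stated assumptions, not M-Grid's algorithm:

- It uses the widely published iterative xy-to-index conversion, which follows one common orientation of the curve.
- A plain sorted Python list stands in for the key-value store.
- The function names `hilbert_index`, `query_intervals` and `range_query` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical illustration only; not M-Grid's actual implementation.


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def query_intervals(n: int, x0: int, y0: int, x1: int, y1: int) -> list[tuple[int, int]]:
    """Cover the inclusive rectangle [x0, x1] x [y0, y1] with Hilbert-index intervals.

    Every aligned square of side s occupies s*s consecutive indices, so a fully
    covered square becomes one interval and a partially covered one is split.
    """
    intervals = []

    def visit(qx: int, qy: int, s: int) -> None:
        if qx > x1 or qx + s - 1 < x0 or qy > y1 or qy + s - 1 < y0:
            return  # square is disjoint from the query rectangle
        if x0 <= qx and qx + s - 1 <= x1 and y0 <= qy and qy + s - 1 <= y1:
            start = hilbert_index(n, qx, qy) // (s * s) * (s * s)
            intervals.append((start, start + s * s - 1))
            return
        h = s // 2  # partial overlap: recurse into the four sub-squares
        for dx, dy in ((0, 0), (0, h), (h, 0), (h, h)):
            visit(qx + dx, qy + dy, h)

    visit(0, 0, n)
    intervals.sort()
    merged: list[tuple[int, int]] = []
    for lo, hi in intervals:
        if merged and lo == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], hi)  # join adjacent intervals
        else:
            merged.append((lo, hi))
    return merged


def range_query(store: list[tuple[int, object]], n: int,
                x0: int, y0: int, x1: int, y1: int) -> list[object]:
    """Scan a list of (hilbert_key, value) pairs sorted by key, one scan per interval."""
    keys = [k for k, _ in store]
    out = []
    for lo, hi in query_intervals(n, x0, y0, x1, y1):
        i = bisect_left(keys, lo)
        while i < len(store) and store[i][0] <= hi:
            out.append(store[i][1])
            i += 1
    return out
```

Consider a 4 × 4 grid with the query rectangle [1, 2] × [0, 1]. Here `query_intervals(4, 1, 0, 2, 1)` returns `[(1, 2), (13, 14)]`. The rectangle contains four cells, but the query needs only two contiguous scans. The decomposition relies on the fact that every aligned square of side s occupies a block of s² consecutive Hilbert indices. This is the locality property that makes a linearized key usable for range queries.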

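The data distribution side can be sketched in the same hedged way. In P-Grid, each peer is associated with a binary path, and the distinct paths partition the key space as the leaves of a binary trie. A key is stored at the peer whose path is a prefix of the key's bit string.

The Python sketch below applies that idea to fixed-width linearized keys, for example the 4-bit Hilbert indices of a 4 × 4 grid. It finds the owner of a key and the peers that a key range touches. It is a centralized simplification that assumes one peer per path. It omits routing tables, replication and load balancing, and it does not reproduce the M-Grid modifications to P-Grid, which are not described here. The class `TriePartition` and the peer names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical, centralized illustration of a P-Grid-like trie partition.


class TriePartition:
    """Assign fixed-width binary keys to peers by trie path.

    `paths` maps a peer name to its binary path (one peer per path); the paths
    must be prefix-free and together cover every key of `width` bits.
    """

    def __init__(self, paths: dict[str, str], width: int):
        self.width = width
        # Path p covers the key interval [p followed by 0s, p followed by 1s].
        self.ranges = sorted(
            (int(p.ljust(width, "0"), 2), int(p.ljust(width, "1"), 2), peer)
            for peer, p in paths.items()
        )
        self.starts = [lo for lo, _, _ in self.ranges]

    def owner(self, key: int) -> str:
        """Return the peer whose path is a prefix of the key's bit string."""
        i = bisect_right(self.starts, key) - 1
        return self.ranges[i][2]

    def peers_for_range(self, lo: int, hi: int) -> list[str]:
        """Return, in key order, the peers whose key intervals overlap [lo, hi]."""
        i = bisect_right(self.starts, lo) - 1
        out = []
        while i < len(self.ranges) and self.ranges[i][0] <= hi:
            out.append(self.ranges[i][2])
            i += 1
        return out
```

Suppose peers A, B and C have the paths `00`, `01` and `1`. Then key 6 (`0110`) belongs to B, and the key range [3, 9] spans A, B and C. A range query over linearized values therefore needs to reach only the peers whose trie leaves overlap it.
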
Figs. 1–16

Notes

  1. We could not evaluate the performance of MD-HBase in all experiments because, apart from the insert-throughput experiment, its authors have published results only for a 3-d dataset on a 4-node cluster.

Acknowledgements

This project is supported in part by an AFRL grant and by NSF Grants IIP-1238321 and CNS-1461914.

Author information

Corresponding author

Correspondence to Sanjay Madria.

About this article

Cite this article

Kumar, S., Madria, S. & Linderman, M. M-Grid: a distributed framework for multidimensional indexing and querying of location based data. Distrib Parallel Databases 35, 55–81 (2017). https://doi.org/10.1007/s10619-017-7194-0

  • DOI: https://doi.org/10.1007/s10619-017-7194-0
