Intelligent and independent processes for overcoming big graphs

The Journal of Supercomputing

Abstract

Poor locality, a natural property of graph data structures, causes an enormous amount of network traffic in large-scale distributed graph processing systems. Moreover, transmitting data over the network is one of the most expensive operations in a distributed system. A new graph computational model should therefore reduce network usage. In this paper, increasing the degree of machine independence is treated as a key factor in reducing network traffic. The proposed system uses a three-layered computational model to exploit local information as much as possible. The model also combines the advantages of the message-based and shared-state communication paradigms: vertices can directly read and update the values of other vertices in the lowest layer, while they must send messages in the other layers. Through memorization techniques, the proposed model introduces a new kind of intelligence that helps remove useless communications. Our experimental results confirm significant improvements of the proposed model over previous systems such as Pregel, GPS, Blogel, and ExPregel. The results also show that the overhead of making processes both independent and intelligent is negligible compared with the cost of the additional network communications.
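The combination of shared-state access within the lowest layer, messages elsewhere, and remembered values that suppress useless communication can be illustrated with a minimal sketch. This is not the authors' implementation: the `Worker` class, its fields, the `propagate_min` method, and the minimum-label propagation task are all hypothetical, chosen only to show the idea under the assumption that the lowest layer corresponds to vertices held on the same machine.

```python
class Worker:
    """One machine holding a partition of vertices (hypothetical structure)."""

    def __init__(self, name, values):
        self.name = name
        self.values = dict(values)  # vertex id -> current value (local shared state)
        self.last_sent = {}         # remote vertex id -> smallest value already sent
        self.outbox = []            # messages that would cross the network

    def propagate_min(self, src, dst, owner_of):
        """Push the value of local vertex `src` toward `dst`, keeping the minimum."""
        candidate = self.values[src]
        if owner_of[dst] == self.name:
            # Same machine: read and update the neighbour directly, no message.
            if candidate < self.values[dst]:
                self.values[dst] = candidate
            return
        # Different machine: a message is required, but skip it when an equal
        # or smaller value was already sent to this vertex (remembered value).
        if dst in self.last_sent and self.last_sent[dst] <= candidate:
            return
        self.last_sent[dst] = candidate
        self.outbox.append((dst, candidate))


# Vertices 1 and 2 live on worker "A"; vertex 3 lives on worker "B".
owner_of = {1: "A", 2: "A", 3: "B"}
worker_a = Worker("A", {1: 1, 2: 5})

worker_a.propagate_min(1, 2, owner_of)  # local update, no network traffic
worker_a.propagate_min(2, 3, owner_of)  # remote, one message queued
worker_a.propagate_min(1, 3, owner_of)  # remote, suppressed: same value already sent
```

In this sketch only one message is queued for vertex 3, even though two local vertices tried to update it. How the actual system defines its three layers and what it remembers is described in the full article, not here.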

Author information

Corresponding author

Correspondence to Hassan Naderi.

About this article

Cite this article

Sagharichian, M., Naderi, H. Intelligent and independent processes for overcoming big graphs. J Supercomput 73, 1438–1466 (2017). https://doi.org/10.1007/s11227-016-1834-4

  • DOI: https://doi.org/10.1007/s11227-016-1834-4
