Large Scale Graph Representations for Subgraph Census

  • Conference paper
Advances in Network Science (NetSci-X 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9564)

Abstract

A subgraph census (determining the frequency of smaller subgraphs in a network) is an important computational task at the heart of several graph mining algorithms. Here we focus on g-tries, an efficient state-of-the-art data structure. Its algorithm makes extensive use of the graph primitive that checks whether a given edge exists. The original implementation, like most past approaches, used adjacency matrices to make this operation as fast as possible. This representation is very expensive in memory (quadratic in the number of nodes), which limits its applicability to large networks. In this paper we study a number of alternative representations whose memory usage scales linearly with the number of edges. We conduct an extensive empirical study of these alternatives in order to find an efficient hybrid approach that combines the best representations. We achieve a performance that is less than \(50\,\%\) slower than the adjacency matrix on average (almost 3 times more efficient than a naive binary search implementation), while being memory efficient and tunable for different memory restrictions.
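To make the trade-off in the abstract concrete, the following is a minimal Python sketch of the two baseline representations it names: an adjacency matrix and sorted adjacency lists queried by binary search. It is an illustration only, not the authors' implementation (gtrieScanner is a separate code base), and the class names `MatrixGraph` and `SortedListGraph`, the `has_edge` method, and the assumption of an undirected graph are all hypothetical choices made for this example.

```python
from bisect import bisect_left


class MatrixGraph:
    """Dense adjacency matrix: constant-time edge test, memory quadratic in n."""

    def __init__(self, n, edges):
        self.adj = [[False] * n for _ in range(n)]
        for u, v in edges:
            # Assumption for this sketch: the graph is undirected.
            self.adj[u][v] = True
            self.adj[v][u] = True

    def has_edge(self, u, v):
        return self.adj[u][v]


class SortedListGraph:
    """Sorted adjacency lists: memory linear in n + m, edge test by binary search."""

    def __init__(self, n, edges):
        neighbours = [set() for _ in range(n)]
        for u, v in edges:
            neighbours[u].add(v)
            neighbours[v].add(u)
        # Sort each neighbour list once so lookups can use binary search.
        self.adj = [sorted(s) for s in neighbours]

    def has_edge(self, u, v):
        row = self.adj[u]
        i = bisect_left(row, v)  # first position whose value is >= v
        return i < len(row) and row[i] == v


# Small hypothetical example: a 4-cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
dense = MatrixGraph(4, edges)
sparse = SortedListGraph(4, edges)
```

Both classes answer the same `has_edge` query; the matrix spends memory on every possible node pair to answer in constant time, while the sorted lists store only existing edges and pay a logarithmic search in the degree of `u`. A hybrid scheme of the kind the abstract describes would choose between such representations, but its actual design is not specified here.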

Notes

  1. Available at https://github.com/ComplexNetworks-DCC-FCUP/gtrieScanner/tree/DynamicGraph.

  2. Available at https://github.com/ComplexNetworks-DCC-FCUP/gtrieScanner/tree/finalGraph.

Acknowledgements

This work is partially funded by FCT, within project UID/EEA/50014/2013.

Author information

Corresponding author

Correspondence to Pedro Paredes.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Paredes, P., Ribeiro, P. (2016). Large Scale Graph Representations for Subgraph Census. In: Wierzbicki, A., Brandes, U., Schweitzer, F., Pedreschi, D. (eds) Advances in Network Science. NetSci-X 2016. Lecture Notes in Computer Science, vol 9564. Springer, Cham. https://doi.org/10.1007/978-3-319-28361-6_16

  • DOI: https://doi.org/10.1007/978-3-319-28361-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28360-9

  • Online ISBN: 978-3-319-28361-6

  • eBook Packages: Computer Science, Computer Science (R0)
