Abstract
With the advent of high-throughput sequencing methods, new ways of visualizing and analyzing increasingly large amounts of data are needed. Although some software already exists, it either does not scale well or requires advanced skills to be useful in phylogenetics.
The aim of this thesis was to implement three community finding algorithms, namely Louvain, Infomap and Layered Label Propagation (LLP); to benchmark them using two synthetic network benchmarks, Girvan-Newman (GN) and Lancichinetti-Fortunato-Radicchi (LFR); to test them on real networks, in particular one derived from a Staphylococcus aureus MLST dataset; to compare two visualization frameworks, Cytoscape.js and D3.js; and, finally, to make it all available online (mscthesis.herokuapp.com).
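The sketch below shows what a GN benchmark network looks like. It is a minimal planted-partition generator in the spirit of Girvan and Newman's original test, and it is not the generator developed in the thesis. It assumes the classic setup of 128 nodes in four communities of 32, an expected degree `k` (16 in the classic test), and a mixing parameter `mu` defined here as the expected fraction of each node's edges that leave its own community. The function name `generateGN` and its parameters are hypothetical.

```javascript
// Minimal sketch of a Girvan-Newman style planted-partition benchmark.
// Hypothetical helper, not the generator developed in the thesis.
// Assumptions: 128 nodes in 4 communities of 32 nodes each; k is the
// expected node degree; mu is the expected fraction of a node's edges
// that connect to nodes outside its own community.
function generateGN(k = 16, mu = 0.1, rng = Math.random) {
  const n = 128;
  const groups = 4;
  const size = n / groups;

  // Node i belongs to community floor(i / 32).
  const community = Array.from({ length: n }, (_, i) => Math.floor(i / size));

  // Split the expected degree into internal and external parts.
  const kIn = (1 - mu) * k;
  const kOut = mu * k;

  // Each node has (size - 1) possible internal and (n - size) possible
  // external partners, which fixes the two edge probabilities.
  const pIn = kIn / (size - 1);
  const pOut = kOut / (n - size);
  if (mu < 0 || mu > 1 || pIn > 1 || pOut > 1) {
    throw new RangeError("mu must lie in [0, 1] and k must be small enough for 128 nodes");
  }

  // Draw every node pair independently with the matching probability.
  const edges = [];
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      const p = community[i] === community[j] ? pIn : pOut;
      if (rng() < p) edges.push([i, j]);
    }
  }
  return { n, edges, community };
}
```

Raising `mu` weakens the planted structure. With `k = 16`, `mu = 0.75` makes the internal and external edge probabilities nearly equal (4/31 versus 12/96), so almost no community structure remains to be detected.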
Louvain, Infomap and LLP were implemented in JavaScript. Unless otherwise stated, the following conclusions hold for both GN and LFR. In terms of speed, Louvain outperformed the other algorithms. In terms of accuracy, Louvain was the most accurate in networks with well-defined communities, whereas LLP was the best at higher mixing. In highly mixed GN networks, unlike weakly mixed ones, increasing the resolution parameter is advantageous. In LFR, a higher resolution decreases detection accuracy regardless of the mixing parameter. Increasing the average node degree improved partitioning accuracy and suggested that detection by chance was minimized. Generating GN networks with higher mixing or average degree is computationally more intensive, whether using the algorithm developed in the thesis or the LFR implementation. In the S. aureus network, Louvain was the fastest and the most accurate in detecting the clusters of the seven groups of strains that evolved directly from the common ancestor.
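The abstract does not say which algorithm's resolution parameter is meant or how it is defined. As one illustration, assume the common generalized modularity with resolution γ that Louvain-style methods optimize, Q = Σ_c [L_c/m − γ (d_c/2m)²]. Here m is the number of edges, L_c is the number of edges inside community c, and d_c is the total degree of the nodes in c. The sketch below scores a partition of an undirected, unweighted graph under that definition. The function `modularity` is hypothetical and is not taken from the thesis code.

```javascript
// Generalized modularity with a resolution parameter gamma (assumed form):
//   Q = sum over communities c of [ L_c / m - gamma * (d_c / (2m))^2 ]
// m: number of edges; L_c: edges with both ends in c; d_c: degree sum of c.
// Hypothetical helper; assumes an undirected, unweighted graph without
// self-loops, given as an edge list of [u, v] pairs plus a node -> community array.
function modularity(edges, community, gamma = 1) {
  const m = edges.length;
  if (m === 0) return 0;

  const internal = new Map();  // community -> L_c
  const degreeSum = new Map(); // community -> d_c

  for (const [u, v] of edges) {
    const cu = community[u];
    const cv = community[v];
    // Each edge adds one to the degree of both endpoints.
    degreeSum.set(cu, (degreeSum.get(cu) || 0) + 1);
    degreeSum.set(cv, (degreeSum.get(cv) || 0) + 1);
    // Edges inside a community count towards its L_c.
    if (cu === cv) internal.set(cu, (internal.get(cu) || 0) + 1);
  }

  let q = 0;
  for (const [c, d] of degreeSum) {
    const lc = internal.get(c) || 0;
    q += lc / m - gamma * (d / (2 * m)) ** 2;
  }
  return q;
}
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14 ≈ 0.357 at γ = 1, while a single community containing all six nodes gives 0. In this convention, a larger γ penalizes large communities more heavily and therefore favours finer partitions.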
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Rita, L., Francisco, A., Carriço, J., Borges, V. (2020). Community Finding with Applications on Phylogenetic Networks. In: Henriques, J., Neves, N., de Carvalho, P. (eds) XV Mediterranean Conference on Medical and Biological Engineering and Computing – MEDICON 2019. MEDICON 2019. IFMBE Proceedings, vol 76. Springer, Cham. https://doi.org/10.1007/978-3-030-31635-8_234
DOI: https://doi.org/10.1007/978-3-030-31635-8_234
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31634-1
Online ISBN: 978-3-030-31635-8
eBook Packages: Engineering, Engineering (R0)