Horizontal Transfer, Not Duplication, Drives the Expansion of Protein Families in Prokaryotes
Figure 6
Protein family construction pipeline.
Starting with a databank of proteins, we first performed all pairwise similarity searches using BLASTP. The hits were filtered regarding the length of the match (70% of the length of the query) and the bitscore (30% of the maximal bitscore calculated by aligning a protein against itself). To build the gene families we ran MCL blastline and then removed all singletons, IS and Phage. To build the core genome we used OrthoMCL along with a synteny filter based on M-GCAT Clusters. Finally, using presence/absence and phylogenetic information, we obtained the protein families with expansions