doi:10.1016/j.peptides.2008.05.022
Copyright © 2008 Elsevier Inc. All rights reserved.
Massive peptide sharing between viral and human proteomes
Darja Kanduc a, Angela Stufano a, Guglielmo Lucchese a and Anthony Kusalik b
a Department of Biochemistry and Molecular Biology, University of Bari, Bari 70126, Italy
b Department of Computer Science, University of Saskatchewan, Saskatoon, Canada
Received 29 February 2008;
revised 28 May 2008;
accepted 30 May 2008.
Available online 5 June 2008.
Abstract
Thirty viral proteomes were examined for amino acid sequence similarity to the human proteome, and, in parallel, a control of 30 sets of human proteins was analyzed for internal human overlapping. We find that all of the analyzed 30 viral proteomes, independently of their structural or pathogenic characteristics, present a high number of pentapeptide overlaps to the human proteome. Among the examined viruses, human T-lymphotropic virus 1, Rubella virus, and hepatitis C virus present the highest number of viral overlaps to the human proteome. The widespread and ample distribution of viral amino acid sequences through the human proteome indicates that viral and human proteins are formed of common peptide backbone units and suggests a fluid compositional chimerism in phylogenetic entities canonically classified distantly as viruses and Homo sapiens. Importantly, the massive viral to human peptide overlapping calls into question the possibility of a direct causal association between virus–host sharing of amino acid sequences and incitement to autoimmune reactions through molecular recognition of common motifs.
Keywords: Viral proteomes; Human proteome; Sequence similarity; Peptide sharing; Autoimmunity
Fig. 1. Viral 5-mer occurrences in the human proteome as a function of viral proteome length (see data under Table 4, columns 1 and 4). The symbols refer to: (*), HTLV-1; (■), Rubella virus; (□), HCV; (+), other viral data points. The regression line (—) has the equation y = 12.636x − 269.01, with a Pearson correlation coefficient (r value) of 0.97452. Both the x- and y-axes are on a log scale.
Fig. 2. Viral versus human pentapeptide overlapping: similarity profile of low-, medium- and high-molecular weight viral proteomes to the human proteome. (A) Human parvovirus B19 (total aa: 2006); (B) SARS coronavirus BJ202 (total aa: 14,209); (C) Variola virus (total aa: 54,289).
Fig. 3. Human versus human pentapeptide overlapping: similarity profile of low-, medium- and large-sized artificial human sub-proteomes to the human proteome. Human sub-proteome set (see also Table 2): (A) 4 (total aa: 2142); (B) 26 (total aa: 14,203); (C) 29 (total aa: 54,263).
Table 1.
Description of the viral proteomes analyzed for similarity to human proteins

Table 2.
Size descriptions of the 30 artificial human sub-proteomes used as controls
a Amino acid number.
b Composed of the sets of human proteins detailed in Section 2 and in Table 3.
Table 3.
List of the human proteins forming the 30 artificial human sub-proteomes used as controls

Table 4.
Viral versus human proteome overlap at the 5-mer level

The human proteome comprises 36,103 proteins and 15,771,565 occurrences of 2,388,563 unique 5-mers. Column numbers refer to: (1) unique 5-mers in the viral proteome; (2) total number of 5-mers in the viral proteome (including multiple occurrences); (3) unique viral 5-mers occurring in the human proteome; (4) viral overlap occurrences in the human proteome (including multiple occurrences); (5) number of human proteins involved in overlap; (6) % of unique viral 5-mers which occur in the human proteome (i.e. 100 × column 3/column 1).
a Abbreviations as in Table 1.
b The results of linear regression analysis between columns 1 and 3, and 1 and 4 are: column 3 = 0.91811 × column 1 − 1.2272 (r = 0.99993); column 4 = 12.636 × column 1 − 269.01 (r = 0.97452).
c Obtained by combining all 30 viral proteomes into one viral proteome, and then computing the overlap with the entire human proteome.
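The six columns defined for Table 4 can be illustrated with a minimal Python sketch. It is not the authors' program: the function names and the toy sequences are hypothetical, and two choices are assumptions, namely that 5-mers are counted within each protein (never across protein boundaries) and that column 4 is the total number of human-proteome occurrences of the shared 5-mers.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping length-k window of one protein sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def overlap_columns(viral_proteins, human_proteins, k=5):
    """Return columns (1)-(6) as described in the table note (assumed reading)."""
    viral_counts = Counter(m for p in viral_proteins for m in kmers(p, k))
    human_counts = Counter(m for p in human_proteins for m in kmers(p, k))
    shared = viral_counts.keys() & human_counts.keys()

    col1 = len(viral_counts)                     # unique viral k-mers
    col2 = sum(viral_counts.values())            # all viral k-mers, repeats included
    col3 = len(shared)                           # unique viral k-mers found in human
    col4 = sum(human_counts[m] for m in shared)  # human occurrences of shared k-mers
    col5 = sum(                                  # human proteins with at least one hit
        1 for p in human_proteins if any(m in shared for m in kmers(p, k))
    )
    col6 = 100.0 * col3 / col1 if col1 else 0.0  # % of unique viral k-mers shared
    return col1, col2, col3, col4, col5, col6


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not data from the study.
    viral = ["MKTAYI"]
    human = ["GMKTAYL", "MKTAYMKTAY", "PPPPPP"]
    print(overlap_columns(viral, human, k=5))
```

Passing k=6 or k=7 to the same function gives the corresponding quantities for the 6-mer and 7-mer comparisons.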
Table 5.
Human versus human proteome overlap at the 5-mer level

Together, the 30 artificial human sub-proteomes comprise 686 proteins and are numbered from 1 to 30. The comparison human proteome contained 36,103 proteins and 15,771,565 occurrences of 2,388,563 unique 5-mers. Column numbers refer to: (1) unique 5-mers in the artificial sub-proteome; (2) total number of 5-mers in the artificial sub-proteome (including multiple occurrences); (3) unique 5-mers from the artificial sub-proteome occurring in the human proteome; (4) occurrences in the human proteome of 5-mers from the artificial sub-proteome (including multiple occurrences); (5) number of human proteins in the human proteome involved in overlap; (6) % of unique 5-mers from the artificial sub-proteome which occur in the human proteome (i.e. 100 × column 3/column 1).
a Analogous to the viral proteomes in size (see Table 1), and composed of the sets of human proteins detailed in Table 3.
b The results of linear regression analysis between columns 1 and 3, and 1 and 4 are: column 3 = 0.97083 × column 1 + 0.76628 (r = 0.99998); column 4 = 17.921 × column 1 + 6278.6 (r = 0.99719).
c Obtained by combining all 30 human sub-proteomes into one sub-proteome, and then computing the overlap with the entire original human proteome minus the proteins in the combined sub-proteomes.
Table 6.
Viral versus human proteome overlap at the 6-mer level

The human proteome comprises 36,103 proteins and 15,734,725 occurrences of 8,247,275 unique 6-mers. Column numbers refer to: (1) unique 6-mers in the viral proteome; (2) total number of 6-mers in the viral proteome (including multiple occurrences); (3) unique viral 6-mers occurring in the human proteome; (4) viral overlap occurrences in the human proteome (including multiple occurrences); (5) number of human proteins involved in overlap; (6) % of unique viral 6-mers which occur in the human proteome (i.e. 100 × column 3/column 1).
a Abbreviations as in Table 1.
b The results of linear regression analysis between columns 1 and 3, and 1 and 4 are: column 3 = 0.31426 × column 1 − 42.237 (r = 0.98762); column 4 = 1.0972 × column 1 − 877.68 (r = 0.93373).
c Obtained by combining all 30 viral proteomes into one viral proteome, and then computing the overlap with the entire human proteome.
Table 7.
Viral versus human proteome overlap at the 7-mer level

The human proteome comprises 36,103 proteins and 15,697,964 occurrences of 10,431,975 unique 7-mers. Column numbers refer to: (1) unique 7-mers in the viral proteome; (2) total number of 7-mers in the viral proteome (including multiple occurrences); (3) unique viral 7-mers occurring in the human proteome; (4) viral overlap occurrences in the human proteome (including multiple occurrences); (5) number of human proteins involved in overlap; (6) % of unique viral 7-mers which occur in the human proteome (i.e. 100 × column 3/column 1).
a Abbreviations as in Table 1.
b The results of linear regression analysis between columns 1 and 3, and 1 and 4 are: column 3 = 0.042176 × column 1 − 36.723 (r = 0.95637); column 4 = 0.17329 × column 1 − 410.74 (r = 0.84397).
c Obtained by combining all 30 viral proteomes into one viral proteome, and then computing the overlap with the entire human proteome.
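Footnote b of the overlap tables reports a slope, an intercept and a Pearson r for each pair of columns. The page does not say how the fits were computed, so the following is only a sketch that assumes an ordinary least-squares line; the function name and the sample points are hypothetical and are not data from the study.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical (column 1, column 3) pairs, for illustration only.
    col1 = [1.0, 2.0, 3.0, 4.0]
    col3 = [3.0, 5.0, 7.0, 9.0]
    print(fit_line(col1, col3))
```

Under this reading, the column 3 slope approximates the fraction of unique viral n-mers that also occur in the human proteome, which is consistent with the reported slopes falling from 0.91811 for 5-mers to 0.31426 for 6-mers and 0.042176 for 7-mers.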
Table 8.
Actual versus theoretical n-peptide occurrences in viral and human proteins

Column numbers: (1) actual unique n-mers in the 30 viral proteomes; (2) actual n-mers in the 30 viral proteomes (including multiple occurrences); (3) number of repeated n-mers in the 30 viral proteomes; (4) actual unique n-mers in the 30 human sub-proteomes; (5) actual n-mers in the 30 human sub-proteomes (including multiple occurrences); (6) number of repeated n-mers in the 30 human sub-proteomes; (7) unique viral n-mer overlaps in the 30 human sub-proteomes; (8) total viral n-mer overlaps in the 30 human sub-proteomes (including multiple occurrences); (9) unique viral n-mer overlaps in the human proteome; (10) total viral n-mer overlaps in the human proteome (including multiple occurrences).
a Peptide length, with n from 5 to 16 amino acids.
b The number of possible amino acid combinations is given by 20^n.
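As a worked illustration of the theoretical count, the short Python sketch below computes the number of possible n-mers over 20 amino acids and compares it with the unique human n-mer counts quoted in the notes to Tables 4, 6 and 7. The function names and the "coverage" ratio are introduced here for illustration and are not quantities defined in the article.

```python
AMINO_ACIDS = 20

# Unique n-mers in the human proteome, as quoted in the notes to Tables 4, 6 and 7.
HUMAN_UNIQUE = {5: 2_388_563, 6: 8_247_275, 7: 10_431_975}


def theoretical_nmers(n):
    """Number of distinct peptides of length n over a 20-letter alphabet."""
    return AMINO_ACIDS ** n


def coverage(n):
    """Fraction of the theoretical n-mer space observed in the human proteome."""
    return HUMAN_UNIQUE[n] / theoretical_nmers(n)


if __name__ == "__main__":
    for n in sorted(HUMAN_UNIQUE):
        print(n, theoretical_nmers(n), f"{100 * coverage(n):.2f}%")
```

For n = 5 the theoretical space is 20^5 = 3,200,000 pentapeptides, so the 2,388,563 unique human 5-mers cover roughly three quarters of it, whereas the covered fraction drops sharply for 6-mers and 7-mers.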
Corresponding author. Tel.: +39 080 544 3321; fax: +39 080 544 3321.