Introduction

Proteins of a pathogen that are displayed on the cell surface, bound to the membrane or secreted into the extracellular surroundings are primarily required for cell-cell contact between the host and the pathogen. Access to the genomic sequence of a pathogen discloses its entire antigenic repertoire. Sequence-based ‘Reverse Vaccinology’ approaches (Rappuoli and Covacci 2003) have catalysed a major shift in vaccine development. This approach screens the entire genome of a pathogen by high-throughput in silico methods and identifies targets with vaccine attributes.

Proteomic technologies (cataloguing the entire protein content of a cell) serve as an important complement to the reverse vaccinology approach to antigen discovery (Dove 1999). Advances in protein separation technologies, in combination with mass spectrometry and genome sequencing, have made it feasible to elucidate the total protein complement (proteome) of any pathogen. The most popular procedure in proteome analysis is 2D-gel electrophoresis, in which the protein mixture is first resolved into its individual components. However, for in-depth analysis and identification of low-abundance proteins, gel-free separation methodologies are becoming more popular. For mass spectrometric identification, each protein is digested with a specific protease to generate discrete peptide fragments whose molecular masses can be determined accurately (Wöhlbrand et al. 2013). Matrix-Assisted Laser Desorption/Ionization (MALDI) and Electrospray Ionization (ESI) are generally used as ion sources for soft ionization of peptide molecules. MALDI time-of-flight mass spectrometry (MALDI-TOF MS) and other formats are used extensively to compare the experimentally generated peptide-mass fingerprint with in silico-generated theoretical fingerprints, using a database of all proteins predicted by genome sequencing (Grandi 2001). Fragmentation of peptide precursors in tandem mass spectrometry further provides sequence information, leading to unambiguous identification of proteins.
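The peptide-mass fingerprint comparison described above can be illustrated with a minimal sketch. It assumes trypsin as the protease (cleavage after K or R, except before P), uses standard monoisotopic residue masses, and scores a protein simply by counting observed masses that fall within a tolerance of a theoretical peptide mass. The function names, the tolerance value and the toy sequences are illustrative assumptions, not taken from any of the cited studies; real search engines also handle missed cleavages, modifications and statistical scoring.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.2):
    """Number of observed masses explained by the theoretical fingerprint."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tolerance=0.2):
    """Name of the database protein with the most matching peptide masses."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tolerance),
    )


# Hypothetical two-protein "genome-derived" database and observed masses.
database = {"protA": "AKGGR", "protB": "WWK"}
observed = [217.14, 288.15]
print(best_match(observed, database))
```

In this toy case both observed masses fall within tolerance of the two tryptic peptides of the hypothetical protA and neither matches protB, so protA is reported as the best candidate.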

Pathogens tend to deploy multiple proteins against host defences during infection. Proteomic techniques enable identification of those proteins, including the subset residing on the surface of the pathogen (Walters and Mobley 2010). Vaccine targets, once identified from a pathogen’s proteome, can be expressed as recombinant proteins and then tested for immunogenicity in an appropriate in vivo model (Seib et al. 2012).

Proteomics and vaccine development

Proteomics permits the study of virulence mechanisms in pathogens on a large scale and has become a powerful tool for high-throughput analysis of proteins. It not only captures a metabolic snapshot at a particular moment in the life of a pathogen, but also provides information on a protein’s localization, function, characteristics and interactions with host proteins. Comparative proteomics can reveal important differences between attenuated and pathogenic organisms and show whether a protein is conserved among various strains (DelVecchio et al. 2010). Proteomics is now combined with two relatively new areas, immunomics (the search for immunogenic proteins) and vaccinomics (the characterization of the host response to immunization), which yield valuable information on host-pathogen interaction. This combination also accelerates the identification and detailed characterization of novel antigens with vaccine potential (Adamczyk-Poplawska et al. 2011).

Over the past decade, proteomic approaches combining 2-DE and MS have been used to systematically map the cellular, surface-associated and secreted proteins of many pathogens. The availability of complete genomic sequences and the amalgamation of proteomic, genomic and bioinformatic technologies have facilitated the identification of potential vaccine candidates and therapeutic agents (Fig. 1). 2-DE proteome analyses, with comprehensive and comparative results, have been published for several bacterial species. Expression proteomic studies and reference maps for one or more sub-cellular fractions (cytosolic, membrane, cell surface and extracellular proteomes) have been reported for several pathogenic bacteria, including Staphylococcus aureus (Cordwell et al. 2002), Bacteroides fragilis (Diniz et al. 2004), Aeromonas salmonicida (Ebanks et al. 2005), Helicobacter pylori (Backert et al. 2005), Mycobacterium tuberculosis (Mattow et al. 2003), Escherichia coli (Molloy et al. 2000), Bacillus anthracis (Huang et al. 2004), and Chlamydia pneumoniae (Molestina et al. 2002).

Fig. 1 Schematic overview of how proteomics, bioinformatics and associated high-throughput analyses are applied to understand a pathogen and its interactions and to discover vaccine candidates

Bioinformatic selection of candidate vaccine antigen coupled with proteomic identification

Whilst empirical approaches to the selection of vaccine sub-units are still employed, bioinformatics approaches to select candidate protein sub-units from bacterial genome sequences have come into use over the last decade. Computer-aided discovery of protein antigens, i.e. immunoinformatics, has significantly reduced the number of laboratory tests and the resources required to identify candidate vaccine antigens. These in silico approaches, including the mining of genome sequences, are collectively termed reverse vaccinology (Rappuoli and Covacci 2003). Reverse vaccinology has been applied to vaccine development; examples include Group B meningococcus (MenB) and M. tuberculosis (Mattow et al. 2003; Tsolakos et al. 2014).

Generally, ‘in silico’ approaches to the identification of vaccine antigens have relied on the following criteria for selecting candidate proteins: (a) prediction of sub-cellular location, (b) identification of antigens using sequence similarity searches and (c) the use of sophisticated statistical approaches to predict the probability of antigenic characteristics.
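The three criteria listed above can be combined into a simple filter, sketched here under stated assumptions. The record fields, the set of "exposed" locations and the probability cut-off are hypothetical placeholders; in practice each value would come from a dedicated prediction tool, and how the criteria are combined differs between studies.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    localization: str            # (a) predicted sub-cellular location
    antigen_homolog: bool        # (b) similarity-search hit to a known antigen
    antigen_probability: float   # (c) score from a statistical predictor


# Assumed set of locations treated as accessible to the host immune system.
EXPOSED_LOCATIONS = {"outer membrane", "cell wall", "extracellular"}


def select_candidates(candidates, min_probability=0.5):
    """Keep exposed proteins supported by homology or by a high score."""
    selected = []
    for c in candidates:
        exposed = c.localization in EXPOSED_LOCATIONS
        antigen_like = c.antigen_homolog or c.antigen_probability >= min_probability
        if exposed and antigen_like:
            selected.append(c.name)
    return selected


# Hypothetical predictions for four open reading frames.
orfs = [
    Candidate("orf1", "cytoplasm", True, 0.9),
    Candidate("orf2", "extracellular", False, 0.7),
    Candidate("orf3", "cell wall", True, 0.2),
    Candidate("orf4", "outer membrane", False, 0.1),
]
print(select_candidates(orfs))
```

Requiring exposure first and then accepting either homology or a high score is only one possible design; a stricter pipeline could require all three criteria at once.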

A reductive strategy was used in B. anthracis to identify surface antigens. The selection procedure combined preliminary annotation (sequence similarity searches and domain assignments) with additional filtering criteria such as prediction of cellular localization, taxonomical and functional screens, and the number of paralogs. This preliminary in silico screening was followed by proteomic analysis of the B. anthracis membrane fraction (Ariel et al. 2003). Following the principles of reverse vaccinology, bioinformatics, comparative genomic hybridization (CGH) analysis and transcriptional analysis were carried out to identify vaccine candidates in Leptospira interrogans against leptospirosis (Yang et al. 2006). Comparative genomics together with a bioinformatics approach was used to identify novel vaccine candidates in enterohemorrhagic E. coli (García-Angulo et al. 2014).

Vaxign (He et al. 2010) and NERVE (Vivona et al. 2006) are two tools that predict vaccine candidates and protective antigens based on the reverse vaccinology approach. In one multi-step antigen selection approach, Vaxign was first applied to predict ORFs encoding outer membrane proteins with antigenic determinants, followed by further in silico prediction to down-select the candidate antigens (Gomez et al. 2013). In another study, five complete Corynebacterium pseudotuberculosis genomes were subjected to pan-genomic analysis to reveal the in silico predicted pan-exoproteome, which helped to generate a list of putative vaccine candidates (Santos et al. 2012). Thus, vaccine informatics is emerging as an active research area that facilitates rational vaccine design using immunoinformatics and epitope prediction tools.
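The idea of a pan-exoproteome can be sketched with plain set operations, assuming that each genome has already been reduced to a set of ortholog-group identifiers for its predicted exported proteins. The strain names and identifiers below are hypothetical, and the sketch does not reproduce the pipeline of the cited study: the pan-exoproteome is taken as the union across strains, and the conserved core as the intersection.

```python
def pan_and_core(exoproteomes):
    """Return (pan, core) sets from a {strain: set_of_protein_ids} mapping."""
    sets = list(exoproteomes.values())
    pan = set().union(*sets)                           # found in at least one strain
    core = set.intersection(*sets) if sets else set()  # found in every strain
    return pan, core


# Hypothetical predicted exoproteomes of three strains.
strains = {
    "strain1": {"og1", "og2", "og3"},
    "strain2": {"og2", "og3", "og4"},
    "strain3": {"og2", "og3", "og5"},
}
pan, core = pan_and_core(strains)
print(sorted(pan))
print(sorted(core))
```

Proteins in the core set are conserved across all strains considered, which is one reason they are often prioritised when a broadly protective candidate is sought.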

Exoproteome analysis for virulence determinants and vaccine discovery

As in any living cell, extracytoplasmic proteins in bacteria perform versatile functions, including nutrient uptake, chemo-sensing, motility, adherence and effects on host cells. Moreover, they provide distinguishing clues about the physiology of the bacterial cell and its interactions within an ecological niche.

A pathogen must be able to evade the host immune system in order to adhere to, invade and colonise host cells. The factors involved in these processes are of great importance for understanding pathogenesis, but unfortunately they remain largely unknown. The exoproteins of Gram-positive bacteria are likely to contain some of these key factors. The protein secretion systems and the secreted proteins collectively constitute the secretome; in monoderm bacteria (Gram-positive cell envelope architecture), these proteins can also be found in the membrane and/or cell wall. The proteins found in the extracellular milieu of Gram-positive bacteria are called extracellular proteins, or exoproteins, and together constitute the exoproteome; they are not necessarily secreted by known secretion systems.

The C. perfringens exoproteome was elucidated, and several proteins were identified in the exoproteomes of type A and type C strains. Some of these exoproteins had features characteristic of virulence determinants, suggesting that, in addition to the “classic” extracellular toxins, a large number of proteins may be essential for C. perfringens virulence. One of the candidates, ornithine carbamoyltransferase, was also shown to be protective in mice, making it a potential vaccine candidate. The investigation demonstrated the strength of proteomics in the rapid elucidation of vaccine candidates and was the turning point for the identification of diagnostic markers for C. perfringens and the development of vaccines against gas gangrene in humans (Sengupta et al. 2010). More recently, three putative virulence-associated lipoproteins from the C. perfringens exoproteome (a polysaccharide deacetylase family protein, a probable ion-uptake ABC transporter and a putative lipoprotein of unknown function) were evaluated for their immunoprotective potential. The three proteins were overexpressed, purified to near homogeneity and found to generate a protective immune response in mice against challenge with C. perfringens (Dwivedi et al. 2015a).

The secretomes of three C. difficile strains were analyzed and compared. Proteins were separated and analyzed by SDS-PAGE and LC–MS, which revealed 158 different proteins in the supernatant of C. difficile. Most of the identified proteins originated from the cytoplasm (Boetzkes et al. 2012). The exoproteome of another Gram-positive pathogen, Listeria monocytogenes, was characterized using a combination of bioinformatic and experimental proteomic approaches. Proteomic analysis using both 2D gel electrophoresis/MALDI and HPLC/ESI–MS identified 105 proteins in the culture supernatant of L. monocytogenes. Among these, already known virulence determinants were detected, showing the importance of this sub-proteome. Comparative proteomic analysis between pathogenic and non-pathogenic species of Listeria was carried out to uncover proteins probably involved in pathogenicity and/or adaptation to the host (Trost et al. 2005).

Similarly, several virulence factors secreted by Streptococcus pyogenes are involved in pathogenesis. One important factor is the peroxide regulator (PerR), which is associated with the peroxide resistance response and pathogenesis. Very little is known so far about the regulation of PerR and its involvement in virulence. Comparative secretome analysis of a perR-deficient mutant and a wild-type strain was used to investigate the role of PerR in regulating the expression of the S. pyogenes secretome involved in virulence (Wen et al. 2011). The B. anthracis secretome was investigated at different time points to analyze the progression of pathogenicity, to identify therapeutic and diagnostic markers, and to elucidate vaccine targets (Walz et al. 2007). Moreover, secretory proteins of the human pathogen Burkholderia cepacia are known to be involved in virulence and to participate in host-pathogen interaction. Analysis of the B. cepacia secretome identified eighteen immunogenic proteins that might be potential vaccine candidates (Mariappan et al. 2010). More recently, exoproteome and secretome approaches were employed in H. pylori to identify potential candidates for a multicomponent vaccine, and the exoproteome composition of H. pylori was found to be growth-phase dependent (Naz et al. 2015; Snider et al. 2016).

Pacheco et al. (2011) identified 93 different C. pseudotuberculosis extracellular proteins by analyzing, under in vitro conditions, the exoproteomes of two strains isolated from different hosts and presenting distinct virulence phenotypes. In another study, L. interrogans serovar Lai was grown in protein-free medium to obtain its set of extracellular proteins for analysis by LC/MS. Many virulence factors were detected in the supernatant, providing new insights into the pathogenesis of leptospirosis (Zeng et al. 2013).

Surface proteome analysis for virulence determinants and vaccine discovery

Bacterial surface proteins play a fundamental role in the interaction between the bacterial cell and its environment. Surface proteins contribute to escape from host immune attack, to biofilm formation and to mounting a defence against the host response. The ability of surface proteins to bind to host extracellular matrix and plasma components promotes adhesion to host tissues and invasion of cells. Hence, surface proteins are good drug targets for the prevention of bacterial infections and diseases. Furthermore, because surface proteins interact directly with the host immune system, they are likely to become components of effective vaccines. Vaccines based on surface-exposed and secreted proteins are already commercially available for some pathogens, and others are in development.

Comparative genomic analysis of two closely related pathogenic Clostridium species, C. tetani and C. perfringens, revealed many homologous ORFs in the two genomes. This striking finding suggested extensive sharing of common epitopes between homologous proteins of these two medically important pathogens. In one study, total cellular proteins of C. tetani were probed with antisera raised against whole cells of C. perfringens ATCC13124 (Alam et al. 2008). Cross-reactive proteins were identified by mass spectrometry. Immunofluorescence microscopy indicated binding of the polyclonal C. perfringens whole-cell antibody to the surface of both C. perfringens and C. tetani, indicating that some of the cross-reactive proteins may actually be surface localized. The finding also provided a basis for the search for potential vaccine candidates with broader coverage, encompassing more than one pathogenic clostridial species. In another report, predominant cell surface and cell envelope (structure-associated) proteins of C. perfringens ATCC13124 were identified that were differentially expressed in cells grown on cooked meat medium (CMM) compared with cells grown in the reference state (tryptose-yeast extract-glucose medium). Two of the surface proteins of C. perfringens, phosphoglycerate kinase and ornithine carbamoyltransferase, were found to be immunogenic, and the latter was subsequently reported to be protective in an animal model. Phosphoglycerate kinase, N-acetylmuramoyl-l-alanine amidase, and the translation elongation factors Tu and EF-G were predicted to be putative vaccine candidates on the basis of their high abundance on the cell surface and because their homologs in other Gram-positive pathogenic bacteria had previously been shown to be immunogenic and/or virulence determinants (Alam et al. 2009). Recently, analysis of the extractable proteome of four strains (two each of type A and type C) identified a total of 134 unique proteins (529 protein spot features). Proteins showing altered expression under host-simulated conditions in the virulent type A strain (ATCC13124) were also elucidated, providing a starting point for exploring their potential in detection and diagnosis and their possible role in the pathogenesis of the organism (Dwivedi et al. 2015b).

Cell wall proteins of C. difficile were extracted by two methods, (1) low-pH glycine treatment and (2) lysozyme digestion of the cell wall, and analysed in detail by 2DGE and mass spectrometry. Several proteins were identified as paralogs of the surface layer proteins (SLPs), consistent with the hypothesis that these are cell wall-associated proteins (Wright et al. 2005). Moreover, the cell wall fraction of Streptococcus pneumoniae was extracted and approximately 150 soluble proteins were identified by 2D gel electrophoresis. Two of them were cross-reactive with unrelated strains of different serotypes and conferred protection against respiratory challenge with virulent pneumococci (Ling et al. 2004). Five potential candidate proteins from surface fractions of 40 different serotypes of S. pneumoniae were screened for inclusion in a vaccine (Morsczeck et al. 2008).

One rapid and more direct method for the “in situ” identification of surface proteins is “shaving” the surface of intact bacteria with proteolytic enzymes, followed by identification of the released peptides by liquid chromatography (LC) and tandem mass spectrometry (MS/MS) (Rodríguez-Ortega et al. 2006). An advantage of this approach is limited contamination with intracellular proteins, because the proteolytic enzymes only have access to proteins that are exposed on the surface of the bacterial cell. The “shaving” approach has been applied to identify the surface proteomes of E. faecalis V583 (Bøhle et al. 2011), Group A Streptococcus (Rodríguez-Ortega et al. 2006), Group B Streptococcus (Doro et al. 2009), uropathogenic E. coli (Walters and Mobley 2009) and S. pneumoniae (Mattow et al. 2003).
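Once the released peptides have been sequenced by MS/MS, they must be assigned back to their parent proteins. The sketch below illustrates that assignment step only, assuming exact substring matching against a proteome given as a dictionary of sequences. The function names, the two-peptide threshold and the toy sequences are hypothetical; real workflows rely on database search engines and handle peptides shared between proteins more carefully.

```python
def map_peptides(peptides, proteome):
    """Map peptides to proteins as {protein_id: [(start, end, peptide), ...]}.

    Positions are 1-based and inclusive; only exact matches are reported.
    """
    hits = {}
    for pep in peptides:
        for pid, seq in proteome.items():
            start = seq.find(pep)
            while start != -1:
                hits.setdefault(pid, []).append((start + 1, start + len(pep), pep))
                start = seq.find(pep, start + 1)
    return hits


def surface_candidates(hits, min_peptides=2):
    """Proteins supported by at least `min_peptides` distinct peptides."""
    return sorted(
        pid for pid, found in hits.items()
        if len({pep for _, _, pep in found}) >= min_peptides
    )


# Hypothetical proteome and peptides released by surface "shaving".
proteome = {
    "P1": "MKTAYIAKQRQISFVK",
    "P2": "MSSGLLKAYIAK",
}
released = ["TAYIAK", "QISFVK", "SSGLLK"]
hits = map_peptides(released, proteome)
print(hits)
print(surface_candidates(hits))
```

The recorded start and end positions also indicate which stretch of each protein was accessible to the protease, which can help when choosing exposed regions for follow-up.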

The complete set of surface-exposed proteins of group A Streptococcus (GAS) was characterized; the majority of the identified proteins belonged to families of predicted surface-associated proteins and were found to be immunogenic when polyclonal antibodies were raised against the corresponding recombinant proteins. Most of them were described as protective antigens in the available literature, and one new antigen capable of conferring consistent protection in mice against challenge with a virulent GAS strain was also discovered (Rodríguez-Ortega et al. 2006). When this strategy was applied to Group B Streptococcus, 43 surface-associated proteins were identified, including all protective antigens. Using a similar approach, Bøhle et al. (2011) revealed 69 surface-located proteins in E. faecalis V583 with various known or conceivable functions in the bacterial cell. The authors also found a few proteins of unknown function. Several of the identified proteins were involved in cell wall synthesis and maintenance as well as in cell–cell communication, and were hypothesised to be putative targets for drug design. In uropathogenic E. coli (UPEC), two approaches were used to elucidate surface proteins: one was shaving of surface-exposed peptides and the other was labelling of whole bacterial cells with a biotin tag, followed by two-dimensional liquid chromatography-tandem mass spectrometry (2-DLC-MS/MS). These methods identified 25 virulence proteins, one of which appeared to be a possible effective vaccine candidate (Walters and Mobley 2010).

In another study, the surface proteome of M. avium subsp. hominissuis was characterized using selective biotinylation of surface-exposed proteins, streptavidin affinity purification and shotgun mass spectrometry. Surface proteins unique to the disease were identified under specific culture conditions mimicking early infection. That proteomic report improved understanding of how M. avium subsp. hominissuis infection is established (McNamara et al. 2012). More recently, the in vitro core surface proteome of Mycoplasma mycoides subsp. mycoides was characterized; this should facilitate prioritization of candidate Mycoplasma antigens for improved control measures, since surface-exposed membrane proteins include those involved in host-pathogen interactions (Krasteva et al. 2014). Proteomic analysis of surface immunogens from serogroup B Neisseria meningitidis has also been carried out, both for their inclusion in subunit vaccines against bacterial meningitis and to support the discovery of vaccine candidates against other bacterial infections (Tsolakos et al. 2014). More recently, to identify potential gonococcal vaccine antigens, cell envelope proteins of N. gonorrhoeae expressed under different environmental stimuli were elucidated using a comprehensive proteomic platform (Zielke et al. 2016).

How surface proteins are targeted to the cell surface, cross the membrane and the thick peptidoglycan layer, and remain associated there are pressing questions for understanding the clinical relevance of these proteins. In-depth analysis of molecular sorting mechanisms may help to identify new therapeutic targets for Gram-positive bacteria. Moreover, surface-exposed proteins are prime targets for the host immune system, and their identification can therefore aid the development of novel treatments and increase the repertoire of vaccines for a wide range of diseases.

Immunoproteomics for virulence determinants and vaccine discovery

Investigation and evaluation of the host-pathogen response are two essential aspects of vaccine development. Elicitation of a cell-mediated immune response, driven by Th1 cells in the case of intracellular pathogens or by Th2 cells in the case of extracellular pathogens, is strongly required for protection against any pathogen. These responses in turn enhance antibody production and antibody-mediated cell killing. Additionally, in vitro knowledge of the interactions between pathogens and host cells is crucial for investigating the bacterial components that attach to host cells.

Immunoproteomics is an extension of proteomics that permits the specific elucidation of antigens based on immunoreactivity. In immunoproteomics, 2-D blots are probed with serum collected from the host after infection. This bypasses the lengthy process of testing immunoreactivity and has thus greatly accelerated vaccine discovery by directly allowing the identification of novel proteins that evoke an immune response.

This natural extension of proteomics, whereby 2-D blots are probed with host serum following infection or immunisation, has greatly enhanced the identification of potential vaccine candidates by enabling the discovery of novel proteins that stimulate the humoral immune system of the host. The main advantage of this approach is that the bacterial proteins have already been processed and post-translationally modified in the host system, so the finally expressed form of each protein is what is analysed. Several groups have explored potential vaccine candidates using this approach, as discussed below.

In S. pneumoniae, immunoproteomics has been used to identify immunogenic proteins. Probing 2-D blots of cell wall fractions from S. pneumoniae with sera from healthy children or adults revealed seventeen immunoreactive proteins. Of these, two proteins were further found to be protective in a mouse challenge model (Mizrachi Nebenzahl et al. 2007).

Furthermore, immunoproteomic analysis of the S. pneumoniae secretome also identified several novel antigens that were shown to be immunogenic (Choi et al. 2012). In another report, immunoproteomics was used to visualize and identify immunogenic soluble and membrane proteins of Shigella flexneri that were reactive to sera from S. flexneri-infected patients (Jennison et al. 2006).

Another example of the immunoproteomic approach is the identification of novel vaccine candidates from N. meningitidis. In that study, one immunoreactive protein was evaluated as a vaccine component after protection data were obtained in mice (Hsu et al. 2008). In another study, numerous candidate proteins were elucidated from the immunome of the Brucella abortus cell envelope for the discovery of vaccines against brucellosis in cattle and humans (Connolly et al. 2006). Furthermore, immunogenic proteins from the soluble fraction of Brucella melitensis 16M and M5 were identified using an immunoproteomic assay for the development of a subunit vaccine against Brucella infection (Yang et al. 2011; Zhao et al. 2011).

In an immunoproteomics study of piscine Streptococcus agalactiae cellular proteins, four novel immunoreactive proteins were implicated as vaccine candidates and virulence factors (Liu et al. 2013). Pang and coworkers (2010) reported the first immunoproteomic approach in Vibrio to identify immunogenic proteins of V. harveyi, providing a valuable tool for developing protective antigens in the future. There is an urgent need to develop polyvalent vaccine candidates that can protect against a broad spectrum of pathogens. To this end, heterogeneous antiserum-based immunoproteomics approaches have been employed to explore outer membrane proteins with similar antigenicity that could be used as a cross-protective vaccine against several species of Vibrio (Li et al. 2010). More recently, a diverse set of antigenic proteins was identified by analysing the conidial and hyphal immunomes of the filamentous pathogenic fungus Lomentospora prolificans against healthy human serum; these proteins may be used further to discover potential therapeutic targets (Pellon et al. 2016).

The immunoproteomics approach has also been applied to combat viral diseases. Over the last decade, it has been used for the direct identification of HLA class I-presented epitopes in efforts to develop a universal flu vaccine, and it is recognized as a method for identifying T cell epitopes. In one study, a combination of T cell epitopes that occur specifically on influenza A-infected cells and a cross-reactive epitope displayed by the ectodomain of influenza M2 was used to evaluate T cell immunity in influenza infection and to uncover a universal antigen against it (Testa et al. 2012).

Collectively, exposure of bacterial cell proteins to the host immune system provides a pool of the proteins having vaccine candidate potential.

Conclusion

Omics approaches have been applied to the lengthy process of identifying subunit vaccines and candidate protein vaccine antigens. Because proteins presented on the cell surface or secreted by a pathogen are the first to be encountered by the host, detailed investigation of the surface proteome and extracellular proteome of a pathogen may open up a significant opportunity to elucidate proteins associated with disease pathogenesis or having potential as surface markers. A protein antigen capable of eliciting a neutralizing immune response makes for the most efficient subunit vaccine. When proteomics is coupled with western blotting, an approach termed immunoproteomics, much information about immunogenic proteins can be derived accurately. Detection of immunoreactive antigens using serum highlights immunogenic proteins that are expressed during infection. Thus, the immunoproteomics-based approach also appears to be a suitable tool for identifying novel vaccine candidates. The approaches discussed herein can lead to deeper knowledge of the mechanisms underlying bacterial invasion, colonization and pathogenesis.