Algorithms and Databases

Part of the book series: Methods in Molecular Biology™ ((MIMB,volume 564))

Summary

The capacity of proteomics methods and mass spectrometry instrumentation to generate data has grown substantially in recent years. This growth in data volume has in turn increased the reliance on software to identify peptide or protein sequences from the recorded mass spectra. Diverse algorithms can be applied to process these data, each performing a specific task such as spectrum quality filtering, spectral clustering and merging, assigning a sequence to a spectrum, or assessing the validity of these assignments.

The key algorithms in mass spectral processing pipelines are those that assign a sequence to a spectrum. The most commonly used variants depend crucially on the information contained in the sequence database that they use as the basis for identification. Because these sequence databases are constructed in different ways, and can therefore vary substantially in the amount and type of data they contain, they are also discussed here.
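
To illustrate why sequence-assignment algorithms depend so heavily on the database, the following minimal Python sketch digests each database entry in silico and keeps the peptides whose mass falls within a tolerance of an observed precursor mass. It is an illustration under stated assumptions, not the method of any particular search engine: the function names, the toy database, and the simplified trypsin rule (cleave after K or R unless followed by P) are hypothetical choices, and real tools go on to score fragment ion spectra rather than stopping at the precursor mass. A peptide that is absent from the database can never appear among the candidates.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added for the peptide termini


def tryptic_peptides(protein):
    """Split a protein after K or R, except when the next residue is P.

    Assumption: no missed cleavages are considered in this sketch.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def candidates(database, precursor_mass, tol_da):
    """Return (entry name, peptide) pairs within tol_da of precursor_mass.

    `database` is a hypothetical dict mapping entry names to sequences.
    """
    hits = []
    for name, sequence in database.items():
        for pep in tryptic_peptides(sequence):
            if abs(peptide_mass(pep) - precursor_mass) <= tol_da:
                hits.append((name, pep))
    return hits


# Toy database: two made-up entries, used only for illustration.
toy_db = {"P1": "AAKGGR", "P2": "SSKVVK"}
print(candidates(toy_db, 288.15, 0.01))
```

In this toy example the peptides AAK and GGR differ in mass by only about 0.025 Da, so widening the tolerance to 0.05 Da would return both as candidates. This is one reason the size and composition of the database matter for identification.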

Acknowledgments

Lennart Martens thanks Prof. Dr. Joël Vandekerckhove and Prof. Dr. Kris Gevaert for sharing their extensive knowledge on proteomics, and Henning Hermjakob for support.

Author information

Corresponding author

Correspondence to Rolf Apweiler.

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Martens, L., Apweiler, R. (2009). Algorithms and Databases. In: Reinders, J., Sickmann, A. (eds) Proteomics. Methods in Molecular Biology™, vol 564. Humana Press. https://doi.org/10.1007/978-1-60761-157-8_14

  • DOI: https://doi.org/10.1007/978-1-60761-157-8_14

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60761-156-1

  • Online ISBN: 978-1-60761-157-8

  • eBook Packages: Springer Protocols
