The HUPO PSI's Molecular Interaction format—a community standard for the representation of protein interaction data

Abstract

A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small-scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).

Figure 5: Graphical representation of XML document structure.
Figure 1: Graphical representation of the PSI MI format.
Figure 2: PSI MI example file.
Figure 3: 'Interaction detection' controlled vocabulary.
Figure 4: The PIMWalker network visualization tool.

Acknowledgements

This work was supported in part by EU grant number QLRI-CT-2001-00015 under the Research and Technological Development program 'Quality of Life and Management of Living Resources'. The PSI meetings were supported by the Human Proteome Organization. The work at the University of Rome 'Tor Vergata' was supported by grants from Associazione Italiana per la Ricerca sul Cancro and grant GTF02011 from Telethon. M.L. is supported by the European Molecular Biology Laboratory International PhD program and Biotechnology and Biological Sciences Research Council grant 8/C19399. Y.L. and R.Z. are supported by grants 2001AA233031, 2002CB512801, 110CB510209. M.V.'s laboratory is supported by grants from the US National Cancer Institute and National Human Genome Research Institute. L.M.-P. would like to thank Jens Pedersen, Claudia Bagni, Benedetta Mattei, Elena Santonico, Federico Demasi and Michael Ashburner for contributions to the controlled vocabularies. Emmanuel Cézanne, Sébastien Cros, Claire Even, Nicolas Jolibert, Sandrine Marquès, Christophe Roumegous, Patrick Sablayrolles and René Thomas-Nelson contributed to the development of the PSI XSLT utilities. The collaborative development process has been facilitated by the infrastructure provided by SourceForge.

Author information

Corresponding author

Correspondence to Henning Hermjakob.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Institute of Bioinformatics, International Tech Park, Whitefield Road, 560 066 Bangalore, India.

About this article

Cite this article

Hermjakob, H., Montecchi-Palazzi, L., Bader, G. et al. The HUPO PSI's Molecular Interaction format—a community standard for the representation of protein interaction data. Nat Biotechnol 22, 177–183 (2004). https://doi.org/10.1038/nbt926

  • DOI: https://doi.org/10.1038/nbt926
