Analysis of Informative Features for Negative Selection in Protein Function Prediction

Frasca, Marco; Lipreri, Fabio; Malchiodi, Dario

doi:10.1007/978-3-319-56154-7_25

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10209)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering

Abstract

Negative examples in automated protein function prediction (AFP), that is, proteins known not to possess a given protein function, are usually not directly stored in public proteome and genome databases, such as the Gene Ontology database. Nevertheless, most computational methods need negative examples to infer new predictions. A variety of algorithms have been proposed in AFP for negative selection, ranging from network- and feature-based heuristics to hierarchy-based and hierarchy-less strategies. Moreover, several bio-molecular information sources about proteins, such as gene co-expression, genetic interaction, and protein-protein interaction data, are naturally encoded in protein networks, where nodes are proteins and edges connect proteins sharing common characteristics. Although selecting negatives in biological networks is thereby a central and challenging problem in computational biology, detecting the characteristics proteins should have to be considered negative is still a difficult task. In this work, we show that a few protein features extracted from the network help in detecting reliable negatives. We tested these features in two real-world experiments: predicting unreliable negatives with an SVM classifier through temporal holdout on model organisms for AFP, and selecting reliable negatives with a clustering-based state-of-the-art negative selection procedure.
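The setup described in the abstract can be illustrated with a minimal sketch. The abstract does not list the features used, so the two below (weighted degree and total edge weight towards known positives) are assumptions chosen for illustration, as are the toy network, the protein names, and every function name. The temporal holdout labels mark a protein as an unreliable negative when it lacks the annotation in an older release but gains it in a newer one.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge type: (protein, protein, similarity weight).
Edge = Tuple[str, str, float]


def build_adjacency(edges: List[Edge]) -> Dict[str, Dict[str, float]]:
    """Return a symmetric adjacency map {protein: {neighbor: weight}}."""
    adj: Dict[str, Dict[str, float]] = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def node_features(
    adj: Dict[str, Dict[str, float]], positives: Set[str]
) -> Dict[str, Tuple[float, float]]:
    """Illustrative features (assumed, not the chapter's exact set):
    weighted degree and total edge weight towards known positives."""
    feats = {}
    for node, nbrs in adj.items():
        degree = sum(nbrs.values())
        pos_weight = sum(w for n, w in nbrs.items() if n in positives)
        feats[node] = (degree, pos_weight)
    return feats


def temporal_holdout_labels(
    nodes: List[str], old_pos: Set[str], new_pos: Set[str]
) -> Dict[str, int]:
    """For proteins not annotated in the older release, label 1 if they
    gained the annotation in the newer release (unreliable negative)."""
    return {n: int(n in new_pos) for n in nodes if n not in old_pos}


# Toy data: protein B is negative in the old release, positive in the new one.
edges = [("A", "B", 0.9), ("B", "C", 0.4), ("C", "D", 0.7), ("A", "D", 0.2)]
old_pos, new_pos = {"A"}, {"A", "B"}

adj = build_adjacency(edges)
features = node_features(adj, old_pos)
labels = temporal_holdout_labels(sorted(adj), old_pos, new_pos)
```

The resulting feature vectors and labels could then be passed to any binary classifier, such as the SVM mentioned in the abstract; that step is left out here to keep the sketch free of external dependencies.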

Author information

Authors and Affiliations

Dipartimento di Informatica, Università degli Studi di Milano, Via Comelico 39/41, 20135, Milano, Italy
Marco Frasca, Fabio Lipreri & Dario Malchiodi

Corresponding author

Correspondence to Dario Malchiodi.

Editor information

Editors and Affiliations

Universidad de Granada, Granada, Spain
Ignacio Rojas
Universidad de Granada, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Frasca, M., Lipreri, F., Malchiodi, D. (2017). Analysis of Informative Features for Negative Selection in Protein Function Prediction. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_25

DOI: https://doi.org/10.1007/978-3-319-56154-7_25
Published: 01 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science, Computer Science (R0)
