
Nucleic Acids Research 2005 33(20):6486-6493; doi:10.1093/nar/gki949

Published online 10 November 2005

© The Author 2005. Published by Oxford University Press. All rights reserved.
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org


Article

Kernel-based machine learning protocol for predicting DNA-binding proteins

Nitin Bhardwaj, Robert E. Langlois, Guijun Zhao and Hui Lu*

Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA

*To whom correspondence should be addressed at Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street (M/C 063), Room 218, Chicago, IL 60607, USA. Tel: +1 312 413 2021; Fax: +1 312 413 2018; Email: huilu@uic.edu

Received June 20, 2005. Revised August 2, 2005. Accepted October 18, 2005.

DNA-binding proteins (DNA-BPs) play a pivotal role in various intra- and extra-cellular activities, ranging from DNA replication to gene expression control. Previous attempts to identify DNA-BPs from their sequence and structural information have achieved moderate accuracy. Here we develop a machine learning protocol for the prediction of DNA-BPs in which the classifier is a Support Vector Machine (SVM). The information used for classification is derived from characteristics that include surface and overall composition, overall charge and positive potential patches on the protein surface. In total, 121 DNA-BPs and 238 non-binding proteins are used to build and evaluate the protocol. In the self-consistency test, an accuracy of 100% is achieved. For cross-validation (CV) optimization over the entire dataset, we report an accuracy of 90%. Using leave 1-pair holdout evaluation, an accuracy of 86.3% is achieved. When we restrict the dataset to less than 20% sequence identity among the proteins, the holdout accuracy is 85.8%. Furthermore, seven DNA-BPs with unbound structures are all correctly predicted. This performance is better than previously published results. The higher accuracy originates from two factors: the ability of the SVM to handle features that demonstrate a wide range of discriminatory power, and a different definition of the positive patch. Since our protocol does not rely on sequence or structural homology, it can be used to identify or predict proteins with DNA-binding function(s) regardless of their homology to known ones.
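
As a rough illustration of the kind of input such a classifier receives, the sketch below computes two of the feature types named in the abstract, overall amino acid composition and overall charge, from a protein sequence. It is a minimal sketch and not the authors' implementation: the function names, the toy sequence and the charge model (Lys/Arg as +1, Asp/Glu as -1, His neutral) are assumptions made for illustration. Surface composition and positive potential patches require a 3D structure and are left out.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed charge model (not taken from the paper): Lys/Arg = +1, Asp/Glu = -1.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def composition(sequence):
    """Return the fraction of each standard residue, in AMINO_ACIDS order."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def net_charge(sequence):
    """Return the summed formal charge under the assumed charge model."""
    return sum(CHARGE.get(aa, 0) for aa in sequence.upper())


def feature_vector(sequence):
    """Concatenate composition (20 values) and net charge (1 value)."""
    return composition(sequence) + [float(net_charge(sequence))]


# Hypothetical toy sequence, used only to show the shape of the output.
example = "MKRKDE"
vec = feature_vector(example)
print(len(vec), net_charge(example))
```

A vector of this form, one per protein, could then be passed to any SVM implementation together with a binding/non-binding label.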




