Abstract
DNA-binding proteins (DBPs) participate in various biological processes, including DNA replication, recombination, and repair. In the human genome, about 6–7% of genes encode such proteins. DBPs shape DNA into a compact structure known as chromatin, and some of them regulate chromosome packaging and transcription. In the pharmaceutical industry, DBPs are used as key components of antibiotics, steroids, and cancer drugs. These proteins are also involved in biophysical, biological, and biochemical studies of DNA. Because of their crucial role in these activities, the identification of DBPs is an active topic in protein science. A series of experimental and computational methods have been proposed; however, some did not achieve the desired results, while others are inadequate in accuracy and reliability. More intelligent computational predictors are therefore still needed. In this work, we introduce a computational method, DP-BINDER, based on physicochemical and evolutionary information. We captured local, highly discriminative features from the physicochemical properties of primary protein sequences via normalized Moreau-Broto autocorrelation (NMBAC), and from evolutionary information via position-specific scoring matrix-transition probability composition (PSSM-TPC) and pseudo-position-specific scoring matrix (PsePSSM), using training and independent datasets. The optimal features were selected from the fused feature set by support vector machine-recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) and were fed into random forest (RF) and support vector machine (SVM) classifiers. Our method attained 92.46% and 89.58% accuracy with jackknife and ten-fold cross-validation, respectively, on the training dataset, and 81.17% accuracy on the independent dataset for prediction of DBPs. These results demonstrate that our method attained the highest success rate reported in the literature.
The superiority of DP-BINDER over existing approaches is due to several factors, including the extraction of dominant local features via effective feature descriptors, the use of an appropriate feature selection algorithm, and an effective classifier.
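To make the NMBAC descriptor mentioned above more concrete, the following is a minimal sketch of an autocorrelation feature for a single physicochemical property. It assumes the commonly used form in which the property is standardized over the 20 amino acids and the products of values at positions i and i + lag are averaged along the sequence. The property scale (the Kyte-Doolittle hydropathy index), the function names, the maximum lag, and the toy sequence are illustrative assumptions, not the settings used in DP-BINDER.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index, used here only as an example property
# (assumption: the source does not state which property scales were used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    raw = list(scale.values())
    mu = mean(raw)
    sigma = pstdev(raw)
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def nmbac(sequence, scale, max_lag):
    """Return one autocorrelation value per lag 1..max_lag for one property."""
    norm = standardize(scale)
    # Non-standard residues (e.g. X) are skipped in this sketch.
    values = [norm[aa] for aa in sequence if aa in norm]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = []
    for lag in range(1, max_lag + 1):
        # Average product of property values separated by `lag` positions.
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    toy_sequence = "MKRLSAVGKDEELLAAIKRGW"  # hypothetical sequence
    print(nmbac(toy_sequence, KYTE_DOOLITTLE, max_lag=5))
```

Repeating the calculation for several property scales and concatenating the per-lag values would give a fixed-length vector regardless of sequence length, which is what allows such features to be fused with PSSM-based descriptors before feature selection.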








Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772273, 61373062) and the Fundamental Research Funds for the Central Universities (Grant No. 30918011104).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Ali, F., Ahmed, S., Swati, Z.N.K. et al. DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information. J Comput Aided Mol Des 33, 645–658 (2019). https://doi.org/10.1007/s10822-019-00207-x
DOI: https://doi.org/10.1007/s10822-019-00207-x