Abstract
Genome sequencing has flooded databases with huge amounts of single nucleotide polymorphism (SNP) data. Although the number of detected SNPs is rising exponentially, characterization techniques still lag behind. Computational platforms that determine the pathogenicity associated with SNPs offer a probable solution to this problem. To improve the prediction quality of SNP characterization methods, we implemented a machine learning support vector classification method. A total of 557 non-synonymous amino acid variants were collected from CENP family proteins, excluding CENPE. Multivariate simulation of the associated changes in biological phenomena for each SNP was computed through available SNP analysis platforms. A support vector model was designed using a training dataset, and the raw classification data were subjected to the classification hyperplane. We observed multiple lines of evidence of cancer-associated genetic mutations in the CENPI, CENPJ, CENPK, CENPL and CENPX proteins. The first four proteins showed positive hits in the COSMIC database for mutations in tumour samples, but CENPX has not previously been reported in connection with cancer-associated outcomes. Since CENPX has only recently been classified and little functional or pathological insight is available, the results obtained in this study will serve as a starting point for future cancer research on the CENPX protein.
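As an illustration of the idea of fitting a support vector model on a training dataset and then classifying variants by their side of the separating hyperplane, a minimal sketch follows. It is not the implementation used in the study: the linear kernel, the batch subgradient training loop, the hyperparameters, the function names and the two-feature toy data are all assumptions made for the example.

```python
# Toy linear support vector classifier trained by batch subgradient descent
# on the L2-regularised hinge loss. Data, names and settings are hypothetical.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Return (w, b) approximately minimising lam/2*||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # margin violated, so the hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify a feature vector by its side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical training set: each row holds two scaled predictor scores for
# one variant; label +1 = disease-associated, -1 = neutral (assumed labels).
X_train = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95],
           [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
y_train = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
predictions = [predict(w, b, x) for x in X_train]

# Unlabelled (hypothetical) variants are classified with the same hyperplane.
new_calls = [predict(w, b, x) for x in [[0.85, 0.9], [0.1, 0.15]]]
print(predictions, new_calls)
```

In practice the feature vector for each variant would hold the outputs of the SNP analysis platforms mentioned above, and a dedicated SVM library with a tuned kernel would normally replace this hand-written loop.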
Additional information
The authors contributed to the paper equally.
Cite this article
Kumar, A., Rajendran, V., Sethumadhavan, R. et al. Identifying novel oncogenes: A machine learning approach. Interdiscip Sci Comput Life Sci 5, 241–246 (2013). https://doi.org/10.1007/s12539-013-0151-3
DOI: https://doi.org/10.1007/s12539-013-0151-3