
Identifying novel oncogenes: A machine learning approach

Interdisciplinary Sciences: Computational Life Sciences

Abstract

Genome sequencing has flooded databases with huge amounts of single nucleotide polymorphism (SNP) data. Although the number of detected SNPs is rising exponentially, techniques for characterizing them lag behind. Computational platforms that determine the pathogenicity associated with SNPs offer a probable solution to this problem. To improve the prediction quality of SNP characterization methods, we implemented a machine learning method, support vector classification. A total of 557 non-synonymous amino acid variants were collected from CENP family proteins, excluding CENPE. A multivariate simulation of the associated changes in biological properties was computed for each SNP using available SNP analysis platforms. A support vector model was built from a training dataset, and the raw data were then classified by the resulting classification hyperplane. We observed multiple lines of evidence for cancer-associated mutations in the CENPI, CENPJ, CENPK, CENPL and CENPX proteins. The first four of these proteins have positive hits in the COSMIC database for mutations in tumour samples, but CENPX has never before been reported in association with cancer. Because CENPX was only recently classified and little functional or pathological insight into it has been gained, the results of this study can serve as a starting point for future investigation of the role of CENPX in cancer.
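The abstract describes training a support vector classifier on per-variant features and then classifying new variants by the resulting hyperplane. The sketch below illustrates that general idea with a linear soft-margin SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss. It is not the authors' pipeline: the solver, the two hypothetical per-variant scores, the label convention (+1 for cancer-associated, -1 for neutral), the parameter values (`lam`, `epochs`, `seed`) and the toy data are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

# A trained model: (per-feature training means, weights with the bias last).
Model = Tuple[List[float], List[float]]


def _augment(x: Sequence[float], means: Sequence[float]) -> List[float]:
    """Center each feature on its training mean and append a constant 1.0,
    so the last weight acts as the (regularized) bias term."""
    return [xi - mi for xi, mi in zip(x, means)] + [1.0]


def train_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    lam: float = 0.1,
    epochs: int = 500,
    seed: int = 0,
) -> Model:
    """Fit a soft-margin linear SVM with Pegasos-style stochastic subgradient
    descent on the L2-regularized hinge loss. Labels must be +1 or -1."""
    n, dim = len(xs), len(xs[0])
    means = [sum(x[j] for x in xs) / n for j in range(dim)]
    data = [_augment(x, means) for x in xs]
    w = [0.0] * (dim + 1)
    rng = random.Random(seed)  # fixed seed for a reproducible sampling order
    order = list(range(n))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Subgradient step on the regularizer: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move the hyperplane toward this example.
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, data[i])]
    return means, w


def predict(model: Model, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    means, w = model
    score = sum(wj * xj for wj, xj in zip(w, _augment(x, means)))
    return 1 if score >= 0.0 else -1


# Invented toy data (not from the study): two hypothetical scores per variant,
# scaled so that higher means more damaging; +1 = cancer-associated, -1 = neutral.
train_x = [[0.9, 0.8], [0.85, 0.95], [0.8, 0.9], [0.95, 0.85],
           [0.1, 0.2], [0.15, 0.05], [0.2, 0.1], [0.05, 0.15]]
train_y = [1, 1, 1, 1, -1, -1, -1, -1]

model = train_linear_svm(train_x, train_y)
print(predict(model, [0.9, 0.9]), predict(model, [0.1, 0.1]))
```

Centering the features keeps the regularized bias from pulling the hyperplane off-center. If the classes were not linearly separable in the chosen features, a kernelized SVM (for example with an RBF kernel) from an established library would be the more usual choice than this hand-written loop.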



Author information

Corresponding author

Correspondence to Rituraj Purohit.

Additional information

The authors contributed to the paper equally.

About this article

Cite this article

Kumar, A., Rajendran, V., Sethumadhavan, R. et al. Identifying novel oncogenes: A machine learning approach. Interdiscip Sci Comput Life Sci 5, 241–246 (2013). https://doi.org/10.1007/s12539-013-0151-3

  • DOI: https://doi.org/10.1007/s12539-013-0151-3
