AngularQA: Protein Model Quality Assessment with LSTM Networks

Matthew Conover; Max Staples; Dong Si; Miao Sun; Renzhi Cao

doi:10.1515/cmb-2019-0001

Open Access Published by De Gruyter Open Access May 29, 2019

From the journal Computational and Mathematical Biophysics

Abstract

Quality Assessment (QA) plays an important role in protein structure prediction. Traditional multi-model QA methods usually depend on searching databases or comparing with other models to make predictions, and they often fail when poor-quality models dominate the model pool. We propose a novel single-model protein QA method built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network predicts the quality by treating each amino acid as a time-step and taking the final value returned by the LSTM cells. To the best of our knowledge, this is the first attempt to use an LSTM model on the QA problem; furthermore, we use a new representation that has not been studied for QA. In addition to angles, we make use of sequence properties such as secondary structure, parsed from the protein structure at each time-step without using any database, which differs from all existing QA methods. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiment points out new directions for the QA problem, and our method could be widely used for the protein structure prediction problem. The software is freely available on GitHub: https://github.com/caorenzhi/AngularQA
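To make the angular representation concrete, the following is a minimal sketch of how per-residue features of this kind could be derived from Cα coordinates. It is not the AngularQA implementation: the abstract does not give the exact feature definition, so the choice of four consecutive Cα atoms for the dihedral angle, the function names `dihedral` and `ca_features`, and the omission of side-chain and secondary-structure features are all assumptions made for illustration. The sign of the angle follows one common convention and may differ from the one used in the software.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four points.

    Uses the atan2 form, which is numerically stable and returns
    a value in [-pi, pi].
    """
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = norm(b2) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)

def ca_features(coords):
    """One (distance, dihedral) pair per residue, starting at the fourth.

    Assumption: the distance is measured to the previous C-alpha atom and
    the dihedral uses residues i-3 .. i. Each pair would correspond to one
    time-step of a sequence model such as an LSTM.
    """
    feats = []
    for i in range(3, len(coords)):
        dist = norm(sub(coords[i], coords[i - 1]))
        angle = dihedral(coords[i - 3], coords[i - 2], coords[i - 1], coords[i])
        feats.append((dist, angle))
    return feats

if __name__ == "__main__":
    # Hypothetical coordinates, not taken from any real structure.
    toy = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    for dist, angle in ca_features(toy):
        print(f"distance={dist:.3f}  dihedral={math.degrees(angle):.1f} deg")
```

Because each feature is measured relative to preceding residues, a representation of this kind is unchanged by rotating or translating the whole model, which is one reason angles and bond lengths are a natural input for a sequence model.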

Received: 2018-09-09

Accepted: 2019-05-01

Published Online: 2019-05-29

This work is licensed under the Creative Commons Attribution 4.0 Public License.