Abstract
Analyzing the natural evolution of proteins by ancestral sequence reconstruction (ASR) can provide valuable information about the changes in sequence and structure that drive the development of novel protein functions. ASR has also been used as a protein engineering tool, because it often generates thermostable proteins that can serve as robust and evolvable templates for enzyme engineering. Importantly, ASR has the potential to provide insight into the history of insertions and deletions (indels) that have occurred during the evolution of a protein family. Indels are strongly associated with functional change during enzyme evolution and represent a largely unexplored source of genetic diversity for designing proteins with novel or improved properties. Current ASR methods differ in the way they handle indels. Inclusion or exclusion of indels is often managed subjectively, based on assumptions the user makes about the likelihood of each recombination event, yet most currently available ASR tools provide limited, if any, opportunities for evaluating indel placement in a reconstructed sequence. Graphical Representation of Ancestral Sequence Predictions (GRASP) is an ASR tool that maps indel evolution throughout a reconstruction and enables the evaluation of indel variants. This chapter provides a general protocol for performing a reconstruction using GRASP and for using the results to create indel variants. The method addresses protein template selection, sequence curation, alignment refinement, tree building, ancestor reconstruction, evaluation of indel variants, and approaches to library development.
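To make the idea of mapping indel history onto a tree concrete, the following minimal sketch infers, for a single alignment column, whether each ancestor carried a residue or a gap. It is not GRASP's algorithm: it assumes a simple Fitch-style parsimony on a rooted binary tree, swapped in purely for illustration. The tree encoding, the function names, and the tie-breaking rule are all hypothetical choices made for this example.

```python
from typing import Dict, Set, Tuple, Union

# Assumed encoding: a tree is a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def node_name(tree: Tree) -> str:
    """Build a readable label for a node from the leaves below it."""
    if isinstance(tree, str):
        return tree
    return "(" + node_name(tree[0]) + "," + node_name(tree[1]) + ")"


def fitch_sets(tree: Tree, column: Dict[str, int], sets: Dict[str, Set[int]]) -> int:
    """Bottom-up pass: record candidate states per node, return the indel count.

    States are 1 (residue present) and 0 (gap) for one alignment column.
    """
    if isinstance(tree, str):
        sets[tree] = {column[tree]}
        return 0
    left, right = tree
    cost = fitch_sets(left, column, sets) + fitch_sets(right, column, sets)
    a, b = sets[node_name(left)], sets[node_name(right)]
    common = a & b
    if common:
        sets[node_name(tree)] = common
    else:
        # Children disagree: keep both options and count one indel event.
        sets[node_name(tree)] = a | b
        cost += 1
    return cost


def fitch_assign(tree: Tree, sets: Dict[str, Set[int]],
                 parent_state, states: Dict[str, int]) -> None:
    """Top-down pass: inherit the parent's state when it is a candidate."""
    name = node_name(tree)
    candidates = sets[name]
    # Arbitrary tie-break (an assumption of this sketch): prefer the gap state.
    state = parent_state if parent_state in candidates else min(candidates)
    states[name] = state
    if not isinstance(tree, str):
        for child in tree:
            fitch_assign(child, sets, state, states)


def reconstruct_column(tree: Tree, column: Dict[str, int]):
    """Return (state per node, minimum number of indel events) for one column."""
    sets: Dict[str, Set[int]] = {}
    cost = fitch_sets(tree, column, sets)
    states: Dict[str, int] = {}
    fitch_assign(tree, sets, None, states)
    return states, cost


# Toy example: only sequence C has a gap in this column.
toy_tree: Tree = (("A", "B"), ("C", "D"))
toy_column = {"A": 1, "B": 1, "C": 0, "D": 1}
toy_states, toy_cost = reconstruct_column(toy_tree, toy_column)
```

In the toy example, the residue is inferred to be present at every ancestor and a single deletion is placed on the branch leading to C. Repeating the procedure column by column gives a crude picture of where indels arose, which is the kind of history a dedicated tool is designed to expose and evaluate more rigorously.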
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Ross, C.M., Foley, G., Boden, M., Gillam, E.M.J. (2022). Using the Evolutionary History of Proteins to Engineer Insertion-Deletion Mutants from Robust, Ancestral Templates Using Graphical Representation of Ancestral Sequence Predictions (GRASP). In: Magnani, F., Marabelli, C., Paradisi, F. (eds) Enzyme Engineering. Methods in Molecular Biology, vol 2397. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1826-4_6
DOI: https://doi.org/10.1007/978-1-0716-1826-4_6
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1825-7
Online ISBN: 978-1-0716-1826-4
eBook Packages: Springer Protocols