Protein Sequence–Structure–Function–Network Links Discovered with the ANNOTATOR Software Suite: Application to ELYS/Mel-28

Chapter in Computational Medicine

Abstract

Although very little genomic sequence can be interpreted directly in terms of biological mechanism, the prospects are much better for protein-coding genes, whose coding sequences can be translated into protein sequences. This review considers the concepts applicable to sequence analysis and function prediction for globular and non-globular protein segments. The publicly accessible ANNOTATOR software environment integrates most of the reliable protein sequence-based function prediction methods developed in academia, together with protein domain databases and collections of pathway and protein–protein interaction data. As an application example, the structural and functional domains of MEL-28/ELYS, an important nuclear protein, are delineated and proposed for experimental follow-up in structural biology and functional studies.
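The distinction between globular and non-globular segments relies in part on compositional measures such as those of Wootton and Federhen (1993) and Wootton (1994a). The following is a minimal illustrative sketch of that idea, not ANNOTATOR's implementation: it computes the Shannon entropy of the residue composition in a sliding window and masks residues covered by any low-entropy window. The function names are hypothetical, and the window length (12) and threshold (2.2 bits) are assumptions chosen to resemble commonly used SEG trigger settings.

```python
import math
from collections import Counter


def window_entropy(segment: str) -> float:
    """Shannon entropy, in bits, of the residue composition of one window."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Assumed defaults (window=12, threshold=2.2 bits), not taken from ANNOTATOR.
def low_complexity_mask(seq: str, window: int = 12, threshold: float = 2.2) -> list[bool]:
    """Mark every residue covered by at least one window whose entropy is below threshold."""
    seq = seq.upper()
    mask = [False] * len(seq)
    # Slide a fixed-length window along the sequence (no window fits if seq is shorter).
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def masked_sequence(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Return seq with residues flagged as low-complexity replaced by 'x'."""
    mask = low_complexity_mask(seq, window, threshold)
    return "".join("x" if low else aa for aa, low in zip(seq, mask))


if __name__ == "__main__":
    # Hypothetical toy sequence: diverse flanks around a poly-glutamine tract.
    demo = "ACDEFGHIKLMN" + "Q" * 12 + "PRSTVWYACDEF"
    print(masked_sequence(demo))
```

Running the script masks the poly-glutamine tract together with a few flanking residues whose windows overlap it. As we understand it, SEG itself goes further, extending trigger windows with a second, more permissive threshold and then optimizing segment boundaries; this sketch omits both steps.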

Notes

  1. For a tutorial on the user interface, see Ooi et al. (2009).

References

  • Accelrys (2011) Pipeline pilot. Accelrys, San Diego. http://accelrys.com/products/pipeline-pilot/. Accessed 02 Dec 2011

  • Acera A, Vecino E, Rodriguez-Agirretxe I et al (2011) Changes in tear protein profile in keratoconus disease. Eye 25:1225–1233

  • Alber F, Dokudovskaya S, Veenhoff LM et al (2007) The molecular architecture of the nuclear pore complex. Nature 450:695–701. doi:10.1038/nature06405

  • Altenhoff AM, Schneider A, Gonnet GH, Dessimoz C (2011) OMA 2011: orthology inference among 1000 complete genomes. Nucleic Acids Res 39:D289–D294. doi:10.1093/nar/gkq1238

  • Altschul SF, Gish W, Miller W et al (1990) Basic local alignment search tool. J Mol Biol 215:403–410. doi:10.1016/S0022-2836(05)80360-2

  • Altschul SF, Madden TL, Schäffer AA et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402

  • Baker NA, Sept D, Joseph S et al (2001) Electrostatics of nanosystems: application to microtubules and the ribosome. Proc Natl Acad Sci USA 98:10037–10041. doi:10.1073/pnas.181342398

  • Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004) Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340:783–795. doi:10.1016/j.jmb.2004.05.028

  • Berman HM, Westbrook J, Feng Z et al (2000) The protein data bank. Nucleic Acids Res 28:235–242

  • Biegert A, Söding J (2009) Sequence context-specific profiles for homology searching. Proc Natl Acad Sci USA 106:3770–3775. doi:10.1073/pnas.0810767106

  • Bork P, Dandekar T, Diaz-Lazcoz Y et al (1998) Predicting function: from genes to genomes and back. J Mol Biol 283:707–725. doi:10.1006/jmbi.1998.2144

  • Brendel V, Bucher P, Nourbakhsh IR et al (1992) Methods and algorithms for statistical analysis of protein sequences. Proc Natl Acad Sci USA 89:2002–2006

  • CLC Bio (2011) CLC genomics workbench. CLC Bio, Aarhus. http://www.clcbio.com/. Accessed 02 Dec 2011

  • Claros MG, von Heijne G (1994) TopPred II: an improved software for membrane protein structure predictions. Comput Appl Biosci 10:685–686

  • Claverie J-M, States DJ (1993) Information enhancement methods for large scale sequence analysis. Comput Chem 17:191–201. doi:10.1016/0097-8485(93)85010-A

  • Claverie JM (1994) Large scale sequence analysis (chap 36). In: Adams MD, Fields C, Venter JC (eds) Automated DNA sequencing and analysis techniques. Academic, New York, pp 2679–279

  • Cole C, Barber JD, Barton GJ (2008) The Jpred 3 secondary structure prediction server. Nucleic Acids Res 36:W197–W201. doi:10.1093/nar/gkn238

  • Cserzö M, Eisenhaber F, Eisenhaber B, Simon I (2002) On filtering false positive transmembrane protein predictions. Protein Eng 15:745–752

  • Cserzo M, Eisenhaber F, Eisenhaber B, Simon I (2003) TM or not TM: transmembrane protein prediction with low false positive rate using DAS-TMfilter. Bioinformatics 20:136–137. doi:10.1093/bioinformatics/btg394

  • Cuff JA, Barton GJ (1999) Proteins 34(4):508–519

  • Dayhoff M (1979) Atlas of protein sequence and structure. National Biomedical Research Foundation, Washington

  • Di Tommaso P, Moretti S, Xenarios I et al (2011) T-coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Res 39:W13–W17. doi:10.1093/nar/gkr245

  • Do CB, Mahabhashyam MSP, Brudno M, Batzoglou S (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res 15:330–340. doi:10.1101/gr.2821705

  • Dosztányi Z, Csizmok V, Tompa P, Simon I (2005a) IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21:3433–3434. doi:10.1093/bioinformatics/bti541

  • Dosztányi Z, Csizmók V, Tompa P, Simon I (2005b) The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol 347:827–839. doi:10.1016/j.jmb.2005.01.071

  • Dyrlov Bendtsen J, Nielsen H, von Heijne G, Brunak S (2004) Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340:783–795. doi:10.1016/j.jmb.2004.05.028

  • Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14:755–763

  • Eddy SR (2011) Accelerated profile HMM searches. PLoS Comput Biol 7(10):e1002195

  • Edgar RC (2004a) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. doi:10.1186/1471-2105-5-113

  • Edgar RC (2004b) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797. doi:10.1093/nar/gkh340

  • Eisenhaber F (2006) Prediction of protein function. In: Discovering biomolecular mechanisms with computational biology, 1st edn. Springer, Heidelberg, pp 39–54

  • Eisenhaber F (2012) A decade after the first full human genome sequencing: When will we understand our own genome? J Bioinformatics Comp Biol 10:1271001

  • Eisenhaber B, Eisenhaber F (2007) Posttranslational modifications and subcellular localization signals: indicators of sequence regions without inherent 3D structure? Curr Protein Pept Sci 8:197–203

  • Eisenhaber F, Imperiale F, Argos P, Frömmel C (1996) Prediction of secondary structural content of proteins from their amino acid composition alone. I. New analytic vector decomposition methods. Proteins 25:157–168. doi:10.1002/(SICI)1097-0134(199606)25:2<157::AID-PROT2>3.0.CO;2-F

  • Eisenhaber B, Bork P, Eisenhaber F (1999) Prediction of potential GPI-modification sites in proprotein sequences. J Mol Biol 292:741–758. doi:10.1006/jmbi.1999.3069

  • Eisenhaber B, Maurer-Stroh S, Novatchkova M et al (2003a) Enzymes and auxiliary factors for GPI lipid anchor biosynthesis and post-translational transfer to proteins. Bioessays 25:367–385. doi:10.1002/bies.10254

  • Eisenhaber F, Eisenhaber B, Kubina W et al (2003b) Prediction of lipid posttranslational modifications and localization signals from protein sequences: big-Pi, NMT and PTS1. Nucleic Acids Res 31:3631–3634

  • Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30:1575–1584

  • Eswar N, Webb B, Marti-Renom MA et al (2006) Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics Unit 5.6 (Chap 5). doi:10.1002/0471250953.bi0506s15

  • Eswar N, Webb B, Marti-Renom MA et al (2007) Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci Unit 2.9 (Chap 2). doi:10.1002/0471140864.ps0209s50

  • Ferguson MA (1999) The structure, biosynthesis and functions of glycosylphosphatidylinositol anchors, and the contributions of trypanosome research. J Cell Sci 112(Pt 17):2799–2809

  • Fiser A, Do RK, Sali A (2000) Modeling of loops in protein structures. Protein Sci 9:1753–1773. doi:10.1110/ps.9.9.1753

  • Franz C, Walczak R, Yavuz S et al (2007) MEL-28/ELYS is required for the recruitment of nucleoporins to chromatin and postmitotic nuclear pore complex assembly. EMBO Rep 8:165–172. doi:10.1038/sj.embor.7400889

  • Frishman D, Argos P (1996) Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng Des Sel 9:133–142. doi:10.1093/protein/9.2.133

  • Frishman D, Argos P (1997) Seventy-five percent accuracy in protein secondary structure prediction. Proteins 27:329–335

  • Galy V, Askjaer P, Franz C et al (2006) MEL-28, a novel nuclear-envelope and kinetochore protein essential for zygotic nuclear-envelope assembly in C. elegans. Curr Biol 16:1748–1756. doi:10.1016/j.cub.2006.06.067

  • Green RE, Krause J, Briggs AW et al (2010) A draft sequence of the neandertal genome. Science 328:710–722. doi:10.1126/science.1188021

  • Hanson RM (2010) Jmol—a paradigm shift in crystallographic visualization. J Appl Crystallogr 43:1250–1260. doi:10.1107/S0021889810030256

  • von Heijne G (1987) Sequence analysis in molecular biology: treasure trove or trivial pursuit. Academic, San Diego

  • Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89:10915–10919

  • Hulo N, Bairoch A, Bulliard V et al (2008) The 20 years of PROSITE. Nucleic Acids Res 36:D245–D249. doi:10.1093/nar/gkm977

  • Iakoucheva LM, Dunker AK (2003) Order, disorder, and flexibility: prediction from protein sequence. Structure 11:1316–1317

  • Ivshina AV, George J, Senko O et al (2006) Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res 66:10292–10301. doi:10.1158/0008-5472.CAN-05-4414

  • Käll L, Krogh A, Sonnhammer ELL (2004) A combined transmembrane topology and signal peptide prediction method. J Mol Biol 338:1027–1036. doi:10.1016/j.jmb.2004.03.016

  • Katoh K (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33:511–518. doi:10.1093/nar/gki198

  • Katoh K, Toh H (2007) PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23:372–374. doi:10.1093/bioinformatics/btl592

  • Katoh K, Toh H (2008) Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform 9:286–298. doi:10.1093/bib/bbn013

  • Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066

  • Kedes L, Liu E, Jongeneel CV, Sutton G (2011) Judging the Archon Genomics X PRIZE for whole human genome sequencing. Nat Genet 43:175. doi:10.1038/ng0311-175

  • Kelley LA, Sternberg MJE (2009) Protein structure prediction on the web: a case study using the Phyre server. Nat Protoc 4:363–371. doi:10.1038/nprot.2009.2

  • Kerrien S, Alam-Faruque Y, Aranda B et al (2007) IntAct—open source resource for molecular interaction data. Nucleic Acids Res 35:D561–D565. doi:10.1093/nar/gkl958

  • Keyes RW (2008) Moore’s law today. IEEE Circuits Sys Mag 8:53–54. doi:10.1109/MCAS.2008.923058

  • Kimura N, Takizawa M, Okita K et al (2002) Identification of a novel transcription factor, ELYS, expressed predominantly in mouse foetal haematopoietic tissues. Genes Cells 7:435–446

  • Koonin EV (2001) An apology for orthologs—or brave new memes. Genome Biol 2:COMMENT1005

  • Kreil DP, Ouzounis CA (2003) Comparison of sequence masking algorithms and the detection of biased protein sequence regions. Bioinformatics 19:1672–1681

  • Krogh A, Larsson B, von Heijne G, Sonnhammer EL (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305:567–580. doi:10.1006/jmbi.2000.4315

  • Kryshtafovych A, Fidelis K, Moult J (2011) CASP9 results compared to those of previous CASP experiments. Proteins Struct Funct Bioinformatics. doi:10.1002/prot.23182

  • Lander ES, Linton LM, Birren B et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921. doi:10.1038/35057062

  • Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. doi:10.1093/bioinformatics/btl158

  • Li W, Jaroszewski L, Godzik A (2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17:282–283

  • Li W, Jaroszewski L, Godzik A (2002) Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18:77–82. doi:10.1093/bioinformatics/18.1.77

  • Linding R, Jensen LJ, Diella F et al (2003a) Protein disorder prediction. Structure 11:1453–1459. doi:10.1016/j.str.2003.10.002

  • Linding R, Russell RB, Neduva V, Gibson TJ (2003b) GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res 31:3701–3708

  • Lupas A (1996) Prediction and analysis of coiled-coil structures. Methods Enzymol 266:513–525

  • Lupas A, Van Dyke M, Stock J (1991) Predicting coiled coils from protein sequences. Science 252:1162–1164

  • Marchler-Bauer A, Lu S, Anderson JB et al (2011) CDD: a conserved domain database for the functional annotation of proteins. Nucleic Acids Res 39:D225–D229. doi:10.1093/nar/gkq1189

  • Martí-Renom MA, Stuart AC, Fiser A et al (2000) Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 29:291–325. doi:10.1146/annurev.biophys.29.1.291

  • Maurer-Stroh S, Eisenhaber F (2004) Myristoylation of viral and bacterial proteins. Trends Microbiol 12:178–185. doi:10.1016/j.tim.2004.02.006

  • Maurer-Stroh S, Eisenhaber F (2005) Refinement and prediction of protein prenylation motifs. Genome Biol 6:R55. doi:10.1186/gb-2005-6-6-r55

  • Maurer-Stroh S, Eisenhaber B, Eisenhaber F (2002a) N-terminal N-myristoylation of proteins: prediction of substrate proteins from amino acid sequence. J Mol Biol 317:541–557. doi:10.1006/jmbi.2002.5426

  • Maurer-Stroh S, Eisenhaber B, Eisenhaber F (2002b) N-terminal N-myristoylation of proteins: refinement of the sequence motif and its taxon-specific differences. J Mol Biol 317:523–540. doi:10.1006/jmbi.2002.5425

  • Maurer-Stroh S, Gouda M, Novatchkova M et al (2004) MYRbase: analysis of genome-wide glycine myristoylation enlarges the functional spectrum of eukaryotic myristoylated proteins. Genome Biol 5:R21. doi:10.1186/gb-2004-5-3-r21

  • Maurer-Stroh S, Koranda M, Benetka W et al (2007) Towards complete sets of farnesylated and geranylgeranylated proteins. PLoS Comput Biol 3:e66. doi:10.1371/journal.pcbi.0030066

  • Maurer-Stroh S, Ma J, Lee RTC et al (2009) Mapping the sequence mutations of the 2009 H1N1 influenza A virus neuraminidase relative to drug and antibody binding sites. Biol Direct 4:18; discussion 18. doi:10.1186/1745-6150-4-18

  • Menne KM, Hermjakob H, Apweiler R (2000) Bioinformatics 16:741–742

  • Mihalek I, Res I, Lichtarge O (2004) A family of evolution-entropy hybrid methods for ranking protein residues by importance. J Mol Biol 336:1265–1282. doi:10.1016/j.jmb.2003.12.078

  • Monteil A, Chemin J, Bourinet E et al (2000a) Molecular and functional properties of the human alpha(1 G) subunit that forms T-type calcium channels. J Biol Chem 275:6090–6100

  • Monteil A, Chemin J, Leuranguer V et al (2000b) Specific properties of T-type calcium channels generated by the human alpha 1I subunit. J Biol Chem 275:16530–16535. doi:10.1074/jbc.C000090200

  • Mott R (2000) Accurate formula for P-values of gapped local sequence and profile alignments. J Mol Biol 300:649–659. doi:10.1006/jmbi.2000.3875

  • Mungall CJ, Misra S, Berman BP et al (2002) An integrated computational pipeline and database to support whole-genome sequence annotation. Genome Biol 3:RESEARCH0081

  • Neuberger G, Maurer-Stroh S, Eisenhaber B et al (2003a) Prediction of peroxisomal targeting signal 1 containing proteins from amino acid sequence. J Mol Biol 328:581–592

  • Neuberger G, Maurer-Stroh S, Eisenhaber B et al (2003b) Motif refinement of the peroxisomal targeting signal 1 and evaluation of taxon-specific differences. J Mol Biol 328:567–579

  • Nielsen H, Krogh A (1998) Prediction of signal peptides and signal anchors by a hidden Markov model. Proc Int Conf Intell Syst Mol Biol 6:122–130

  • Nielsen H, Engelbrecht J, Brunak S, von Heijne G (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng 10:1–6

  • Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217. doi:10.1006/jmbi.2000.4042

  • Novatchkova M, Schneider G, Fritz R et al (2006) DOUTfinder—identification of distant domain outliers using subsignificant sequence similarity. Nucleic Acids Res 34:W214–W218. doi:10.1093/nar/gkl332

  • Okita K, Kiyonari H, Nobuhisa I et al (2004) Targeted disruption of the mouse ELYS gene results in embryonic death at peri-implantation development. Genes Cells 9:1083–1091. doi:10.1111/j.1365-2443.2004.00791.x

  • Ooi HS, Kwo CY, Wildpaner M et al (2009) ANNIE: integrated de novo protein sequence annotation. Nucleic Acids Res 37:W435–W440. doi:10.1093/nar/gkp254

  • Ooi HS, Schneider G, Chan Y-L et al (2010a) Databases of protein-protein interactions and complexes. Methods Mol Biol 609:145–159. doi:10.1007/978-1-60327-241-4_9

  • Ooi HS, Schneider G, Lim T-T et al (2010b) Biomolecular pathway databases. Methods Mol Biol 609:129–144. doi:10.1007/978-1-60327-241-4_8

  • Orlicky S, Tang X, Willems A et al (2003) Structural basis for phosphodependent substrate selection and orientation by the SCFCdc4 ubiquitin ligase. Cell 112:243–256

  • Palczewski K, Kumasaka T, Hori T et al (2000) Crystal structure of rhodopsin: a G protein-coupled receptor. Science 289:739–745. doi:10.1126/science.289.5480.739

  • Park J, Karplus K, Barrett C et al (1998) Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol 284:1201–1210. doi:10.1006/jmbi.1998.2221

  • Pearson WR (1998) Empirical statistical estimates for sequence similarity searches. J Mol Biol 276:71–84. doi:10.1006/jmbi.1997.1525

  • Pearson WR (2000) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol 132:185–219

  • Peña-Castillo L, Hughes TR (2007) Why are there still over 1000 uncharacterized yeast genes? Genetics 176:7–14. doi:10.1534/genetics.107.074468

  • Pons T, Gómez R, Chinea G, Valencia A (2003) Beta-propellers: associated functions and their role in human diseases. Curr Med Chem 10:505–524

  • Promponas VJ, Enright AJ, Tsoka S et al (2000) CAST: an iterative algorithm for the complexity analysis of sequence tracts. Bioinformatics 16:915–922. doi:10.1093/bioinformatics/16.10.915

  • Puntervoll P, Linding R, Gemünd C et al (2003) ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res 31:3625–3630

  • Pupko T, Bell RE, Mayrose I et al (2002) Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18:S71

  • Rasala BA, Orjalo AV, Shen Z et al (2006) ELYS is a dual nucleoporin/kinetochore protein required for nuclear pore assembly and proper cell division. Proc Natl Acad Sci USA 103:17801–17806. doi:10.1073/pnas.0608484103

  • Rasala BA, Ramos C, Harel A, Forbes DJ (2008) Capture of AT-rich chromatin by ELYS recruits POM121 and NDC1 to initiate nuclear pore assembly. Mol Biol Cell 19:3982–3996. doi:10.1091/mbc.E08-01-0012

  • Raymond CS (2000) High-throughput protein crystallization. Curr Opin Struct Biol 10:558–563. doi:10.1016/S0959-440X(00)00131-7

  • Roth AC, Gonnet GH, Dessimoz C (2008) Algorithm of OMA for large-scale orthology inference. BMC Bioinformatics 9:518. doi:10.1186/1471-2105-9-518

  • Sali A, Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234:779–815. doi:10.1006/jmbi.1993.1626

  • Schäffer AA, Wolf YI, Ponting CP et al (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 15:1000–1011

  • Schneider G, Neuberger G, Wildpaner M et al (2006) Application of a sensitive collection heuristic for very large protein families: evolutionary relationship between adipose triglyceride lipase (ATGL) and classic mammalian lipases. BMC Bioinformatics 7:164. doi:10.1186/1471-2105-7-164

  • Schneider G, Wildpaner M, Sirota FL et al (2010) Integrated tools for biomolecular sequence-based function prediction as exemplified by the ANNOTATOR software environment. Methods Mol Biol 609:257–267. doi:10.1007/978-1-60327-241-4_15

  • Sharon I, Birkland A, Chang K et al (2005) Correcting BLAST e-values for low-complexity segments. J Comput Biol 12:980–1003. doi:10.1089/cmb.2005.12.980

  • Sigrist CJA, Cerutti L, Hulo N et al (2002) PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinformatics 3:265–274

  • Sirota FL, Ooi H-S, Gattermayer T et al (2010) Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset. BMC Genomics 11:S15. doi:10.1186/1471-2164-11-S1-S15

  • Söding J (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21:951–960. doi:10.1093/bioinformatics/bti125

  • Söding J, Biegert A, Lupas AN (2005) The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 33:W244–W248. doi:10.1093/nar/gki408

  • Sonnhammer EL, von Heijne G, Krogh A (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. Proc Int Conf Intell Syst Mol Biol 6:175–182

  • Suzek BE, Huang H, McGarvey P et al (2007) UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23:1282

  • Tan J, Kuchibhatla D, Sirota FL et al (2012) Tachyon search speeds up retrieval of similar sequences by several orders of magnitude. Bioinformatics 28:1645–1646

  • Tusnády GE, Simon I (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol 283:489–506. doi:10.1006/jmbi.1998.2107

  • Van Dongen S (2008) Graph clustering via a discrete uncoupling process. SIAM J Matrix Anal Appl 30:121. doi:10.1137/040608635

  • von Heijne G (1986) A new method for predicting signal sequence cleavage sites. Nucleic Acids Res 14:4683–4690. doi:10.1093/nar/14.11.4683

  • von Heijne G (1992) Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol 225:487–494

  • Wallin E, von Heijne G (1998) Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci 7:1029–1038. doi:10.1002/pro.5560070420

  • Ward JJ, Sodhi JS, McGuffin LJ et al (2004) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol 337:635–645. doi:10.1016/j.jmb.2004.02.002

  • Warne T, Serrano-Vega MJ, Baker JG et al (2008) Structure of a β1-adrenergic G-protein-coupled receptor. Nature 454:486–491. doi:10.1038/nature07101

  • Waterhouse AM, Procter JB, Martin DMA et al (2009) Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics 25:1189–1191. doi:10.1093/bioinformatics/btp033

  • Whittle JRR, Schwartz TU (2009) Architectural nucleoporins Nup157/170 and Nup133 are structurally related and descend from a second ancestral element. J Biol Chem 284:28442–28452. doi:10.1074/jbc.M109.023580

  • Wolf YI, Brenner SE, Bash PA, Koonin EV (1999) Distribution of protein folds in the three superkingdoms of life. Genome Res 9:17–26

  • Wong W-C, Maurer-Stroh S, Eisenhaber F (2010) More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology. PLoS Comput Biol 6:e1000867. doi:10.1371/journal.pcbi.1000867

  • Wong W-C, Maurer-Stroh S, Eisenhaber F (2011a) The Janus-faced E-values of HMMER2: extreme value distribution or logistic function? J Bioinform Comput Biol 9:179–206

  • Wong W-C, Maurer-Stroh S, Eisenhaber F (2011b) Not all transmembrane helices are born equal: towards the extension of the sequence homology concept to membrane proteins. Biol Direct 6:57. doi:10.1186/1745-6150-6-57

  • Wootton JC (1994a) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput Chem 18:269–285

  • Wootton JC (1994b) Sequences with “unusual” amino acid compositions. Curr Opin Struct Biol 4:413–421. doi:10.1016/S0959-440X(94)90111-2

  • Wootton JC, Federhen S (1993) Statistics of local complexity in amino acid sequences and sequence databases. Comput Chem 17:149–163. doi:10.1016/0097-8485(93)85006-X

  • Wootton JC, Federhen S (1996) Analysis of compositionally biased regions in sequence databases. Methods Enzymol 266:554–571

  • Xenarios I, Salwínski L, Duan XJ et al (2002) DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30:303–305

  • Yoshida M, Muneyuki E, Hisabori T (2001) ATP synthase—a marvellous rotary engine of the cell. Nat Rev Mol Cell Biol 2:669–677. doi:10.1038/35089509

  • Zanzoni A, Montecchi-Palazzi L, Quondam M et al (2002) MINT: a molecular INTeraction database. FEBS Lett 513:135–140

Author information

Corresponding author

Correspondence to Frank Eisenhaber.

Copyright information

© 2012 Springer-Verlag Wien

About this chapter

Cite this chapter

Schneider, G. et al. (2012). Protein Sequence–Structure–Function–Network Links Discovered with the ANNOTATOR Software Suite: Application to ELYS/Mel-28. In: Trajanoski, Z. (eds) Computational Medicine. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0947-2_7
