doi:10.1016/j.compbiolchem.2004.07.002
Copyright © 2004 Elsevier Ltd All rights reserved.
A hidden Markov model with molecular mechanics energy-scoring function for transmembrane helix prediction
aDepartment of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA
bAccelrys Inc., 9685 Scranton Road, San Diego, CA 92121, USA
cMeTa Informatics, 12987 Caminito Bautizo, San Diego, CA 92130, USA
Received 4 May 2004;
revised 7 July 2004;
accepted 7 July 2004.
Available online 11 September 2004.
Abstract
A range of methods has been developed to predict transmembrane helices and their topologies. Although most of these algorithms give good predictions, no single method consistently outperforms the others. Combining different algorithms, however, is one approach that can potentially improve prediction accuracy. We developed a new method that first uses a hidden Markov model to predict alternative models for membrane-spanning helices in proteins. The algorithm then identifies the best of these models by ranking them with a novel scoring function based on the folding energy of transmembrane helical fragments. The folding of helical fragments and their incorporation into the membrane are modeled using CHARMm, extended with the Generalized Born surface area solvent model with implicit membrane (GBSA/IM). The combined method reported here, TMHGB, significantly increases the accuracy of the original hidden Markov model-based algorithm.
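The two-stage idea in the abstract (generate alternative helix models, then rank them by an energy-based score) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the representation of a model as a list of (start, end) residue ranges, the choice to score a model by summing per-helix energies, and the `toy_energy` placeholder, which merely stands in for the CHARMm/GBSA/IM folding energy and is not the scoring function used by TMHGB.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a helix is an inclusive (start, end) pair of
# 0-based residue indices; a model is the list of helices it proposes.
Helix = Tuple[int, int]
Model = List[Helix]


def rank_models(
    sequence: str,
    models: Sequence[Model],
    fragment_energy: Callable[[str], float],
) -> List[Tuple[float, Model]]:
    """Return (score, model) pairs sorted from lowest to highest score.

    Assumption: a model's score is the sum of the energies of its helical
    fragments, and lower energy means a more favorable model.
    """
    scored = []
    for model in models:
        total = sum(fragment_energy(sequence[s:e + 1]) for s, e in model)
        scored.append((total, model))
    # Sort on the score only, so ties never compare the model lists.
    scored.sort(key=lambda pair: pair[0])
    return scored


def toy_energy(fragment: str) -> float:
    """Placeholder energy: -1 per hydrophobic residue, +1 otherwise.

    This is NOT a molecular mechanics energy; it only makes the sketch runnable.
    """
    hydrophobic = set("AILMFVWC")
    return float(sum(-1 if aa in hydrophobic else 1 for aa in fragment))


if __name__ == "__main__":
    seq = "KKKK" + "L" * 10 + "KKKK"
    candidates = [[(0, 9)], [(4, 13)]]  # two alternative single-helix models
    best_score, best_model = rank_models(seq, candidates, toy_energy)[0]
    print(best_score, best_model)
```

In this toy case the candidate that covers the hydrophobic stretch ranks first; a real scoring function would replace `toy_energy` with an actual energy calculation.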
Keywords: Transmembrane protein topology; Hidden Markov model; Topology prediction; Folding energy; GPCR
Fig. 1. Distribution of offsets of transmembrane helix boundaries predicted by TMHMM and TransMem.
Fig. 2. Folding energy of model transmembrane helical fragments (Gf, Eq. (1)) computed as a function of the position of central residue along the bacteriorhodopsin sequence.
Fig. 3. Effect of the number of models predicted by TransMem on transmembrane helix prediction. All predictions used no energy offset (E0 = 0). A total of 188 proteins with 870 transmembrane helices were used. (A) Number of true positives predicted by the combined algorithm. (B) and (C) Increasing the number of models increases the number of false positives and the error rate (FP + FN).
Fig. 4. Effect of the constant term E0 on transmembrane helix prediction by the combined algorithm. Separate symbols mark the top 2, top 10 (■) and top 20 models. (A) Increasing E0 decreases the number of true positives predicted by the algorithm. (B) and (C) Increasing E0 decreases the number of false positives and the error rate (false positives + false negatives). Top 2 through top 20 models were studied, but only the top 2, top 10 and top 20 model data are shown. A total of 188 proteins with 870 transmembrane helices were used.
Fig. 5. Number of helices predicted by TransMem for GPCR proteins from SwissProt. The majority of proteins are predicted to have 6, 7 or 8 transmembrane helices.
Table 1.
Prediction of transmembrane helices by TMHMM and TransMem

The 188-protein set (Möller et al., 2001a), with a total of 870 experimentally determined transmembrane helices, was used. TransMemA: prediction by TransMem using the Viterbi algorithm. TransMemB: TransMem is used to generate the top 10 models for each protein, and the model that best matches the experimental data is selected from this set.
Table 2.
Transmembrane helices predicted by TMHGB

TP: true positives. FP: false positives. FN: false negatives. A set of 150 proteins from Möller et al., 2000 with 741 transmembrane helices (TMH) was used for prediction. The original 188-protein set from Möller et al., 2000 gives the same results (data not shown).
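As an illustration of how the TP, FP and FN counts and the error rate (FP + FN) might be tallied for one protein, here is a minimal Python sketch. The matching rule is an assumption, since the criterion for counting a predicted helix as correct is not stated here: a predicted helix is treated as a true positive if it overlaps a not-yet-matched observed helix by at least `min_overlap` residues, and the default of 9 is a hypothetical value chosen only for the example.

```python
from typing import List, Tuple

Helix = Tuple[int, int]  # inclusive (start, end) residue positions


def overlap(a: Helix, b: Helix) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def count_errors(
    predicted: List[Helix],
    observed: List[Helix],
    min_overlap: int = 9,  # hypothetical threshold, not taken from the article
) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) using greedy one-to-one matching by overlap."""
    matched = set()  # indices of observed helices already paired
    tp = 0
    for p in predicted:
        for i, o in enumerate(observed):
            if i not in matched and overlap(p, o) >= min_overlap:
                matched.add(i)
                tp += 1
                break
    fp = len(predicted) - tp  # predicted helices with no observed partner
    fn = len(observed) - tp   # observed helices that were missed
    return tp, fp, fn


if __name__ == "__main__":
    pred = [(10, 30), (50, 70), (100, 120)]
    obs = [(12, 32), (55, 75), (150, 170)]
    tp, fp, fn = count_errors(pred, obs)
    print(tp, fp, fn, "error rate:", fp + fn)
```

Summing these per-protein counts over a data set would give totals of the kind reported in the table; a different matching rule would change the numbers.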
Table 3.
Performance of TMHGB on high-resolution data with structure information (Zhou and Zhou, 2003)

A total of 73 proteins with 246 helices were used for prediction. Percentages given in parentheses were calculated as the number of observations divided by the total number of proteins (73) or the total number of helices (246).
Table 4.
Prediction of GPCR transmembrane helices by TMHGB
