Exploring Mouse Protein Function via Multiple Approaches

Guohua Huang; Chen Chu; Tao Huang; Xiangyin Kong; Yunhua Zhang; Ning Zhang; Yu-Dong Cai

doi:10.1371/journal.pone.0166580

Abstract

Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1^st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1^st-order predicted functions are wrong but the 2^nd-order predicted functions are correct, the 1^st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality.

Citation: Huang G, Chu C, Huang T, Kong X, Zhang Y, Zhang N, et al. (2016) Exploring Mouse Protein Function via Multiple Approaches. PLoS ONE 11(11): e0166580. https://doi.org/10.1371/journal.pone.0166580

Editor: Alexey Porollo, Cincinnati Children's Hospital Medical Center, UNITED STATES

Received: August 14, 2016; Accepted: October 31, 2016; Published: November 15, 2016

Copyright: © 2016 Huang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: Funded by National Natural Science Foundation of China (61401302, 31371335, 81171342, 81201148), the Tianjin Research Program of the Application Foundation and Advanced Technology (14JCQNJC09500), the Innovation Program of the Shanghai Municipal Education Commission (12ZZ087), the National Research Foundation for the Doctoral Program of Higher Education of China (20130032120070, 20120032120073), the Scientific Research Fund of Hunan Provincial Education Department (15B216), the Science and Technology Program of Hunan (2015JC3099), and the Seed Foundation of Tianjin University (60302064, 60302069).

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Recent advances in sequencing technology have identified a large number of proteins that perform a wide variety of functions in cellular activities. Knowledge of protein function is crucial to understanding the mechanisms behind cellular processes and to preventing and treating disease. However, most of the proteins identified to date have unknown functions. Approximately 1% of the more than 13 million protein sequences available have been experimentally annotated with essential functions; the remaining proteins have been marked with putative, uncharacterized, hypothetical, unknown or inferred functions [1]. Although physical experimental approaches, including high-throughput screening, are capable of determining the biological functions of proteins, they are expensive and time-consuming. Additionally, these methods target particular functions and therefore produce one-sided descriptions of protein function [2].

Computational approaches can make up for the deficiencies of experiments. Following the success of the computational approach in sequence alignment and comparison, many computational techniques have been presented to determine protein functions during the last decade [3]. The most commonly applied approach is to transfer functional annotation from the most similar protein with known functional information. Both sequence and structural similarities are heavily utilized in this type of homology-based annotation transfer. To infer protein function, the servers OntoBlast [4] and GoFigure [5] use the sequence alignment tool BLAST [6]. Confunc [7], the protein function prediction (PFP) algorithm [8] and the extended similarity group method (ESG) [9] employ the sequence alignment tool PSI-BLAST [10]. The Blast2GO suite performs homology transfer-based functional annotation with the gene ontology vocabulary [11]. Similar to the sequence similarity-based method, the structure similarity-based approach generally uses structure alignments via programs such as DaliLite [12–14], STRUCTAL [15], MultiProt [16], Bioinfo3D [17], and 3DCoffee [18] to measure homology among proteins. PHUNCTIONER [19] utilizes structural alignment to identify crucial positions in a protein that might hold clues to specific functions. Pegg et al. [20] constructed a structure-function link database and used it to correct the errors in the annotation of enzymes. Some researchers have attempted to combine the sequence and structure similarity approaches to explore protein function. For example, the FRalanyzer [21] uses sequence-structure alignments to elucidate protein function.

Recently, a large body of protein-protein interaction networks has become available to explore the functional relationships between interacting proteins. There are many computational models for predicting protein-protein interactions [22–24]. The commonly accepted hypothesis (called guilt-by-association (GBA) [25]) is that proteins are more likely to share identical or similar functions with interacting proteins than with non-interacting proteins. Since Schwikowski et al. [26] pioneered the utility of interaction networks for annotating protein functions in yeast, numerous interaction-based methods have been proposed to infer the functions of proteins. Hishigaki et al. [27] presented an improved predictive method called the Chi-square method to elucidate protein function. Chi et al. [28] used an iterative strategy to transfer neighboring protein functions. Chua et al. [29] extended the neighborhood to indirect neighbors, called 2-neighbors. These types of local network predictions mainly transferred functional annotation from the directly interacting neighborhood. Additionally, some global optimization techniques have been adopted to elucidate protein function. For example, Deng et al. [30], Letovsky et al. [31] and Kourmpetis et al. [32] used the Bayesian Markov random field method to infer protein functions from protein-protein interaction data and functional annotation of the protein interaction partners. The protein-protein interaction network is viewed as a graph, where the nodes represent proteins and the edges represent the interactions between proteins. Some graph-based methods have been presented for function predictions. Nabieva et al. [33] modeled the functional annotation from the interaction network as a minimum multiway cut problem and introduced a network-flow algorithm that simulated the functional flow between proteins. The clustering-based and network alignment-based techniques have been employed to predict protein functions. Altaf-Ul-Amin et al. [34] and Arnau et al. [35] used different clustering techniques to classify protein functions, whereas Singh et al. [36] presented a global alignment of multiple protein interaction networks to infer protein functions. These approaches outperformed sequence similarity and local alignment of networks. Some researchers have presented a routine to predict protein function by combining multiple methods and data sources. For instance, Cozzetto et al. [2] integrated PSI-BLAST, text-mining, machine learning, and profile-profile comparisons to predict protein functions. As these authors noted, although considerable progress has been made, the functional annotation of integrative methods can be improved. Most of the above-mentioned networks are binary (i.e., 1 indicates interaction and 0 indicates no interaction). However, the interaction between proteins can be strong or weak. The STRING database [37] is a protein interaction repository that assigns each interaction a weight value based on eight different lines of evidence. Hu et al. [38] used a weighted interaction to predict protein function and achieved a promising performance.

Great progress has been made in the computational protein function prediction field, where state-of-the-art prediction algorithms substantially outperform first-generation methods and contribute to subsequent experimental studies. However, there still remains considerable need for the improvement of the current tools [39]. To this end, we presented an integrated method to explore mouse protein functions by fusing sequence similarities, weighted interactions from the STRING database and the pseudo amino acid composition of proteins. Unannotated proteins were aligned against a database consisting of proteins with known functions. If the query protein was homologous to well-annotated proteins, the alignment scores were used to infer function. If there were no known homologous proteins, we extracted weighted interactions from the STRING database and used them to predict the query protein function. For proteins whose functions the previous two approaches could not predict, we used the pseudo amino acid composition (PseAAC)-based nearest-neighbor approach to elucidate their function.

2 Data and Methods

2.1 Data

A total of 14,732 mouse protein sequences with their functional annotations were downloaded from the Mouse Functional Genome Database (MfunGD, http://mips.gsf.de/genre/proj/mfungd/) [40], which is an important repository of protein sequences that provides high-quality protein function annotations with respect to cellular function exclusively for mice. To examine how well the model performs independently of close homology, we used the sequence clustering program CD-HIT [41] to remove or reduce similarities between sequences. With a similarity threshold of 0.7, we obtained 12,478 proteins. The mouse proteins in the MfunGD are annotated using the Functional Catalogue (FunCat) annotation scheme, which is widely used for the analysis of protein networks [42]. Compared with the Gene Ontology (GO) categories, the FunCat category structure is simpler and more hierarchical.

As shown in Table 1, there are a total of 24 functional categories. The balance between the specificity of the categories, human usability and requirements for subsequent bioinformatic applications is a general consideration in the design of an annotation scheme [42]. In line with this notion, the 24-category scheme does not classify protein function at the most specific level, but it keeps our system descriptive and compact, which complies with the main goal of our study. The total number of function annotations exceeds the number of proteins, which indicates that some proteins perform multiple functions. For details, see S1 Table.

Table 1. The number of mouse proteins in each category in our dataset.

https://doi.org/10.1371/journal.pone.0166580.t001

Protein-protein interaction pairs in mice were retrieved from STRING (Version 9.1, http://string-db.org/) [37], which is a protein-protein interaction database that collects known or predicted, direct (physical) or indirect (functional) associations. STRING assigns each pair of interacting proteins a combined score. Currently, STRING contains 5,214,234 proteins from 1,133 organisms.

Because MfunGD and STRING use different identifiers for their entries, the proteins had to be mapped between the two databases before they could be compared. The mapping was performed using the BioMart database [43]. A total of 10,539 of the 12,478 proteins in MfunGD were mapped to the proteins in STRING.

2.2 Methods

The aim of this study is to predict the function of a given protein P based on n known-function proteins P₁, P₂, …, P_n, assuming that the function categories are f₁, f₂, …, f₂₄. One protein may belong to several function categories (e.g., the protein mc10000007 belongs to categories f₈ ‘REGULATION OF METABOLISM AND PROTEIN FUNCTION’ and f₁₀ ‘CELLULAR COMMUNICATION/SIGNAL TRANSDUCTION MECHANISM’). Thus, we used a 24-dimensional vector F_i = (d_1,i, d_2,i, …, d_24,i) to indicate the function categories of a protein P_i, where d_j,i is

$$d_{j,i} = \begin{cases} 1, & \text{if } P_i \text{ belongs to category } f_j \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

Three methods were used in this study to achieve this goal.

2.2.1 Sequence similarity-based approach

Proteins with similar sequences likely share the same or similar functions. Therefore, it is possible to predict protein functions based on sequence similarities. Herein, we used the PSI-BLAST program (E-value 0.001, iteration 3) to align the given unknown-function protein (P) against the known-function proteins (P₁, P₂, …, P_n) in our dataset. The alignment score between P and P_i represents their similarity. This score is denoted as s_i. The predicted protein function scores of protein P are given by a 24-dimensional vector W = (w₁, w₂, …, w₂₄) and are calculated by

$$W = \sum_{i=1}^{n} s_i F_i, \quad \text{i.e.,} \quad w_j = \sum_{i=1}^{n} s_i \, d_{j,i} \tag{2}$$

where w_j denotes the score of a protein having function f_j. Elements in vector W are sorted from highest to lowest to obtain the predicted functions of protein P. A function receiving a high score is more likely to be an actual function of a given protein according to GBA [44] because there are several known proteins similar to the given protein that have this function. Thus, a function sequence can be constructed according to W. We provide an example to elaborate this point. If we obtain w₂₃ ≥ w₂ ≥…≥ w₅, protein P is most likely to have function f₂₃, followed by function f₂ and so forth. The least likely function is f₅. For convenience, we call function f₂₃ the 1^st-order prediction, function f₂ the 2^nd-order prediction and function f₅ the 24^th-order prediction. This scheme to define the predicted results for multi-label classification problems has been used in previous studies [38, 45, 46].
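The scoring and ranking step can be sketched in Python. This is a minimal illustration rather than the authors' implementation: it assumes that the score w_j is the sum of the alignment scores s_i over the known-function proteins annotated with f_j, and the function names `score_functions` and `rank_functions` as well as the toy numbers are hypothetical.

```python
def score_functions(weights, labels):
    """Accumulate one score per function category.

    weights: list of n similarity scores s_i.
    labels:  list of n binary vectors F_i (one entry per category).
    Assumption: w_j is the sum of s_i over proteins annotated with f_j.
    """
    scores = [0.0] * len(labels[0])
    for weight, label in zip(weights, labels):
        for j, has_function in enumerate(label):
            if has_function:
                scores[j] += weight
    return scores


def rank_functions(scores):
    """Return 1-based category indices ordered from highest to lowest score.

    sorted() is stable, so tied categories keep their original order.
    """
    return sorted(range(1, len(scores) + 1), key=lambda j: -scores[j - 1])


# Toy example with three categories instead of 24 (made-up numbers).
s = [30.0, 10.0, 50.0]
F = [[1, 0, 1],
     [0, 1, 0],
     [0, 0, 1]]
W = score_functions(s, F)
order = rank_functions(W)
```

In this sketch, `order[0]` plays the role of the 1^st-order prediction and `order[-1]` that of the last-order prediction.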

2.2.2 Weighted interaction-based approach

Proteins in a cell interact with each other to perform particular functions. Following the GBA rule [25], interacting proteins may possess similar functions. We used the combined scores in the STRING database as weighted values between proteins. These values represent a fusion of eight types of evidence, including co-expression, gene fusion and experimental evidence. We assume that the combined score between P and P_i (i = 1,2,⋯,n) is t_i. The predictive functional value is given by Y = (y₁, y₂, …, y₂₄), a 24-dimensional vector computed by

$$Y = \sum_{i=1}^{n} t_i F_i, \quad \text{i.e.,} \quad y_j = \sum_{i=1}^{n} t_i \, d_{j,i} \tag{3}$$

where y_j denotes the score of a given protein with function f_j. Similar to the sequence similarity-based approach, the elements y_j in vector Y are sorted from highest to lowest to obtain the function sequence of protein P. For example, if we obtain y₁₂ ≥ y₂₁ ≥…≥ y₂, protein P is most likely to have function f₁₂, followed by f₂₁, and the least likely function is f₂. In this study, we call function f₁₂ the 1^st-order prediction, function f₂₁ the 2^nd-order prediction, and function f₂ the 24^th-order prediction.
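The lookup of interaction partners can be sketched as follows. The sketch assumes that y_j sums the combined scores t_i of the query's annotated interaction partners that carry f_j. The edge dictionary, the protein identifiers and the function name `interaction_scores` are hypothetical and do not reflect the STRING file format.

```python
def interaction_scores(query, edges, annotations, n_functions):
    """Score each function category from weighted interaction partners.

    edges:       {(protein_a, protein_b): combined_score}, undirected.
    annotations: {protein: set of 1-based function category indices}.
    Returns a list of scores, or None when the query has no annotated
    interaction partner (the case this approach cannot handle).
    """
    scores = [0.0] * n_functions
    found_partner = False
    for (a, b), combined_score in edges.items():
        if query == a:
            partner = b
        elif query == b:
            partner = a
        else:
            continue  # edge does not involve the query protein
        if partner not in annotations:
            continue  # partner has no known function to transfer
        found_partner = True
        for j in annotations[partner]:
            scores[j - 1] += combined_score
    return scores if found_partner else None


# Toy network with three categories (made-up identifiers and scores).
edges = {("P", "A"): 900, ("B", "P"): 400, ("A", "B"): 700}
annotations = {"A": {1, 3}, "B": {3}}
y = interaction_scores("P", edges, annotations, 3)
```

Returning `None` for a protein without annotated partners makes it easy to hand such proteins over to another predictor.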

2.2.3 PseAAC-based approach

Protein sequences can be characterized by pseudo amino acid composition (PseAAC), which was proposed by Chou to predict protein subcellular localization [47] and has become popular in the prediction of post-translational modification sites [48, 49] and membrane protein types [50–52]. PseAAC maps a protein sequence into a numerical vector. Let a protein sequence be X₁X₂⋯X_N, where X_i is an amino acid residue, and let L(X) be the value of a physicochemical or biochemical property of amino acid X. The normalized property value is computed by

$$\tilde{L}(X) = \frac{L(X) - \frac{1}{20}\sum_{Y \in \Phi} L(Y)}{\sqrt{\frac{1}{20}\sum_{Y \in \Phi}\left(L(Y) - \frac{1}{20}\sum_{Z \in \Phi} L(Z)\right)^{2}}} \tag{4}$$

where Φ is the set of 20 types of amino acids. The correlation factor between residues in the protein sequence is computed by

$$\theta_k = \frac{1}{N-k}\sum_{i=1}^{N-k}\left(\tilde{L}(X_i) - \tilde{L}(X_{i+k})\right)^{2}, \quad k = 1, 2, \ldots, \lambda \tag{5}$$

The correlation factors reflect information about the position and category of amino acids in the protein sequence. The PseAAC of a protein sequence is computed by

$$x_u = \begin{cases} \dfrac{f_u}{\sum_{i=1}^{20} f_i + \varpi \sum_{k=1}^{\lambda} \theta_k}, & 1 \le u \le 20 \\[2ex] \dfrac{\varpi \, \theta_{u-20}}{\sum_{i=1}^{20} f_i + \varpi \sum_{k=1}^{\lambda} \theta_k}, & 21 \le u \le 20 + \lambda \end{cases} \tag{6}$$

where ϖ is the weight of the sequence order effects, and f_i is the occurrence frequency of the i-th amino acid. In this article, we set λ and ϖ to 50 and 0.15, respectively. Five physicochemical and biochemical properties of amino acids, i.e., codon diversity, electrostatic charge, molecular volume, polarity and secondary structure, are used to compute the PseAAC of protein sequences. These properties are retrieved from references [53–55], as listed in Table 2. For each property, we used the last 50 components of formula (6). Together with the frequencies of the 20 amino acids, this yields a 270 (20+5*50)-dimensional vector that represents a protein sequence. The cosine distance between the query protein P and the known-function protein P_i is given by

$$D(P, P_i) = 1 - \frac{V_P \bullet V_{P_i}}{\|V_P\| \, \|V_{P_i}\|} \tag{7}$$

where the operators • and ‖ ‖ indicate the inner product and norm of vectors, respectively, and V_P and V_{P_i} are the 270-dimensional PseAACs of proteins P and P_i, respectively. The predicted function value of the query protein, the vector R, was computed by (8). Similar to the two above approaches, the elements in the vector R are sorted from high to low, such as r₃ > r₁₂ > ⋯ > r₁, where protein P is most likely to have function f₃, second most likely to have function f₁₂, and least likely to have function f₁.
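The feature construction can be illustrated with a short Python sketch for a single property. It is not the authors' code and rests on several assumptions: the property values are normalized to zero mean and unit standard deviation over the 20 amino acids, the correlation factor is the mean squared difference of normalized values at lag k (the form introduced by Chou), and the cosine distance is one minus the cosine similarity. The property table `TOY_PROPERTY` is made up for illustration and is not one of the five properties in Table 2.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical property table, for illustration only.
TOY_PROPERTY = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}


def normalize(prop):
    """Standardize a property over the 20 amino acids (assumed z-score)."""
    mean = sum(prop[aa] for aa in AMINO_ACIDS) / 20
    sd = math.sqrt(sum((prop[aa] - mean) ** 2 for aa in AMINO_ACIDS) / 20)
    return {aa: (prop[aa] - mean) / sd for aa in AMINO_ACIDS}


def correlation_factors(seq, norm, lam):
    """Mean squared difference of normalized values at lags 1..lam."""
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    return [
        sum((norm[seq[i]] - norm[seq[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]


def pseaac(seq, prop, lam, weight):
    """Return the (20 + lam)-dimensional PseAAC vector for one property."""
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    thetas = correlation_factors(seq, normalize(prop), lam)
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


def cosine_distance(u, v):
    """One minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", TOY_PROPERTY, lam=3, weight=0.15)
```

To obtain a 270-dimensional vector as described in the text, one would presumably call `pseaac` with `lam=50` once per property and concatenate the 20 frequency components with the last 50 components of each of the five results.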

Table 2. The physicochemical and biochemical properties of the 20 amino acids.

https://doi.org/10.1371/journal.pone.0166580.t002

2.3 Cross-Validation and Assessment

We used two cross-validation methods, leave-one-out cross-validation and ten-fold cross-validation, to examine the performance of the presented methods. In the ten-fold cross-validation method, the original dataset is randomly and equally divided into ten parts. Each part is singled out in turn as the testing set, while the samples in the other nine parts are used as training samples. For the leave-one-out cross-validation approach, each sample in the original dataset is taken as a testing sample in turn and the remaining samples are used as training samples. To assess the experimental results, the prediction accuracy for the j^th-order prediction is given by

$$\mathrm{Acc}_j = \frac{1}{M}\sum_{i=1}^{M} U_{j,i} \tag{9}$$

where M is the number of tested proteins and U_j,i = 1 if the function category of the j^th-order prediction for the i-th tested protein is actually a function category of that protein according to current knowledge. Otherwise, U_j,i = 0.
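The evaluation protocol can be sketched as follows. This is a minimal illustration, assuming that the j^th-order accuracy is the fraction of tested proteins whose j^th-ranked function is among their annotated functions. The helper names `k_fold_indices` and `order_accuracy` and the toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly split sample indices into k parts of (almost) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


def order_accuracy(rankings, true_functions, order):
    """Fraction of proteins whose order-th predicted function is annotated.

    rankings:       one ranked list of category indices per tested protein.
    true_functions: one set of annotated category indices per tested protein.
    order:          1 for the 1st-order prediction, 2 for the 2nd, and so on.
    """
    hits = sum(
        1
        for ranking, truth in zip(rankings, true_functions)
        if ranking[order - 1] in truth
    )
    return hits / len(rankings)


# Toy data: three proteins, three categories (made-up values).
rankings = [[3, 1, 2], [2, 3, 1], [1, 2, 3]]
truths = [{3}, {1, 3}, {1, 2}]
acc1 = order_accuracy(rankings, truths, 1)
acc2 = order_accuracy(rankings, truths, 2)
folds = k_fold_indices(25)
```

Leave-one-out cross-validation corresponds to the special case in which every part contains exactly one sample, for example `k_fold_indices(n, k=n)`.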

3 Results and Discussion

3.1 Performance of the Simple Approach

The performance of the three approaches for a dataset consisting of 12,478 proteins evaluated by the leave-one-out method is listed in Table 3. The similarity-based approach yielded the best accuracy of 0.8756 in the 1^st-order prediction but could not predict functions of 2,226 proteins because they have no homologues with annotated proteins in the dataset. The interaction-based approach produced a lower prediction accuracy of 0.7535 than the similarity-based approach and could not predict the functions of 1,939 proteins that have no interactions with annotated proteins. The PseAAC-based nearest-neighbor approach performed worst in terms of the prediction accuracy, but it was able to predict the functions of all the test proteins. The results indicated that each approach has its strengths and limitations. Table 3 also shows that the 1^st-order prediction performed best, followed by the 2^nd-order prediction and the 3^rd-order prediction, indicating the predicted function sequence for each test protein is quite reasonable.

Table 3. Prediction accuracies of three methods and the combined method in the first three order predictions.

https://doi.org/10.1371/journal.pone.0166580.t003

The three approaches were compared on different testing datasets in the above paragraph. For a fair comparison, we generated a common dataset where each protein could be tested by the leave-one-out method. The common dataset consisted of 8,481 proteins. The accuracies of the three approaches versus order are plotted in Fig 1. The similarity-based approach performed best, followed by the interaction-based approach and the PseAAC-based nearest-neighbor approach. The similarity-based approach was much more accurate (by 0.11) than the interaction-based approach in the 1^st-order prediction and more accurate (by 0.07) in the 2^nd-order prediction, while the latter was much more accurate (by 0.09) than the PseAAC-based approach in the 1^st-order prediction and more accurate (by 0.02) in the 2^nd-order prediction. The results confirmed the advantage of the similarity-based approach over the other two approaches in terms of the prediction accuracy. As mentioned previously, the similarity-based approach cannot address non-homologous proteins, whereas the PseAAC-based approach can predict the functions of all proteins despite its lower prediction accuracy. Therefore, it is wise to jointly utilize the three methods to predict the protein functions.

Fig 1. The prediction accuracies of 24 order predictions for these three methods on the common dataset.

https://doi.org/10.1371/journal.pone.0166580.g001

3.2 Prediction by the Combined Approach

We combined the three approaches to predict the functions of proteins so that the strengths of each could compensate for the limitations of the others. For a given protein, we first employed the similarity-based approach. If the protein had no homologues, we applied the interaction-based approach. If the protein could not be predicted by the interaction-based approach, we used the PseAAC-based nearest-neighbor approach. The performance of the combined approach based on leave-one-out validation on the 12,478 proteins is shown in the fifth row of Table 3. The accuracy of the combined method was much higher than those of the interaction-based and PseAAC-based approaches and slightly lower than that of the similarity-based approach. However, the combined approach could predict all proteins, whereas the similarity-based approach could not. Therefore, the combined method has wide application at the cost of reduced prediction accuracy. For proteins with no homologues or interactions with annotated proteins, the best alternative is to use the combined approach. The contributions of the three approaches to the final predictive performance are shown in Table 4. The similarity-based approach contributed most, predicting more than 80% of all proteins and yielding an Acc₁ of 0.8756, followed by the interaction-based approach and the PseAAC-based approach.
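The sequential fallback can be sketched in a few lines of Python. The sketch only illustrates the control flow. The three predictors are hypothetical stand-ins that return a ranked list of function categories, or `None` when they cannot handle the query, and the name `predict_combined` is not taken from the paper.

```python
def predict_combined(query, predictors):
    """Try each predictor in order and return the first usable ranking.

    predictors: ordered list of (name, function) pairs; each function
    returns a ranked list of categories or None if it cannot predict.
    """
    for name, predict in predictors:
        ranking = predict(query)
        if ranking is not None:
            return name, ranking
    raise ValueError("no predictor could handle the query")


# Hypothetical stand-ins: lookups for proteins that have homologues or
# interaction partners, and a fallback that always returns a ranking.
similarity_hits = {"P1": [3, 1, 2]}
interaction_hits = {"P2": [2, 3, 1]}
predictors = [
    ("similarity", similarity_hits.get),
    ("interaction", interaction_hits.get),
    ("pseaac", lambda query: [1, 2, 3]),
]

result_p1 = predict_combined("P1", predictors)
result_p2 = predict_combined("P2", predictors)
result_p3 = predict_combined("P3", predictors)
```

Keeping the name of the predictor that produced each ranking makes it straightforward to tabulate how much each approach contributes to the final result.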

Table 4. Contributions of the three approaches to the predicted results.

https://doi.org/10.1371/journal.pone.0166580.t004

To further demonstrate the effectiveness of the combined method, we also examined it with ten-fold cross-validation. Because the predicted results yielded by this cross-validation method may be influenced by the division of the dataset, the combined method was executed five times with different divisions. The prediction accuracies for the 1^st-order, 2^nd-order and 3^rd-order predictions in each run are listed in Table 5. Compared to the prediction accuracies yielded by the leave-one-out cross-validation that are listed in Table 3, the performances of these two cross-validation methods are almost at the same level, which indicates that the combined method is still quite effective when no close homologs are available. Furthermore, it can be observed from Table 5 that the standard deviations for the 1^st-order, 2^nd-order and 3^rd-order predictions are quite low, indicating the stability of the combined method.

Table 5. Performances of the combined method evaluated by ten-fold cross validation.

https://doi.org/10.1371/journal.pone.0166580.t005

3.3 Possible Protein Functions

In this study, the assessment of the predicted results was based on currently annotated proteins. Therefore, "right" and "wrong" predictions were relatively defined. For example, if the studied protein had function F_A and the predicted function was F_B, the prediction was not correct. It is conceivable that with the development of our knowledge, the protein could be found to possess F_B; thus, the prediction could be correct in the future. The currently annotated functions of the proteins are a subset of their actual functions. In this respect, some "wrong" predictions by our method in the current dataset may be correct. Next, we explore these wrong predictions.

It is worth performing further analysis on the wrongly predicted proteins. Because the 1^st-order prediction is the most important, we investigated proteins with "wrong" 1^st-order prediction but with "right" 2^nd-order prediction. Because these proteins might possess the predicted 1^st-order functions, we called them "false-wrong" 1^st-order predicted proteins. As mentioned above, the combined method was evaluated by both the leave-one-out and ten-fold cross-validations. Because the predicted results yielded by the ten-fold cross-validations are not unique, we selected the predicted results yielded by the leave-one-out cross-validation to further analyze wrongly predicted proteins. In the leave-one-out test on the 12,478 proteins, we identified 966 such proteins: 658 proteins from the similarity-based approach, 258 proteins from the interaction-based approach and 50 proteins from the PseAAC-based approach. All these proteins are listed in S2 Table.

The goal of this process was to further validate our method. If we found evidence indicating that any of these proteins possessed the "wrong-predicted" functions, the actual prediction accuracy of our method would be much higher than presented above. This would allow the method to be applied to new protein function discoveries, but further experimental validations may be required for these proteins.

3.4 Possible Function Analysis of Significant "False-Wrong" 1st-Order Predicted Proteins

We explored the functions of proteins whose predicted 1^st-order functions were wrong and whose predicted 2^nd-order functions were correct. There were 966 such proteins. Forty protein genes were closely related to "false-wrong" 1st-order predicted functions, of which sixteen were predicted by the similarity-based approach, twenty-two were predicted by the interaction-based approach, and two were predicted by the PseAAC-based approach, as listed in Table 6, Table 7 and Table 8, respectively.

Table 6. The sixteen significant proteins with "wrong" 1^st-order predictions but "right" 2^nd-order predictions based on the sequence similarity-based approach.

https://doi.org/10.1371/journal.pone.0166580.t006

Table 7. The twenty-two significant proteins with "wrong" 1^st-order predictions but "right" 2^nd-order predictions based on the weighted interaction-based approach.

https://doi.org/10.1371/journal.pone.0166580.t007

Table 8. The two significant proteins with "wrong" 1^st-order predictions but "right" 2^nd-order predictions based on the PseAAC-based approach.

https://doi.org/10.1371/journal.pone.0166580.t008

As shown in Table 6, sixteen significant proteins were predicted by the similarity-based approach. The proteins MYO1G, NEO1 and SDK1 were predicted to have the 1^st-order function ‘subcellular localization’, suggesting that these gene products have specific cellular localizations. MYO1G has been reported as a hematopoietic-specific myosin that localizes to the plasma membrane [56]. Moreover, neogenin 1 (NEO1) and sidekick cell adhesion molecule 1 (SDK1) are likely to localize on the plasma membrane based on their biological functions. The proteins PLG, GM711, MAPK15, PRKD2, STRADA, NTRK3, BMPR1A, KSR1, EPHB6 and KLK9 were predicted to have the 1^st-order function ‘protein fate (folding/modification/destination)’. MAPKs, BMP, KSR1, PRKD2, STRADA, NTRK3 and EPHB6 are responsible for protein phosphorylation and signal transduction [57–63]. KLK9 belongs to the family of kallikrein-related peptidases (KLKs), which possess trypsin-like proteolytic activity [64, 65]. Plasminogen (PLG) is a precursor of the key enzyme of the fibrinolytic system plasmin, which serves as a physiological backup enzyme for ADAMTS13 (a disintegrin and metalloproteinase with a thrombospondin type I motif, member 13) in the degradation of pathological platelet-VWF (Von Willebrand factor) complexes [66]. KRT2, PTK7 and SPEG were predicted to have the 1^st-order function ‘protein with binding function or cofactor requirement’. Protein tyrosine kinase 7 (PTK7) was reported to interact with the Wnt family proteins [67] and play a pivotal role in planar cell polarity [68]. The intermediate filament keratin proteins, including Keratin 2 (KRT2), bind and interact with signaling molecules, such as CFTR [69], trichoplein [70] and Albatross complexes [71]. SPEG complex locus (SPEG) is a myotubularin (MTM1)-binding protein, and its deficiency has been proven to cause centronuclear myopathy with dilated cardiomyopathy [72].

As shown in Table 7, twenty-two significant proteins were predicted by the interaction-based approach. ADRM1, ATP6V1F, AURKAIP1, BYSL, DHFR, DTYMK, GNE, HPD, HPS3, MAGOH, MED17, NUDC, PNO1, RGN, RPS25, SHCBP1 and SHFM1 were predicted to have the 1^st-order function ‘subcellular localization’. ATPase, H⁺ transporting, lysosomal 14 kDa, V1 subunit F (ATP6V1F) and adhesion regulating molecule 1 (ADRM1) are likely to localize on the plasma membrane based on their biological functions. Several gene products are specifically localized in the nucleus, including AURKAIP1, DHFR, MAGOH, MED17, NUDC, PNO1 and RGN. Among them, NUDC is a nuclear movement protein that interacts with dynein [73]. Mediator complex subunit 17 (MED17) is localized in the nucleus and is involved in transcription regulation [74, 75]. The Bystin-like (BYSL) protein was reported to colocalize with trophinin, tastin and cytokeratins in the cytoplasm, forming a complex in trophectoderm cells that is essential for embryo implantation and ribosomal biogenesis [76]. The ribosomal protein S25 (RPS25) is also located in the cytoplasm and is responsible for protein synthesis [77]. The protein 4-hydroxyphenylpyruvate dioxygenase (HPD) is enriched in the liver cell cytoplasm and encodes an enzyme involved in the catabolic pathway of tyrosine, which catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate [78]. SHFM1 (split hand/foot malformation (ectrodactyly) type 1, also known as DSS1) localizes to proteasomes [79]. Additionally, we predicted the specific subcellular localization of Hermansky-Pudlak syndrome 3 (HPS3), which encodes a novel protein with largely unknown function [80], together with aurora kinase A interacting protein 1 (AURKAIP1), deoxythymidylate kinase (DTYMK), glucosamine (UDP-N-acetyl)-2-epimerase/N-acetylmannosamine kinase (GNE), partner of NOB1 homologue (PNO1), dihydrofolate reductase (DHFR), and SHC SH2-domain binding protein 1 (SHCBP1). Our data provide clues for the future study of these genes.

NCAPH, RIF1, CDCA5 and PRC1 were predicted to have the 1^st-order function ‘protein with binding function or cofactor requirement’. NCAPH (also known as CAP-H) binds to the chromosome and regulates the cell cycle [81]. CDCA5 (also known as SORORIN) binds to sister chromatids and regulates their separation [82]. Protein regulator of cytokinesis 1 (PRC1) was shown to bind to several motor proteins, including KIF4, MKLP1 and CENP-E, and play pivotal roles in the formation of microtubule architecture [83]. Replication timing regulatory factor (RIF1) is responsible for regulating the replication-timing program in mammalian cells [84]. It was shown to bind to aberrant telomeres and to align along the anaphase midzone microtubules [85]. NPFF was predicted to have the 1^st-order function ‘cellular communication’. NPFF (neuropeptide FF) is an FMRFamide-like peptide with antiopiate properties that is involved in cellular communication as a part of the neurotransmitter system [86, 87].

As shown in Table 8, two significant proteins were predicted by the PseAAC-based approach. A-kinase anchor protein 2 (AKAP2) has the known function ‘protein fate (folding, modification, destination)’ because it regulates cyclic AMP-dependent protein kinase (PKA) signaling in both a spatial and a temporal manner. The specific subcellular localization of AKAP2 is closely related to its function [88]. AKAP2 has both cytosolic and endosomal localizations, and a fraction of endosomal AKAP2 is involved in regulating endosomal functions and the expression of several downstream proteins, such as Rab4 and Rab11 [89]. As another example, kisspeptins (KISS1) have known functions related to ‘protein with binding function or cofactor requirement’. The versatile and complex pathways of KISS1 and its receptors play essential roles in the development of the brain and the reproductive system [90] and induce apoptosis in various cancers [91, 92]. Previous publications have shed light on both the cytosolic and nuclear localization of KISS1 receptors, which were linked to distinct functions, such as cytosolic calcium elevation and potential nuclear transactivation activity [93, 94]. These lines of evidence support our prediction of the important ‘subcellular localization’ function of these proteins.

4. Conclusion

The accurate identification of protein functions remains challenging in the post-genomic era. In this article, we employed protein sequence homology, weighted interactions and pseudo amino acid composition to explore protein functions. The experimental results indicate that homologous proteins are more likely to share functions than interacting proteins, which in turn share more functions than proteins with similar physicochemical and biochemical properties. Weighted interactions can be used to annotate the functions of proteins with no known homologues. The PseAAC-based approach, although less accurate, can annotate almost all proteins, including those with no homologues. These three approaches are complementary and form an effective combination for predicting protein functions. Further analysis of the wrongly predicted functions also supports the effectiveness of the proposed method.
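
The sequential combination summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy lookup tables, the protein IDs and the rule that a predictor returns `None` when it cannot make a prediction are all assumptions introduced here. Only the order of the three approaches follows the text.

```python
from typing import Callable, Optional, Sequence

# A predictor returns a ranked list of functional categories (1st-order first),
# or None when it cannot make a prediction for the given protein.
Predictor = Callable[[str], Optional[Sequence[str]]]


def predict_sequentially(protein_id: str,
                         predictors: Sequence[Predictor]) -> Optional[Sequence[str]]:
    """Return the ranking from the first predictor that can handle the protein."""
    for predictor in predictors:
        ranking = predictor(protein_id)
        if ranking:  # skip None and empty rankings
            return ranking
    return None


# Hypothetical stand-ins for the three approaches; the lookups are toy data.
def similarity_based(protein_id: str) -> Optional[Sequence[str]]:
    # Assumed to fail (None) when the protein has no known homologue.
    return {"P1": ["protein fate", "cellular communication"]}.get(protein_id)


def interaction_based(protein_id: str) -> Optional[Sequence[str]]:
    # Assumed to fail (None) when the protein has no annotated interactor.
    return {"P2": ["subcellular localization"]}.get(protein_id)


def pseaac_based(protein_id: str) -> Optional[Sequence[str]]:
    # Assumed to always answer, since it needs only the sequence itself.
    return ["cellular communication"]


PIPELINE = [similarity_based, interaction_based, pseaac_based]

first_order_p1 = predict_sequentially("P1", PIPELINE)[0]
first_order_p2 = predict_sequentially("P2", PIPELINE)[0]
first_order_p3 = predict_sequentially("P3", PIPELINE)[0]
```

In this sketch the more accurate approach is always consulted first, and the less accurate but more widely applicable approaches serve only as fallbacks.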

Supporting Information

S1 Table. The dataset used in this study.

The first column of the file is the protein entry ID in MfunGD. The other columns are the functional categories to which the protein belongs.

https://doi.org/10.1371/journal.pone.0166580.s001

(CSV)
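
A file with the layout described for S1 Table could be read as follows. This is a sketch under assumptions: the protein IDs and rows below are made up, rows are assumed to have varying numbers of category columns, and the real file may carry a header row that would need to be skipped.

```python
import csv
import io


def load_annotations(handle):
    """Map each protein entry ID to the list of its functional categories."""
    annotations = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        protein_id, *categories = row
        # Drop empty cells, which appear when rows have unequal lengths.
        annotations[protein_id.strip()] = [c.strip() for c in categories if c.strip()]
    return annotations


# Toy rows with hypothetical IDs, standing in for the real S1 Table.
sample = io.StringIO(
    "P0001,subcellular localization,cellular communication\n"
    "P0002,protein fate,,\n"
)
annotations = load_annotations(sample)
```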

S2 Table. The "false-wrong" 1^st-order predicted proteins.

Proteins with "wrong" 1^st-order function predictions but "right" 2^nd-order function predictions in our dataset were called "false-wrong" 1^st-order predicted proteins. There were 658 such proteins based on the similarity-based approach, 258 based on the interaction-based approach and 50 based on the PseAAC-based approach. These proteins are listed on three separate sheets. The proteins may possess the function indicated by the 1^st-order prediction and are worthwhile subjects for future analysis.

https://doi.org/10.1371/journal.pone.0166580.s002

(XLSX)
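
The "false-wrong" criterion used for S2 Table can be expressed as a simple filter. The sketch below is illustrative only: the protein IDs, the ranked predictions and the known annotations are hypothetical, and the function name is introduced here for convenience.

```python
def is_false_wrong(ranked_predictions, known_functions):
    """True when the 1st-order prediction misses but the 2nd-order one hits."""
    if len(ranked_predictions) < 2:
        return False  # no 2nd-order prediction to check
    first, second = ranked_predictions[0], ranked_predictions[1]
    return first not in known_functions and second in known_functions


# Hypothetical ranked predictions (1st-order first) and known annotations.
predictions = {
    "P0001": ["subcellular localization", "protein fate"],
    "P0002": ["cellular communication", "subcellular localization"],
    "P0003": ["protein fate"],
}
known = {
    "P0001": {"protein fate"},
    "P0002": {"cellular communication"},
    "P0003": {"cellular communication"},
}

false_wrong = [pid for pid, ranked in predictions.items()
               if is_false_wrong(ranked, known[pid])]
```

Only `P0001` qualifies here: its 1^st-order prediction is absent from the known annotations, whereas its 2^nd-order prediction is present.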

Acknowledgments

This work was supported by grants from the National Natural Science Foundation of China (61401302, 31371335, 81171342, 81201148), the Tianjin Research Program of the Application Foundation and Advanced Technology (14JCQNJC09500), the Innovation Program of the Shanghai Municipal Education Commission (12ZZ087), the National Research Foundation for the Doctoral Program of Higher Education of China (20130032120070, 20120032120073), the Scientific Research Fund of Hunan Provincial Education Department (15B216), the Science and Technology Program of Hunan (2015JC3099), and the Seed Foundation of Tianjin University (60302064, 60302069).

Author Contributions

Conceptualization: NZ YDC.
Data curation: GH CC TH.
Formal analysis: CC TH XK YZ.
Investigation: GH CC.
Methodology: GH YDC.
Resources: GH CC TH.
Supervision: YDC.
Validation: GH CC NZ.
Writing – original draft: GH CC.
Writing – review & editing: NZ YDC.

References

1. Erdin S, Lisewski AM, Lichtarge O. Protein function prediction: towards integration of similarity metrics. Current opinion in structural biology. 2011;21(2):180–8. pmid:21353529; PubMed Central PMCID: PMC3120633.
2. Cozzetto D, Buchan DW, Bryson K, Jones DT. Protein function prediction by massive integration of evolutionary analyses and multiple data sources. BMC Bioinformatics. 2013;14 Suppl 3:S1. Epub 2013/03/27. pmid:23514099; PubMed Central PMCID: PMC3584902.
3. Pandey G, Kumar V, Steinbach M. Computational Approaches for Protein Function: A Review. 2006.
4. Zehetner G. OntoBlast function: from sequence similarities directly to potential functional annotations by ontology terms. Nucleic Acids Research. 2003;31(13):3799–803. pmid:12824422
5. Khan S, Situ G, Decker K, Schmidt CJ. GoFigure: automated Gene Ontology annotation. Bioinformatics. 2003;19(18):2484–5. pmid:14668239.
6. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10. Epub 1990/10/05. pmid:2231712.
7. Wass MN, Sternberg MJ. ConFunc—functional annotation in the twilight zone. Bioinformatics. 2008;24(6):798–806. pmid:18263643.
8. Hawkins T, Chitale M, Luban S, Kihara D. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins. 2009;74(3):566–82. pmid:18655063.
9. Chitale M, Hawkins T, Park C, Kihara D. ESG: extended similarity group method for automated protein function prediction. Bioinformatics. 2009;25(14):1739–45. pmid:19435743; PubMed Central PMCID: PMC2705228.
10. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402. Epub 1997/09/01. pmid:9254694; PubMed Central PMCID: PMC146917.
11. Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008;36(10):3420–35. pmid:18445632; PubMed Central PMCID: PMC2425479.
12. Holm L, Park J. DaliLite workbench for protein structure comparison. Bioinformatics. 2000;16(6):566–7. pmid:10980157.
13. Holm L, Kaariainen S, Rosenstrom P, Schenkel A. Searching protein structure databases with DaliLite v.3. Bioinformatics. 2008;24(23):2780–1. pmid:18818215; PubMed Central PMCID: PMC2639270.
14. Holm L, Rosenstrom P. Dali server: conservation mapping in 3D. Nucleic Acids Res. 2010;38(Web Server issue):W545–9. pmid:20457744; PubMed Central PMCID: PMC2896194.
15. Kolodny R, Linial N. Approximate protein structural alignment in polynomial time. Proc Natl Acad Sci U S A. 2004;101(33):12201–6. pmid:15304646; PubMed Central PMCID: PMC514457.
16. Shatsky M, Nussinov R, Wolfson HJ. A method for simultaneous alignment of multiple protein structures. Proteins. 2004;56(1):143–56. pmid:15162494.
17. Shatsky M, Dror O, Schneidman-Duhovny D, Nussinov R, Wolfson HJ. BioInfo3D: a suite of tools for structural bioinformatics. Nucleic Acids Res. 2004;32(Web Server issue):W503–7. pmid:15215437; PubMed Central PMCID: PMC441551.
18. O'Sullivan O, Suhre K, Abergel C, Higgins DG, Notredame C. 3DCoffee: combining protein sequences and structures within multiple sequence alignments. J Mol Biol. 2004;340(2):385–95. pmid:15201059.
19. Pazos F, Sternberg MJ. Automated prediction of protein function and detection of functional sites from structure. Proc Natl Acad Sci U S A. 2004;101(41):14754–9. pmid:15456910; PubMed Central PMCID: PMC522026.
20. Pegg SC, Brown SD, Ojha S, Seffernick J, Meng EC, Morris JH, et al. Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database. Biochemistry. 2006;45(8):2545–55. pmid:16489747.
21. Saini HK, Fischer D. FRalanyzer: a tool for functional analysis of fold-recognition sequence-structure alignments. Nucleic Acids Res. 2007;35(Web Server issue):W499–502. pmid:17537819; PubMed Central PMCID: PMC1933221.
22. An JY, Meng FR, You ZH, Chen X, Yan GY, Hu JP. Improving protein-protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model. Protein science: a publication of the Protein Society. 2016;25(10):1825–33. pmid:27452983; PubMed Central PMCID: PMC5029537.
23. Huang YA, You ZH, Chen X, Chan K, Luo X. Sequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding. BMC Bioinformatics. 2016;17(1):184. pmid:27112932; PubMed Central PMCID: PMC4845433.
24. Wong L, You Z-H, Ming Z, Li J, Chen X, Huang Y-A. Detection of Interactions between Proteins through Rotation Forest and Local Phase Quantization Descriptors. International journal of molecular sciences. 2016;17(1):21. pmid:26712745
25. Oliver S. Proteomics: Guilt-by-association goes global. Nature. 2000;403(6770):601–3. pmid:10688178
26. Schwikowski B, Uetz P, Fields S. A network of protein-protein interactions in yeast. Nat Biotechnol. 2000;18(12):1257–61. Epub 2000/12/02. pmid:11101803.
27. Hishigaki H, Nakai K, Ono T, Tanigami A, Takagi T. Assessment of prediction accuracy of protein function from protein—protein interaction data. Yeast. 2001;18(6):523–31. Epub 2001/04/03. pmid:11284008.
28. Chi X, Hou J. An iterative approach of protein function prediction. BMC Bioinformatics. 2011;12:437. pmid:22074332; PubMed Central PMCID: PMC3224793.
29. Chua HN, Sung WK, Wong L. Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics. 2006;22(13):1623–30. pmid:16632496.
30. Deng M, Zhang K, Mehta S, Chen T, Sun F. Prediction of protein function using protein-protein interaction data. Journal of computational biology: a journal of computational molecular cell biology. 2003;10(6):947–60. Epub 2004/02/26. pmid:14980019.
31. Letovsky S, Kasif S. Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics. 2003;19 Suppl 1:i197–204. Epub 2003/07/12. pmid:12855458.
32. Kourmpetis YA, van Dijk AD, Bink MC, van Ham RC, ter Braak CJ. Bayesian Markov Random Field analysis for protein function prediction based on network data. PLoS One. 2010;5(2):e9293. Epub 2010/03/03. pmid:20195360; PubMed Central PMCID: PMC2827541.
33. Nabieva E, Jim K, Agarwal A, Chazelle B, Singh M. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics. 2005;21 Suppl 1:i302–10. Epub 2005/06/18. pmid:15961472.
34. Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S. Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics. 2006;7:207. pmid:16613608; PubMed Central PMCID: PMC1473204.
35. Arnau V, Mars S, Marin I. Iterative cluster analysis of protein interaction data. Bioinformatics. 2005;21(3):364–78. pmid:15374873.
36. Singh R, Xu J, Berger B. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc Natl Acad Sci U S A. 2008;105(35):12763–8. pmid:18725631; PubMed Central PMCID: PMC2522262.
37. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research. 2013;41(Database issue):D808–15. Epub 2012/12/04. pmid:23203871; PubMed Central PMCID: PMC3531103.
38. Hu LL, Huang T, Shi X, Lu WC, Cai YD, Chou KC. Predicting functions of proteins in mouse based on weighted protein-protein interaction network and protein hybrid properties. PLoS ONE. 2011;6(1):e14556. pmid:21283518
39. Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, et al. A large-scale evaluation of computational protein function prediction. Nature methods. 2013;10(3):221–7. Epub 2013/01/29. pmid:23353650; PubMed Central PMCID: PMC3584181.
40. Ruepp A, Doudieu ON, van den Oever J, Brauner B, Dunger-Kaltenbach I, Fobo G, et al. The Mouse Functional Genome Database (MfunGD): functional annotation of proteins in the light of their cellular context. Nucleic Acids Res. 2006;34(Database issue):D568–71. pmid:16381934; PubMed Central PMCID: PMC1347437.
41. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9. Epub 2006/05/30. pmid:16731699.
42. Ruepp A, Zollner A, Maier D, Albermann K, Hani J, Mokrejs M, et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res. 2004;32(18):5539–45. pmid:15486203; PubMed Central PMCID: PMC524302.
43. Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, et al. BioMart—biological queries made easy. BMC Genomics. 2009;10:22. pmid:19144180; PubMed Central PMCID: PMC2649164.
44. Oliver S. Guilt-by-association goes global. Nature. 2000;403(6770):601–3. pmid:ISI:000085288200029.
45. Chen L, Zeng WM, Cai YD, Feng KY, Chou KC. Predicting Anatomical Therapeutic Chemical (ATC) Classification of Drugs by Integrating Chemical-Chemical Interactions and Similarities. PLoS ONE. 2012;7(4):e35254. pmid:22514724
46. Chen L, Lu J, Zhang N, Huang T, Cai Y-D. A hybrid method for prediction and repositioning of drug Anatomical Therapeutic Chemical classes. Molecular BioSystems. 2014;10(4):868–77. pmid:24492783
47. Chou K. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Structure, Function, and Bioinformatics. 2001;43(3):246–55.
48. Xu Y, Ding J, Wu LY, Chou KC. iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS One. 2013;8(2):e55844. pmid:23409062; PubMed Central PMCID: PMC3567014.
49. Qiu WR, Xiao X, Lin WZ, Chou KC. iMethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach. Biomed Res Int. 2014;2014:947416. pmid:24977164; PubMed Central PMCID: PMC4054830.
50. Jia C, Lin X, Wang Z. Prediction of protein S-nitrosylation sites based on adapted normal distribution bi-profile Bayes and Chou's pseudo amino acid composition. International journal of molecular sciences. 2014;15(6):10410–23. pmid:24918295; PubMed Central PMCID: PMC4100159.
51. Hayat M, Khan A. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition. J Theor Biol. 2010;271(1):10–7. pmid:21110985.
52. Chen YK, Li KB. Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition. J Theor Biol. 2013;318:1–12. pmid:23137835.
53. Atchley WR, Zhao J, Fernandes AD, Druke T. Solving the protein sequence metric problem. Proc Natl Acad Sci U S A. 2005;102(18):6395–400. Epub 2005/04/27. pmid:15851683; PubMed Central PMCID: PMC1088356.
54. Rubinstein ND, Mayrose I, Pupko T. A machine-learning approach for predicting B-cell epitopes. Molecular immunology. 2009;46(5):840–7. pmid:18947876
55. Huang T, Shi X, Wang P, He Z, Feng K, Hu L, et al. Analysis and prediction of the metabolic stability of proteins based on their sequential features, subcellular locations and interaction networks. PLoS ONE. 2010;5(6):e10972. pmid:20532046
56. Olety B, Wälte M, Honnert U, Schillers H, Bähler M. Myosin 1G (Myo1G) is a haematopoietic specific myosin that localises to the plasma membrane and regulates cell elasticity. FEBS Letters. 2010;584(3):493–9. pmid:19968988
57. Therrien M, Chang HC, Solomon NM, Karim FD, Wassarman DA, Rubin GM. KSR, a novel protein kinase required for RAS signal transduction. Cell. 1995;83(6):879–88. pmid:8521512.
58. Morrison KB, Tognon CE, Garnett MJ, Deal C, Sorensen PH. ETV6-NTRK3 transformation requires insulin-like growth factor 1 receptor signaling and is associated with constitutive IRS-1 tyrosine phosphorylation. Oncogene. 2002;21(37):5684–95. pmid:12173038.
59. Taylor EB, Ellingson WJ, Lamb JD, Chesser DG, Winder WW. Long-chain acyl-CoA esters inhibit phosphorylation of AMP-activated protein kinase at threonine-172 by LKB1/STRAD/MO25. American journal of physiology Endocrinology and metabolism. 2005;288(6):E1055–61. pmid:15644453.
60. Cordenonsi M, Montagner M, Adorno M, Zacchigna L, Martello G, Mamidi A, et al. Integration of TGF-beta and Ras/MAPK signaling through p53 phosphorylation. Science. 2007;315(5813):840–3. pmid:17234915.
61. Yuan J, Rozengurt E. PKD, PKD2, and p38 MAPK mediate Hsp27 serine-82 phosphorylation induced by neurotensin in pancreatic cancer PANC-1 cells. J Cell Biochem. 2008;103(2):648–62. pmid:17570131.
62. Yu J, Bulk E, Ji P, Hascher A, Koschmieder S, Berdel WE, et al. The kinase defective EPHB6 receptor tyrosine kinase activates MAP kinase signaling in lung adenocarcinoma. International journal of oncology. 2009;35(1):175–9. pmid:19513565.
63. Liu JA, Wu MH, Yan CH, Chau BK, So H, Ng A, et al. Phosphorylation of Sox9 is required for neural crest delamination and is regulated downstream of BMP and canonical Wnt signaling. Proc Natl Acad Sci U S A. 2013;110(8):2882–7. pmid:23382206; PubMed Central PMCID: PMC3581920.
64. Stefansson K, Brattsand M, Ny A, Glas B, Egelrud T. Kallikrein-related peptidase 14 may be a major contributor to trypsin-like proteolytic activity in human stratum corneum. Biol Chem. 2006;387(6):761–8. Epub 2006/06/28. pmid:16800737.
65. Lizama AJ, Andrade Y, Colivoro P, Sarmiento J, Matus CE, Gonzalez CB, et al. Expression and bioregulation of the kallikrein-related peptidases family in the human neutrophil. Innate immunity. 2015;21(6):575–86. pmid:25563717.
66. Tersteeg C, de Maat S, De Meyer SF, Smeets MW, Barendrecht AD, Roest M, et al. Plasmin cleavage of von Willebrand factor as an emergency bypass for ADAMTS13 deficiency in thrombotic microangiopathy. Circulation. 2014;129(12):1320–31. pmid:24449821.
67. Peradziryi H, Kaplan NA, Podleschny M, Liu X, Wehner P, Borchers A, et al. PTK7/Otk interacts with Wnts and inhibits canonical Wnt signalling. The EMBO journal. 2011;30(18):3729–40. pmid:21772251; PubMed Central PMCID: PMC3173783.
68. Lu X, Borchers AG, Jolicoeur C, Rayburn H, Baker JC, Tessier-Lavigne M. PTK7/CCK-4 is a novel regulator of planar cell polarity in vertebrates. Nature. 2004;430(6995):93–8. pmid:15229603.
69. Duan Y, Sun Y, Zhang F, Zhang WK, Wang D, Wang Y, et al. Keratin K18 increases cystic fibrosis transmembrane conductance regulator (CFTR) surface expression by binding to its C-terminal hydrophobic patch. J Biol Chem. 2012;287(48):40547–59. pmid:23045527; PubMed Central PMCID: PMC3504769.
70. Nishizawa M, Izawa I, Inoko A, Hayashi Y, Nagata K, Yokoyama T, et al. Identification of trichoplein, a novel keratin filament-binding protein. Journal of cell science. 2005;118(Pt 5):1081–90. pmid:15731013.
71. Sugimoto M, Inoko A, Shiromizu T, Nakayama M, Zou P, Yonemura S, et al. The keratin-binding protein Albatross regulates polarization of epithelial cells. The Journal of cell biology. 2008;183(1):19–28. pmid:18838552; PubMed Central PMCID: PMC2557036.
72. Agrawal PB, Pierson CR, Joshi M, Liu X, Ravenscroft G, Moghadaszadeh B, et al. SPEG interacts with myotubularin, and its deficiency causes centronuclear myopathy with dilated cardiomyopathy. American journal of human genetics. 2014;95(2):218–26. pmid:25087613; PubMed Central PMCID: PMC4129406.
73. Aumais JP, Williams SN, Luo W, Nishino M, Caldwell KA, Caldwell GA, et al. Role for NudC, a dynein-associated nuclear movement protein, in mitosis and cytokinesis. Journal of cell science. 2003;116(10):1991–2003. pmid:12679384
74. Liu Z, Myers LC. Med5(Nut1) and Med17(Srb4) are direct targets of mediator histone H4 tail interactions. PLoS One. 2012;7(6):e38416. Epub 2012/06/14. pmid:22693636; PubMed Central PMCID: PMC3367926.
75. Kikuchi Y, Umemura H, Nishitani S, Iida S, Fukasawa R, Hayashi H, et al. Human mediator MED17 subunit plays essential roles in gene regulation by associating with the transcription and DNA repair machineries. Genes Cells. 2015;20(3):191–202. pmid:25482373.
76. Fukuda MN, Miyoshi M, Nadano D. The role of bystin in embryo implantation and in ribosomal biogenesis. Cell Mol Life Sci. 2008;65(1):92–9. Epub 2007/10/06. pmid:17917702; PubMed Central PMCID: PMC2771125.
77. Landry DM, Hertz MI, Thompson SR. RPS25 is essential for translation initiation by the Dicistroviridae and hepatitis C viral IRESs. Genes & development. 2009;23(23):2753–64. Epub 2009/12/03. pmid:19952110; PubMed Central PMCID: PMC2788332.
78. Awata H, Endo F, Matsuda I. Structure of the human 4-hydroxyphenylpyruvic acid dioxygenase gene (HPD). Genomics. 1994;23(3):534–9. pmid:7851880.
79. Gudmundsdottir K, Lord CJ, Ashworth A. The proteasome is involved in determining differential utilization of double-strand break repair pathways. Oncogene. 2007;26(54):7601–6. Epub 2007/06/15. pmid:17563742.
80. Di Pietro SM, Falcon-Perez JM, Dell'Angelica EC. Characterization of BLOC-2, a complex containing the Hermansky-Pudlak syndrome proteins HPS3, HPS5 and HPS6. Traffic. 2004;5(4):276–83. Epub 2004/03/20. pmid:15030569.
81. Lai SK, Wong CH, Lee YP, Li HY. Caspase-3-mediated degradation of condensin Cap-H regulates mitotic cell death. Cell death and differentiation. 2011;18(6):996–1004. Epub 2010/12/15. pmid:21151026; PubMed Central PMCID: PMC3131938.
82. Diaz-Martinez LA, Gimenez-Abian JF, Clarke DJ. Regulation of centromeric cohesion by sororin independently of the APC/C. Cell cycle. 2007;6(6):714–24. Epub 2007/03/16. pmid:17361102.
83. Kurasawa Y, Earnshaw WC, Mochizuki Y, Dohmae N, Todokoro K. Essential roles of KIF4 and its binding partner PRC1 in organized central spindle midzone formation. The EMBO journal. 2004;23(16):3237–48. Epub 2004/08/07. pmid:15297875; PubMed Central PMCID: PMC514520.
84. Cornacchia D, Dileep V, Quivy JP, Foti R, Tili F, Santarella-Mellwig R, et al. Mouse Rif1 is a key regulator of the replication-timing programme in mammalian cells. The EMBO journal. 2012;31(18):3678–90. Epub 2012/08/02. pmid:22850673; PubMed Central PMCID: PMC3442270.
85. Xu L, Blackburn EH. Human Rif1 protein binds aberrant telomeres and aligns along anaphase midzone microtubules. The Journal of cell biology. 2004;167(5):819–30. pmid:15583028; PubMed Central PMCID: PMC2172464.
86. Demichel P, Rodriguez JC, Roquebert J, Simonnet G. NPFF, a FMRF-NH2-like peptide, blocks opiate effects on ileum contractions. Peptides. 1993;14(5):1005–9. Epub 1993/09/01. pmid:8284250.
87. Mollereau C, Gouarderes C, Dumont Y, Kotani M, Detheux M, Doods H, et al. Agonist and antagonist activities on human NPFF(2) receptors of the NPY ligands GR231118 and BIBP3226. Br J Pharmacol. 2001;133(1):1–4. pmid:11325787; PubMed Central PMCID: PMC1572765.
88. Sarma GN, Kinderman FS, Kim C, von Daake S, Chen L, Wang BC, et al. Structure of D-AKAP2:PKA RI complex: insights into AKAP specificity and selectivity. Structure. 2010;18(2):155–66. pmid:20159461; PubMed Central PMCID: PMC3090270.
89. Eggers CT, Schafer JC, Goldenring JR, Taylor SS. D-AKAP2 interacts with Rab4 and Rab11 through its RGS domains and regulates transferrin receptor recycling. J Biol Chem. 2009;284(47):32869–80. Epub 2009/10/03. pmid:19797056; PubMed Central PMCID: PMC2781703.
90. Li D, Yu W, Liu M. Regulation of KiSS1 gene expression. Peptides. 2009;30(1):130–8. Epub 2008/11/11. pmid:18996159.
91. Kostakis ID, Agrogiannis G, Vaiopoulos AG, Mylona E, Patsouris E, Kouraklis G, et al. KISS1 expression in colorectal cancer. APMIS: acta pathologica, microbiologica, et immunologica Scandinavica. 2013;121(10):1004–10. pmid:24033850.
92. Wang H, Jones J, Turner T, He QP, Hardy S, Grizzle WE, et al. Clinical and biological significance of KISS1 expression in prostate cancer. Am J Pathol. 2012;180(3):1170–8. pmid:22226740; PubMed Central PMCID: PMC3349884.
93. Kroll H, Bolsover S, Hsu J, Kim SH, Bouloux PM. Kisspeptin-evoked calcium signals in isolated primary rat gonadotropin-releasing hormone neurones. Neuroendocrinology. 2011;93(2):114–20. pmid:21051881.
94. Onuma TA, Duan C. Duplicated Kiss1 receptor genes in zebrafish: distinct gene expression patterns, different ligand selectivity, and a novel nuclear isoform with transactivating activity. FASEB J. 2012;26(7):2941–50. pmid:22499582.

[ref1] 1. Erdin S, Lisewski AM, Lichtarge O. Protein function prediction: towards integration of similarity metrics. Current opinion in structural biology. 2011;21(2):180–8. pmid:21353529; PubMed Central PMCID: PMC3120633.
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Cozzetto D, Buchan DW, Bryson K, Jones DT. Protein function prediction by massive integration of evolutionary analyses and multiple data sources. BMC Bioinformatics. 2013;14 Suppl 3:S1. Epub 2013/03/27. pmid:23514099; PubMed Central PMCID: PMCPmc3584902.
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Pandey G, Kumar V, Steinbach M. Computational Approaches for Protein Function: A Review. 2006.
View Article
Google Scholar

[10] View Article

[11] Google Scholar

[ref4] 4. Zehetner G. OntoBlast function: from sequence similarities directly to potential functional annotations by ontology terms. Nucleic Acids Research. 2003;31(13):3799–803. pmid:12824422
View Article
PubMed/NCBI
Google Scholar

[13] View Article

[14] PubMed/NCBI

[15] Google Scholar

[ref5] 5. Khan S, Situ G, Decker K, Schmidt CJ. GoFigure: automated Gene Ontology annotation. Bioinformatics. 2003;19(18):2484–5. pmid:14668239.
View Article
PubMed/NCBI
Google Scholar

[17] View Article

[18] PubMed/NCBI

[19] Google Scholar

[ref6] 6. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10. Epub 1990/10/05. pmid:2231712.
View Article
PubMed/NCBI
Google Scholar

[21] View Article

[22] PubMed/NCBI

[23] Google Scholar

[ref7] 7. Wass MN, Sternberg MJ. ConFunc—functional annotation in the twilight zone. Bioinformatics. 2008;24(6):798–806. pmid:18263643.
View Article
PubMed/NCBI
Google Scholar

[25] View Article

[26] PubMed/NCBI

[27] Google Scholar

[ref8] 8. Hawkins T, Chitale M, Luban S, Kihara D. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins. 2009;74(3):566–82. pmid:18655063.
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref9] 9. Chitale M, Hawkins T, Park C, Kihara D. ESG: extended similarity group method for automated protein function prediction. Bioinformatics. 2009;25(14):1739–45. pmid:19435743; PubMed Central PMCID: PMC2705228.
View Article
PubMed/NCBI
Google Scholar

[33] View Article

[34] PubMed/NCBI

[35] Google Scholar

[ref10] 10. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402. Epub 1997/09/01. pmid:9254694; PubMed Central PMCID: PMCPmc146917.
View Article
PubMed/NCBI
Google Scholar

[37] View Article

[38] PubMed/NCBI

[39] Google Scholar

[ref11] 11. Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008;36(10):3420–35. pmid:18445632; PubMed Central PMCID: PMC2425479.
View Article
PubMed/NCBI
Google Scholar

[41] View Article

[42] PubMed/NCBI

[43] Google Scholar

[ref12] 12. Holm L, Park J. DaliLite workbench for protein structure comparison. Bioinformatics. 2000;16(6):566–7. pmid:10980157.

[ref13] 13. Holm L, Kaariainen S, Rosenstrom P, Schenkel A. Searching protein structure databases with DaliLite v.3. Bioinformatics. 2008;24(23):2780–1. pmid:18818215; PubMed Central PMCID: PMC2639270.

[ref14] 14. Holm L, Rosenstrom P. Dali server: conservation mapping in 3D. Nucleic Acids Res. 2010;38(Web Server issue):W545–9. pmid:20457744; PubMed Central PMCID: PMC2896194.

[ref15] 15. Kolodny R, Linial N. Approximate protein structural alignment in polynomial time. Proc Natl Acad Sci U S A. 2004;101(33):12201–6. pmid:15304646; PubMed Central PMCID: PMC514457.

[ref16] 16. Shatsky M, Nussinov R, Wolfson HJ. A method for simultaneous alignment of multiple protein structures. Proteins. 2004;56(1):143–56. pmid:15162494.

[ref17] 17. Shatsky M, Dror O, Schneidman-Duhovny D, Nussinov R, Wolfson HJ. BioInfo3D: a suite of tools for structural bioinformatics. Nucleic Acids Res. 2004;32(Web Server issue):W503–7. pmid:15215437; PubMed Central PMCID: PMC441551.

[ref18] 18. O'Sullivan O, Suhre K, Abergel C, Higgins DG, Notredame C. 3DCoffee: combining protein sequences and structures within multiple sequence alignments. J Mol Biol. 2004;340(2):385–95. pmid:15201059.

[ref19] 19. Pazos F, Sternberg MJ. Automated prediction of protein function and detection of functional sites from structure. Proc Natl Acad Sci U S A. 2004;101(41):14754–9. pmid:15456910; PubMed Central PMCID: PMC522026.

[ref20] 20. Pegg SC, Brown SD, Ojha S, Seffernick J, Meng EC, Morris JH, et al. Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database. Biochemistry. 2006;45(8):2545–55. pmid:16489747.

[ref21] 21. Saini HK, Fischer D. FRalanyzer: a tool for functional analysis of fold-recognition sequence-structure alignments. Nucleic Acids Res. 2007;35(Web Server issue):W499–502. pmid:17537819; PubMed Central PMCID: PMC1933221.

[ref22] 22. An JY, Meng FR, You ZH, Chen X, Yan GY, Hu JP. Improving protein-protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model. Protein science: a publication of the Protein Society. 2016;25(10):1825–33. pmid:27452983; PubMed Central PMCID: PMC5029537.

[ref23] 23. Huang YA, You ZH, Chen X, Chan K, Luo X. Sequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding. BMC Bioinformatics. 2016;17(1):184. pmid:27112932; PubMed Central PMCID: PMC4845433.

[ref24] 24. Wong L, You Z-H, Ming Z, Li J, Chen X, Huang Y-A. Detection of Interactions between Proteins through Rotation Forest and Local Phase Quantization Descriptors. International journal of molecular sciences. 2016;17(1):21. pmid:26712745

[ref25] 25. Oliver S. Proteomics: Guilt-by-association goes global. Nature. 2000;403(6770):601–3. pmid:10688178

[ref26] 26. Schwikowski B, Uetz P, Fields S. A network of protein-protein interactions in yeast. Nat Biotechnol. 2000;18(12):1257–61. Epub 2000/12/02. pmid:11101803.

[ref27] 27. Hishigaki H, Nakai K, Ono T, Tanigami A, Takagi T. Assessment of prediction accuracy of protein function from protein-protein interaction data. Yeast. 2001;18(6):523–31. Epub 2001/04/03. pmid:11284008.

[ref28] 28. Chi X, Hou J. An iterative approach of protein function prediction. BMC Bioinformatics. 2011;12:437. pmid:22074332; PubMed Central PMCID: PMC3224793.

[ref29] 29. Chua HN, Sung WK, Wong L. Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics. 2006;22(13):1623–30. pmid:16632496.

[ref30] 30. Deng M, Zhang K, Mehta S, Chen T, Sun F. Prediction of protein function using protein-protein interaction data. Journal of computational biology: a journal of computational molecular cell biology. 2003;10(6):947–60. Epub 2004/02/26. pmid:14980019.

[ref31] 31. Letovsky S, Kasif S. Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics. 2003;19 Suppl 1:i197–204. Epub 2003/07/12. pmid:12855458.

[ref32] 32. Kourmpetis YA, van Dijk AD, Bink MC, van Ham RC, ter Braak CJ. Bayesian Markov Random Field analysis for protein function prediction based on network data. PLoS One. 2010;5(2):e9293. Epub 2010/03/03. pmid:20195360; PubMed Central PMCID: PMC2827541.

[ref33] 33. Nabieva E, Jim K, Agarwal A, Chazelle B, Singh M. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics. 2005;21 Suppl 1:i302–10. Epub 2005/06/18. pmid:15961472.

[ref34] 34. Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S. Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics. 2006;7:207. pmid:16613608; PubMed Central PMCID: PMC1473204.

[ref35] 35. Arnau V, Mars S, Marin I. Iterative cluster analysis of protein interaction data. Bioinformatics. 2005;21(3):364–78. pmid:15374873.

[ref36] 36. Singh R, Xu J, Berger B. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc Natl Acad Sci U S A. 2008;105(35):12763–8. pmid:18725631; PubMed Central PMCID: PMC2522262.

[ref37] 37. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research. 2013;41(Database issue):D808–15. Epub 2012/12/04. pmid:23203871; PubMed Central PMCID: PMC3531103.

[ref38] 38. Hu LL, Huang T, Shi X, Lu WC, Cai YD, Chou KC. Predicting functions of proteins in mouse based on weighted protein-protein interaction network and protein hybrid properties. PLoS ONE. 2011;6(1):e14556. pmid:21283518

[ref39] 39. Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, et al. A large-scale evaluation of computational protein function prediction. Nature methods. 2013;10(3):221–7. Epub 2013/01/29. pmid:23353650; PubMed Central PMCID: PMC3584181.

[ref40] 40. Ruepp A, Doudieu ON, van den Oever J, Brauner B, Dunger-Kaltenbach I, Fobo G, et al. The Mouse Functional Genome Database (MfunGD): functional annotation of proteins in the light of their cellular context. Nucleic Acids Res. 2006;34(Database issue):D568–71. pmid:16381934; PubMed Central PMCID: PMC1347437.

[ref41] 41. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9. Epub 2006/05/30. pmid:16731699.

[ref42] 42. Ruepp A, Zollner A, Maier D, Albermann K, Hani J, Mokrejs M, et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res. 2004;32(18):5539–45. pmid:15486203; PubMed Central PMCID: PMC524302.

[ref43] 43. Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, et al. BioMart—biological queries made easy. BMC Genomics. 2009;10:22. pmid:19144180; PubMed Central PMCID: PMC2649164.

[ref44] 44. Oliver S. Guilt-by-association goes global. Nature. 2000;403(6770):601–3. pmid:10688178.

[ref45] 45. Chen L, Zeng WM, Cai YD, Feng KY, Chou KC. Predicting Anatomical Therapeutic Chemical (ATC) Classification of Drugs by Integrating Chemical-Chemical Interactions and Similarities. PLoS ONE. 2012;7(4):e35254. pmid:22514724

[ref46] 46. Chen L, Lu J, Zhang N, Huang T, Cai Y-D. A hybrid method for prediction and repositioning of drug Anatomical Therapeutic Chemical classes. Molecular BioSystems. 2014;10(4):868–77. pmid:24492783

[ref47] 47. Chou K. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Structure, Function, and Bioinformatics. 2001;43(3):246–55.

[ref48] 48. Xu Y, Ding J, Wu LY, Chou KC. iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS One. 2013;8(2):e55844. pmid:23409062; PubMed Central PMCID: PMC3567014.

[ref49] 49. Qiu WR, Xiao X, Lin WZ, Chou KC. iMethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach. Biomed Res Int. 2014;2014:947416. pmid:24977164; PubMed Central PMCID: PMC4054830.

[ref50] 50. Jia C, Lin X, Wang Z. Prediction of protein S-nitrosylation sites based on adapted normal distribution bi-profile Bayes and Chou's pseudo amino acid composition. International journal of molecular sciences. 2014;15(6):10410–23. pmid:24918295; PubMed Central PMCID: PMC4100159.

[ref51] 51. Hayat M, Khan A. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition. J Theor Biol. 2010;271(1):10–7. pmid:21110985.

[ref52] 52. Chen YK, Li KB. Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition. J Theor Biol. 2013;318:1–12. pmid:23137835.

[ref53] 53. Atchley WR, Zhao J, Fernandes AD, Druke T. Solving the protein sequence metric problem. Proc Natl Acad Sci U S A. 2005;102(18):6395–400. Epub 2005/04/27. pmid:15851683; PubMed Central PMCID: PMC1088356.

[ref54] 54. Rubinstein ND, Mayrose I, Pupko T. A machine-learning approach for predicting B-cell epitopes. Molecular immunology. 2009;46(5):840–7. pmid:18947876

[ref55] 55. Huang T, Shi X, Wang P, He Z, Feng K, Hu L, et al. Analysis and prediction of the metabolic stability of proteins based on their sequential features, subcellular locations and interaction networks. PLoS ONE. 2010;5(6):e10972. pmid:20532046

[ref56] 56. Olety B, Wälte M, Honnert U, Schillers H, Bähler M. Myosin 1G (Myo1G) is a haematopoietic specific myosin that localises to the plasma membrane and regulates cell elasticity. FEBS Letters. 2010;584(3):493–9. pmid:19968988

[ref57] 57. Therrien M, Chang HC, Solomon NM, Karim FD, Wassarman DA, Rubin GM. KSR, a novel protein kinase required for RAS signal transduction. Cell. 1995;83(6):879–88. pmid:8521512.

[ref58] 58. Morrison KB, Tognon CE, Garnett MJ, Deal C, Sorensen PH. ETV6-NTRK3 transformation requires insulin-like growth factor 1 receptor signaling and is associated with constitutive IRS-1 tyrosine phosphorylation. Oncogene. 2002;21(37):5684–95. pmid:12173038.

[ref59] 59. Taylor EB, Ellingson WJ, Lamb JD, Chesser DG, Winder WW. Long-chain acyl-CoA esters inhibit phosphorylation of AMP-activated protein kinase at threonine-172 by LKB1/STRAD/MO25. American journal of physiology Endocrinology and metabolism. 2005;288(6):E1055–61. pmid:15644453.

[ref60] 60. Cordenonsi M, Montagner M, Adorno M, Zacchigna L, Martello G, Mamidi A, et al. Integration of TGF-beta and Ras/MAPK signaling through p53 phosphorylation. Science. 2007;315(5813):840–3. pmid:17234915.

[ref61] 61. Yuan J, Rozengurt E. PKD, PKD2, and p38 MAPK mediate Hsp27 serine-82 phosphorylation induced by neurotensin in pancreatic cancer PANC-1 cells. J Cell Biochem. 2008;103(2):648–62. pmid:17570131.

[ref62] 62. Yu J, Bulk E, Ji P, Hascher A, Koschmieder S, Berdel WE, et al. The kinase defective EPHB6 receptor tyrosine kinase activates MAP kinase signaling in lung adenocarcinoma. International journal of oncology. 2009;35(1):175–9. pmid:19513565.

[ref63] 63. Liu JA, Wu MH, Yan CH, Chau BK, So H, Ng A, et al. Phosphorylation of Sox9 is required for neural crest delamination and is regulated downstream of BMP and canonical Wnt signaling. Proc Natl Acad Sci U S A. 2013;110(8):2882–7. pmid:23382206; PubMed Central PMCID: PMC3581920.

[ref64] 64. Stefansson K, Brattsand M, Ny A, Glas B, Egelrud T. Kallikrein-related peptidase 14 may be a major contributor to trypsin-like proteolytic activity in human stratum corneum. Biol Chem. 2006;387(6):761–8. Epub 2006/06/28. pmid:16800737.

[ref65] 65. Lizama AJ, Andrade Y, Colivoro P, Sarmiento J, Matus CE, Gonzalez CB, et al. Expression and bioregulation of the kallikrein-related peptidases family in the human neutrophil. Innate immunity. 2015;21(6):575–86. pmid:25563717.

[ref66] 66. Tersteeg C, de Maat S, De Meyer SF, Smeets MW, Barendrecht AD, Roest M, et al. Plasmin cleavage of von Willebrand factor as an emergency bypass for ADAMTS13 deficiency in thrombotic microangiopathy. Circulation. 2014;129(12):1320–31. pmid:24449821.

[ref67] 67. Peradziryi H, Kaplan NA, Podleschny M, Liu X, Wehner P, Borchers A, et al. PTK7/Otk interacts with Wnts and inhibits canonical Wnt signalling. The EMBO journal. 2011;30(18):3729–40. pmid:21772251; PubMed Central PMCID: PMC3173783.

[ref68] 68. Lu X, Borchers AG, Jolicoeur C, Rayburn H, Baker JC, Tessier-Lavigne M. PTK7/CCK-4 is a novel regulator of planar cell polarity in vertebrates. Nature. 2004;430(6995):93–8. pmid:15229603.

[ref69] 69. Duan Y, Sun Y, Zhang F, Zhang WK, Wang D, Wang Y, et al. Keratin K18 increases cystic fibrosis transmembrane conductance regulator (CFTR) surface expression by binding to its C-terminal hydrophobic patch. J Biol Chem. 2012;287(48):40547–59. pmid:23045527; PubMed Central PMCID: PMC3504769.

[ref70] 70. Nishizawa M, Izawa I, Inoko A, Hayashi Y, Nagata K, Yokoyama T, et al. Identification of trichoplein, a novel keratin filament-binding protein. Journal of cell science. 2005;118(Pt 5):1081–90. pmid:15731013.

[ref71] 71. Sugimoto M, Inoko A, Shiromizu T, Nakayama M, Zou P, Yonemura S, et al. The keratin-binding protein Albatross regulates polarization of epithelial cells. The Journal of cell biology. 2008;183(1):19–28. pmid:18838552; PubMed Central PMCID: PMC2557036.

[ref72] 72. Agrawal PB, Pierson CR, Joshi M, Liu X, Ravenscroft G, Moghadaszadeh B, et al. SPEG interacts with myotubularin, and its deficiency causes centronuclear myopathy with dilated cardiomyopathy. American journal of human genetics. 2014;95(2):218–26. pmid:25087613; PubMed Central PMCID: PMC4129406.

[ref73] 73. Aumais JP, Williams SN, Luo W, Nishino M, Caldwell KA, Caldwell GA, et al. Role for NudC, a dynein-associated nuclear movement protein, in mitosis and cytokinesis. Journal of cell science. 2003;116(10):1991–2003. pmid:12679384

[ref74] 74. Liu Z, Myers LC. Med5(Nut1) and Med17(Srb4) are direct targets of mediator histone H4 tail interactions. PLoS One. 2012;7(6):e38416. Epub 2012/06/14. pmid:22693636; PubMed Central PMCID: PMC3367926.

[ref75] 75. Kikuchi Y, Umemura H, Nishitani S, Iida S, Fukasawa R, Hayashi H, et al. Human mediator MED17 subunit plays essential roles in gene regulation by associating with the transcription and DNA repair machineries. Genes Cells. 2015;20(3):191–202. pmid:25482373.

[ref76] 76. Fukuda MN, Miyoshi M, Nadano D. The role of bystin in embryo implantation and in ribosomal biogenesis. Cell Mol Life Sci. 2008;65(1):92–9. Epub 2007/10/06. pmid:17917702; PubMed Central PMCID: PMC2771125.

[ref77] 77. Landry DM, Hertz MI, Thompson SR. RPS25 is essential for translation initiation by the Dicistroviridae and hepatitis C viral IRESs. Genes & development. 2009;23(23):2753–64. Epub 2009/12/03. pmid:19952110; PubMed Central PMCID: PMC2788332.

[ref78] 78. Awata H, Endo F, Matsuda I. Structure of the human 4-hydroxyphenylpyruvic acid dioxygenase gene (HPD). Genomics. 1994;23(3):534–9. pmid:7851880.

[ref79] 79. Gudmundsdottir K, Lord CJ, Ashworth A. The proteasome is involved in determining differential utilization of double-strand break repair pathways. Oncogene. 2007;26(54):7601–6. Epub 2007/06/15. pmid:17563742.

[ref80] 80. Di Pietro SM, Falcon-Perez JM, Dell'Angelica EC. Characterization of BLOC-2, a complex containing the Hermansky-Pudlak syndrome proteins HPS3, HPS5 and HPS6. Traffic. 2004;5(4):276–83. Epub 2004/03/20. pmid:15030569.

[ref81] 81. Lai SK, Wong CH, Lee YP, Li HY. Caspase-3-mediated degradation of condensin Cap-H regulates mitotic cell death. Cell death and differentiation. 2011;18(6):996–1004. Epub 2010/12/15. pmid:21151026; PubMed Central PMCID: PMC3131938.

[ref82] 82. Diaz-Martinez LA, Gimenez-Abian JF, Clarke DJ. Regulation of centromeric cohesion by sororin independently of the APC/C. Cell cycle. 2007;6(6):714–24. Epub 2007/03/16. pmid:17361102.

[ref83] 83. Kurasawa Y, Earnshaw WC, Mochizuki Y, Dohmae N, Todokoro K. Essential roles of KIF4 and its binding partner PRC1 in organized central spindle midzone formation. The EMBO journal. 2004;23(16):3237–48. Epub 2004/08/07. pmid:15297875; PubMed Central PMCID: PMC514520.

[ref84] 84. Cornacchia D, Dileep V, Quivy JP, Foti R, Tili F, Santarella-Mellwig R, et al. Mouse Rif1 is a key regulator of the replication-timing programme in mammalian cells. The EMBO journal. 2012;31(18):3678–90. Epub 2012/08/02. pmid:22850673; PubMed Central PMCID: PMC3442270.

[ref85] 85. Xu L, Blackburn EH. Human Rif1 protein binds aberrant telomeres and aligns along anaphase midzone microtubules. The Journal of cell biology. 2004;167(5):819–30. pmid:15583028; PubMed Central PMCID: PMC2172464.

[ref86] 86. Demichel P, Rodriguez JC, Roquebert J, Simonnet G. NPFF, a FMRF-NH2-like peptide, blocks opiate effects on ileum contractions. Peptides. 1993;14(5):1005–9. Epub 1993/09/01. pmid:8284250.

[ref87] 87. Mollereau C, Gouarderes C, Dumont Y, Kotani M, Detheux M, Doods H, et al. Agonist and antagonist activities on human NPFF(2) receptors of the NPY ligands GR231118 and BIBP3226. Br J Pharmacol. 2001;133(1):1–4. pmid:11325787; PubMed Central PMCID: PMC1572765.

[ref88] 88. Sarma GN, Kinderman FS, Kim C, von Daake S, Chen L, Wang BC, et al. Structure of D-AKAP2:PKA RI complex: insights into AKAP specificity and selectivity. Structure. 2010;18(2):155–66. pmid:20159461; PubMed Central PMCID: PMC3090270.

[ref89] 89. Eggers CT, Schafer JC, Goldenring JR, Taylor SS. D-AKAP2 interacts with Rab4 and Rab11 through its RGS domains and regulates transferrin receptor recycling. J Biol Chem. 2009;284(47):32869–80. Epub 2009/10/03. pmid:19797056; PubMed Central PMCID: PMC2781703.

[ref90] 90. Li D, Yu W, Liu M. Regulation of KiSS1 gene expression. Peptides. 2009;30(1):130–8. Epub 2008/11/11. pmid:18996159.

[ref91] 91. Kostakis ID, Agrogiannis G, Vaiopoulos AG, Mylona E, Patsouris E, Kouraklis G, et al. KISS1 expression in colorectal cancer. APMIS: acta pathologica, microbiologica, et immunologica Scandinavica. 2013;121(10):1004–10. pmid:24033850.

[ref92] 92. Wang H, Jones J, Turner T, He QP, Hardy S, Grizzle WE, et al. Clinical and biological significance of KISS1 expression in prostate cancer. Am J Pathol. 2012;180(3):1170–8. pmid:22226740; PubMed Central PMCID: PMC3349884.

[ref93] 93. Kroll H, Bolsover S, Hsu J, Kim SH, Bouloux PM. Kisspeptin-evoked calcium signals in isolated primary rat gonadotropin-releasing hormone neurones. Neuroendocrinology. 2011;93(2):114–20. pmid:21051881.

[ref94] 94. Onuma TA, Duan C. Duplicated Kiss1 receptor genes in zebrafish: distinct gene expression patterns, different ligand selectivity, and a novel nuclear isoform with transactivating activity. FASEB J. 2012;26(7):2941–50. pmid:22499582.
