LPI-SKMSC: Predicting LncRNA–Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering

  • Original research article
  • Interdisciplinary Sciences: Computational Life Sciences

Abstract

Long noncoding RNAs (lncRNAs) play significant regulatory roles in gene expression, and one of the ways they do so is by interacting with proteins. Because experiments to determine lncRNA–protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. LPI prediction datasets commonly have an imbalanced distribution of positive and negative samples, yet few existing methods give specific consideration to this problem. In this paper, we propose a new clustering-based LPI prediction method that uses segmented k-mer frequencies and multi-space clustering (LPI-SKMSC) and is dedicated to handling the imbalance of positive and negative samples. We construct segmented k-mer frequencies to obtain global and local features of lncRNA and protein sequences. Multi-space clustering is then applied: convolutional neural network (CNN)-based encoders map the different features of a sample to different spaces, and these multiple spaces jointly constrain the classification of the sample. Finally, the distance between the encoder's output features and the cluster center is calculated in each space, and the sum of the distances over all spaces is compared with the cluster radius to predict LPIs. In cross-validation on three public datasets, LPI-SKMSC outperformed the existing methods it was compared with. The experimental results show that LPI-SKMSC predicts LPIs more effectively when positive and negative samples are imbalanced. In addition, we illustrate that our model is better at uncovering potential lncRNA–protein interaction pairs.
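
The two ideas named in the abstract, segmented k-mer frequencies and a decision based on summed distances to cluster centers, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the equal-length segmentation, the RNA alphabet default, and the use of Euclidean distance are all assumptions made for illustration, and the CNN encoders are replaced by embeddings supplied directly by the caller. The abstract also does not state which side of the radius corresponds to an interacting pair, so the sketch only reports whether a sample falls inside the radius.

```python
from itertools import product
import math


def kmer_frequencies(seq, k, alphabet="ACGU"):
    """Normalized counts of every k-mer over `alphabet` found in `seq`."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing symbols outside the alphabet
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def segmented_kmer_frequencies(seq, k, n_segments, alphabet="ACGU"):
    """Global k-mer profile followed by one local profile per segment.

    Assumption: segments are contiguous and of (nearly) equal length.
    """
    features = kmer_frequencies(seq, k, alphabet)  # global features
    seg_len = math.ceil(len(seq) / n_segments)
    for s in range(n_segments):
        segment = seq[s * seg_len:(s + 1) * seg_len]
        features.extend(kmer_frequencies(segment, k, alphabet))  # local features
    return features


def inside_radius(embeddings, centers, radius):
    """Sum the Euclidean distance to the center in each space and compare
    the total with the cluster radius.

    `embeddings` stands in for the encoder outputs, one vector per space.
    Returns (is_inside, total_distance).
    """
    total = sum(math.dist(z, c) for z, c in zip(embeddings, centers))
    return total <= radius, total
```

For a protein sequence, the same functions could be called with an amino-acid alphabet passed as `alphabet`; whether the original method uses the raw 20-letter alphabet or a reduced one is not stated in the abstract.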

Author information

Corresponding author

Correspondence to Zhan-Li Sun.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflicts of interest.

Data availability

The source code and datasets for this study are accessible at https://github.com/zhlSunLab/LPI-SKMSC.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, DZ., Sun, ZL., Liu, M. et al. LPI-SKMSC: Predicting LncRNA–Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering. Interdiscip Sci Comput Life Sci (2024). https://doi.org/10.1007/s12539-023-00598-4

  • DOI: https://doi.org/10.1007/s12539-023-00598-4
