Abstract

KaKs_Calculator 3.0 is an updated toolkit capable of calculating selective pressure on both coding and non-coding sequences. Similar to the nonsynonymous/synonymous substitution rate ratio for coding sequences, selection on non-coding sequences can be quantified as the ratio of the non-coding nucleotide substitution rate to the synonymous substitution rate of adjacent coding sequences. Tested on empirical data, KaKs_Calculator 3.0 effectively detects the strength and mode of selection acting on molecular sequences, demonstrating its potential for genome-wide scans of natural selection on diverse sequences and for identifying potentially functional elements at a whole-genome scale. The KaKs_Calculator 3.0 package is freely available for academic use only at https://ngdc.cncb.ac.cn/biocode/tools/BT000001.

Introduction

Detecting natural selection on molecular sequences is of fundamental significance in molecular evolution, comparative genomics, and phylogenetic reconstruction, which can provide profound insights for revealing evolutionary processes of molecular sequences and unveiling complex molecular mechanisms of genome evolution [1]. In principle, estimating selection on DNA sequences requires a reference set of substitutions that is free from selection. As synonymous substitutions do not provoke amino acid changes due to the degeneracy of the genetic code, they are expected to be invisible to selection and thus widely used as a reference that reflects the neutral rate of evolution [2]. Consequently, the ratio of nonsynonymous substitution rate (Ka or dN) to synonymous substitution rate (Ks or dS), namely, ω = Ka/Ks (or dN/dS), is widely adopted to differentiate neutral mutation (ω ≈ 1) from negative (purifying) selection (ω < 1) and positive (adaptive) selection (ω > 1), accordingly providing a powerful tool for illuminating molecular evolution of coding sequences (see a popular package in [3]).

Nowadays, a growing body of evidence shows that non-coding sequences, historically regarded as “junk” because little was known about their function relative to coding sequences, are functional elements that play important regulatory roles in multiple biological processes [4] and are closely associated with various human diseases [5–7]. Although less conserved than coding sequences, a large number of non-coding sequences have been identified as highly conserved across mammalian genomes [8–10]. Importantly, more non-coding sequences are subject to positive and negative selection than previously believed, and, in particular, long non-coding RNA (lncRNA) sequences do experience natural selection [11]. As a result, several computational methods have been proposed for detecting selection acting on non-coding sequences [12]; they differ primarily in the choice of a reference for unconstrained evolution, such as synonymous substitutions of a neighboring coding gene [13], intron sequences [14,15], or ancestral repeats [16]. However, an implemented algorithm for detecting the strength and mode of selective pressure on non-coding sequences is still lacking, despite the increasing number of non-coding studies conducted worldwide. More importantly, an integrated toolkit capable of detecting selection on both coding and non-coding sequences is highly desirable, as it would help users perform genome-wide scans of natural selection on diverse sequences.

Toward this end, here we present KaKs_Calculator 3.0, an updated toolkit for calculating selective pressure on both coding and non-coding sequences. Compared with previous versions [17,18] that focus solely on coding sequences, we implement an algorithm in KaKs_Calculator 3.0 that employs synonymous sites of adjacent coding sequences as a reference to estimate selective pressure acting on non-coding sequences. We test it on empirical data and demonstrate its utility in diagnosing the strength and form of molecular evolution.

Algorithm

The major update of KaKs_Calculator 3.0 is the incorporation of an algorithm for estimating selective pressure on non-coding sequences. Specifically, it uses synonymous substitutions as a reference baseline (similar to [13]); although thought to be under weak selection [19–21], synonymous substitutions have been widely adopted for determining the strength and type of selection acting on coding sequences [22–29]. Similar to the Ka/Ks ratio for coding sequences, selective pressure on non-coding sequences (ξ) can be quantified as the ratio of the non-coding nucleotide substitution rate (Kn) to the neutral substitution rate (assumed to be Ks), viz. ξ = Kn/Ks, where Ks is inferred from adjacent coding sequences. Because the number of observed substitutions is smaller than the number of actual substitutions, we adopt a nucleotide substitution model (e.g., JC/K2P/HKY) to correct for multiple substitutions in non-coding sequences. Taking the HKY model [30] as an example, Kn can be deduced from the observed transitional and transversional substitutions (S and V, respectively) and the four nucleotide frequencies (πA, πT, πG, and πC) according to Equation (1) (see Equations 1.27 and 1.28 in [31]).

$$
K_n = 2\left(\frac{\pi_T\pi_C}{\pi_Y} + \frac{\pi_A\pi_G}{\pi_R}\right)a - 2\left(\frac{\pi_T\pi_C\pi_R}{\pi_Y} + \frac{\pi_A\pi_G\pi_Y}{\pi_R} - \pi_Y\pi_R\right)b \tag{1}
$$

where

$$
a = -\log\left[1 - \frac{S}{2\left(\pi_T\pi_C/\pi_Y + \pi_A\pi_G/\pi_R\right)} - \frac{\left(\pi_T\pi_C\pi_R/\pi_Y + \pi_A\pi_G\pi_Y/\pi_R\right)V}{2\left(\pi_T\pi_C\pi_R + \pi_A\pi_G\pi_Y\right)}\right], \qquad b = -\log\left(1 - \frac{V}{2\pi_Y\pi_R}\right),
$$

πR = πA + πG, and πY = πT + πC. To detect and quantify selection on non-coding sequences, KaKs_Calculator 3.0 provides users with two ways to obtain the neutral mutation rate (Ks): it is either calculated from adjacent coding sequences uploaded by users or specified directly by users (Figure 1). As a consequence, KaKs_Calculator 3.0 is capable of detecting selection on both coding and non-coding sequences.
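As an illustration of Equation (1), the following minimal Python sketch computes Kn and ξ from a pairwise alignment. It is not the C++ code of KaKs_Calculator 3.0: the function names (`observed_differences`, `kn_equation1`, `xi`) are hypothetical, S and V are assumed to be proportions of compared sites, gapped or ambiguous columns are assumed to be skipped, and all four nucleotides are assumed to occur in the alignment.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def observed_differences(seq1, seq2):
    """Return (S, V, freqs) for two aligned sequences.

    S and V are the proportions of compared sites showing a transitional or a
    transversional difference; columns with gaps or ambiguous bases are
    skipped (an assumption of this sketch).
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    transitions = transversions = sites = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in counts or y not in counts:
            continue
        sites += 1
        counts[x] += 1
        counts[y] += 1
        if x != y:
            pair = {x, y}
            # Purine<->purine or pyrimidine<->pyrimidine is a transition.
            if pair <= PURINES or pair <= PYRIMIDINES:
                transitions += 1
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    # Frequencies are averaged over both sequences.
    freqs = {base: n / (2 * sites) for base, n in counts.items()}
    return transitions / sites, transversions / sites, freqs


def kn_equation1(S, V, freqs):
    """Evaluate Equation (1) for given S, V, and nucleotide frequencies."""
    pi_A, pi_C, pi_G, pi_T = (freqs[b] for b in "ACGT")
    pi_R = pi_A + pi_G
    pi_Y = pi_T + pi_C
    A = pi_T * pi_C / pi_Y + pi_A * pi_G / pi_R
    B = pi_T * pi_C * pi_R / pi_Y + pi_A * pi_G * pi_Y / pi_R
    C = pi_T * pi_C * pi_R + pi_A * pi_G * pi_Y
    arg_a = 1 - S / (2 * A) - B * V / (2 * C)
    arg_b = 1 - V / (2 * pi_Y * pi_R)
    if arg_a <= 0 or arg_b <= 0:
        raise ValueError("sequences too divergent for this correction")
    a = -math.log(arg_a)
    b = -math.log(arg_b)
    return 2 * A * a - 2 * (B - pi_Y * pi_R) * b


def xi(kn, ks):
    """Selective pressure on a non-coding sequence, xi = Kn/Ks."""
    return kn / ks
```

A useful sanity check is that, with equal nucleotide frequencies (0.25 each), Equation (1) reduces to the K2P distance, −½ log(1 − 2S − V) − ¼ log(1 − 2V). The Ks passed to `xi` would come either from an adjacent coding gene or from a user-specified value, mirroring the two options described above.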

Figure 1

Graphical user interface of KaKs_Calculator 3.0

It contains two panels that are devised for CDS and NCS, respectively. Methods for detecting selection on CDS are classified as: 1) approximate methods: NG by Nei et al. [23], LWL by Li et al. [22], LPB by Li [24] and Pamilo et al. [29], MLWL and MLPB by Tzeng et al. [28], YN by Yang et al. [26], MYN by Zhang et al. [27]; 2) maximum-likelihood methods: GY by Goldman et al. [25], and MS and MA by Zhang et al. [17]. Ka, nonsynonymous substitution rate; Ks, synonymous substitution rate; Kn, non-coding nucleotide substitution rate; Ka/Ks, selective pressure on CDS; Kn/Ks, selective pressure on NCS; CDS, coding sequence; NCS, non-coding sequence; MLWL, Modified LWL; MLPB, Modified LPB; MYN, Modified YN; MS, Model Selection; MA, Model Averaging.

KaKs_Calculator 3.0 is implemented in standard C++, enabling higher efficiency and easy compilation on different operating systems (Linux/Windows/Mac). In addition to the new functionality for estimating selection on non-coding sequences described above, it also fixes bugs and errors found in previous versions. The KaKs_Calculator 3.0 package, including compiled executables, a Windows application with a graphical user interface (GUI), source code, and example data, accompanied by detailed instructions and documentation, is freely available for academic use only at BioCode (https://ngdc.cncb.ac.cn/biocode/tools/BT000001), an open-source platform for archiving bioinformatics tools in the National Genomics Data Center (NGDC) [32], China National Center for Bioinformation.

Application on empirical data

To test KaKs_Calculator 3.0, we choose three extensively studied lncRNA genes according to LncRNAWiki [7] and collect their human–mouse orthologs as well as their adjacent coding orthologs from NGDC LncBook [33] and National Center for Biotechnology Information (NCBI) RefSeq [34]. Specifically, the non-coding and coding gene symbols with accession numbers are: 1) H19 (NR_002196.2 vs. NR_130973.1) and MRPL23 (NM_021134.4 vs. NM_011288.2); 2) Metastasis-associated lung adenocarcinoma transcript 1 (MALAT1; NR_002819.4 vs. NR_002847.3) and SCYL1 (NM_020680.4 vs. NM_001361921.1); and 3) Hox transcript antisense intergenic RNA (HOTAIR; NR_003716.3 vs. NR_047528.1) and HOXC12 (NM_173860.3 vs. NM_010463.2). Based on these orthologous genes, we align the corresponding sequences with MAFFT [35] (using parameters: --maxiterate 1000 --localpair).
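The alignment step can be scripted, for example, as in the sketch below, which assumes that the `mafft` executable is on the PATH and that each ortholog pair is stored in its own FASTA file. The function names and file paths are hypothetical; only the two MAFFT options are taken from the text (together they correspond to MAFFT's L-INS-i strategy).

```python
import subprocess


def mafft_command(fasta_path):
    """Build the MAFFT command line with the parameters given in the text."""
    return ["mafft", "--maxiterate", "1000", "--localpair", fasta_path]


def align_pair(fasta_path, out_path):
    """Align one ortholog pair; MAFFT writes the alignment to standard output."""
    with open(out_path, "w") as out:
        subprocess.run(mafft_command(fasta_path), stdout=out, check=True)
```

For instance, `align_pair("H19_human_mouse.fasta", "H19_human_mouse.aln.fasta")` would write the aligned H19 pair to the second file, where both file names are placeholders.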

According to the ratio (ξ) of the non-coding nucleotide substitution rate to the adjacent synonymous substitution rate, although the coding genes undergo strong purifying selection (ω < 1), the three non-coding genes are under diverse selective pressures (Table 1). Strikingly, HOTAIR exhibits positive selection (ξ > 1), whereas the other two genes experience negative selection (ξ < 1). HOTAIR is a ∼2.3-kb intergenic RNA transcribed from the antisense strand of the HOXC gene cluster [36]. The positive selection detected on HOTAIR relative to HOXC12 is consistent with previous findings that HOTAIR evolves faster than its neighboring genes [37]. By contrast, MALAT1, a ∼8.7-kb non-coding RNA flanked by the highly conserved kinase-like gene SCYL1, is ubiquitously expressed in almost all human tissues, evolutionarily conserved across mammalian species [38], and associated with various cancers [39]. Thus, ξ = 0.464 indicates strong selective constraint on MALAT1, in accordance with its physiological and pathophysiological functions [40] and conserved RNA structure [41] documented by previous studies. Likewise, H19, a ∼2.3-kb imprinted, maternally expressed transcript located near MRPL23, is known for its close association with Beckwith-Wiedemann syndrome and is also involved in tumorigenesis [42]. Our result shows that H19 is under even stronger selective constraint, as indicated by ξ = 0.296, in agreement with its conserved sequence and structure [43]. It is worth noting that one non-coding sequence may have multiple adjacent coding genes; because the adjacent gene is specified by the user, different choices can lead to different estimates of Ks and ξ. Taken together, KaKs_Calculator 3.0 is effective in estimating natural selection on non-coding sequences and has the potential to reveal the selective pressures acting on diverse molecular sequences.

Table 1

Estimates of selective pressure as well as substitution rates in human–mouse orthologs

| Non-coding gene | Kn | ξ = Kn/Ks | Coding gene | Ka | Ks | ω = Ka/Ks |
|---|---|---|---|---|---|---|
| H19 | 0.340 | 0.296 | MRPL23 | 0.088 | 1.150 | 0.077 |
| MALAT1 | 0.324 | 0.464 | SCYL1 | 0.040 | 0.697 | 0.058 |
| HOTAIR | 0.544 | 1.114 | HOXC12 | 0.020 | 0.488 | 0.041 |

Note: Ka, nonsynonymous substitution rate; Ks, synonymous substitution rate; Kn, non-coding nucleotide substitution rate; ω, selective pressure on coding sequence; ξ, selective pressure on non-coding sequence.
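As a worked check of Table 1, the short Python sketch below recomputes ξ = Kn/Ks and ω = Ka/Ks from the rounded rates in the table and labels the mode of selection with the simple thresholds used in the text. The names `TABLE1` and `selection_mode` are hypothetical, and no significance test is applied. Because the inputs are rounded to three decimals, the recomputed ratios agree with the reported ones only to within about 0.001.

```python
# Rows of Table 1:
# (non-coding gene, Kn, reported xi, coding gene, Ka, Ks, reported omega)
TABLE1 = [
    ("H19", 0.340, 0.296, "MRPL23", 0.088, 1.150, 0.077),
    ("MALAT1", 0.324, 0.464, "SCYL1", 0.040, 0.697, 0.058),
    ("HOTAIR", 0.544, 1.114, "HOXC12", 0.020, 0.488, 0.041),
]


def selection_mode(ratio):
    """Label a Ka/Ks or Kn/Ks ratio by simple thresholding at 1."""
    if ratio < 1:
        return "negative"
    if ratio > 1:
        return "positive"
    return "neutral"


for nc_gene, kn, xi_reported, cds_gene, ka, ks, omega_reported in TABLE1:
    xi = kn / ks        # selective pressure on the non-coding gene
    omega = ka / ks     # selective pressure on the adjacent coding gene
    print(f"{nc_gene}: xi = {xi:.3f} ({selection_mode(xi)}); "
          f"{cds_gene}: omega = {omega:.3f} ({selection_mode(omega)})")
```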

In addition, to test the running performance of KaKs_Calculator 3.0, we collect a large empirical dataset containing 15,424 human–mouse orthologous genes retrieved from RefSeq [34] and obtain their codon-based alignments with ParaAT [44], a parallel tool for constructing multiple protein-coding DNA alignments. KaKs_Calculator 3.0 includes ten computational methods for detecting selection on coding sequences, which fall into approximate methods and maximum-likelihood methods. We choose three approximate methods, NG [23], YN [26], and MYN [27], and one maximum-likelihood method, GY [25], and run them on a 64-bit x86 Intel Core i7 machine with four CPU cores at 3.40 GHz each, running Windows 10. For this large-scale analysis, NG, YN, and MYN all take ∼2 min, whereas GY takes ∼11 h, clearly showing that approximate methods are more time-efficient than maximum-likelihood ones. Users may have different preferences; it should be noted, however, that maximum-likelihood methods are believed to achieve higher accuracy, and that different methods adopt different models and strategies and thus can yield different estimates [45] (see an example in [46], where different methods produce contradictory findings).
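To put these running times in per-gene terms, the following back-of-the-envelope Python sketch converts the approximate figures above into average time per ortholog pair and a rough speed ratio. It treats ∼2 min as the running time of a single approximate method, which is our reading of the text, and the variable names are illustrative only.

```python
N_GENES = 15_424                 # human-mouse orthologous genes in the dataset

approx_seconds = 2 * 60          # ~2 min for one approximate method (NG, YN, or MYN)
ml_seconds = 11 * 60 * 60        # ~11 h for the maximum-likelihood method GY

approx_per_gene = approx_seconds / N_GENES   # roughly 8 ms per gene
ml_per_gene = ml_seconds / N_GENES           # roughly 2.6 s per gene
speed_ratio = ml_seconds / approx_seconds    # GY is about 330 times slower

print(f"approximate: {approx_per_gene * 1000:.1f} ms/gene")
print(f"maximum likelihood: {ml_per_gene:.2f} s/gene")
print(f"ratio: {speed_ratio:.0f}x")
```

Since both timings are given only approximately, these numbers indicate orders of magnitude rather than precise benchmarks.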

Discussion

KaKs_Calculator 3.0 is significantly updated to detect natural selection on non-coding sequences as well as coding sequences. As tested on empirical data, it is of great utility in estimating natural selection on molecular sequences and thus in identifying potentially functional elements at a genome-wide scale. Future developments include the detection of selective pressure on small peptides encoded by small open reading frames (shorter than 300 nucleotides) within non-coding sequences [47–49], as well as the implementation of a codon-based alignment procedure to help users generate input sequences easily.

Code availability

KaKs_Calculator 3.0 is freely available for academic use only at https://ngdc.cncb.ac.cn/biocode/tools/BT000001.

CRediT author statement

Zhang Zhang: Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing, Funding acquisition, Supervision. The author has read and approved the final manuscript.

Competing interests

The author has declared no competing interests.

Peer review under responsibility of Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China.

Acknowledgments

I would like to extend special thanks to Lina Ma for constructive suggestions and discussions on this work and Zhao Li for valuable help with data collection and testing. I also thank Zhuojing Fan for designing the logo as well as Qing Guo and Lin Dai for fixing a bug in the Windows GUI. I am extremely grateful to a number of users for reporting bugs and sending comments since the first release of KaKs_Calculator in 2006. This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA19050302), the National Natural Science Foundation of China (Grant Nos. 31871328 and 32030021), the National Key R&D Program of China (Grant No. 2017YFC0907502), and the International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008).

References

[1] Li W.H. Molecular evolution. Sunderland (Massachusetts): Sinauer Associates, 1997.

[2] Hurst L.D. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet 2002;18:486-487.

[3] Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 2007;24:1586-1591.

[4] Dunham I., Kundaje A., Aldred S.F., Collins P.J., Davis C.A., Doyle F. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 2012;489:57-74.

[5] Luo H., Bu D., Shao L., Li Y., Sun L., Wang C. et al. Single-cell long non-coding RNA landscape of T cells in human cancer immunity. Genomics Proteomics Bioinformatics 2021;19:377-393.

[6] Anastasiadou E., Jacob L.S., Slack F.J. Non-coding RNA networks in cancer. Nat Rev Cancer 2018;18:5-18.

[7] Liu L., Li Z., Liu C., Zou D., Li Q., Feng C. et al. LncRNAWiki 2.0: a knowledgebase of human long non-coding RNAs with enhanced curation model and database system. Nucleic Acids Res 2022;50:D190-D195.

[8] Bejerano G., Pheasant M., Makunin I., Stephen S., Kent W.J., Mattick J.S. et al. Ultraconserved elements in the human genome. Science 2004;304:1321-1325.

[9] Habic A., Mattick J.S., Calin G.A., Krese R., Konc J., Kunej T. Genetic variations of ultraconserved elements in the human genome. OMICS 2019;23:549-559.

[10] Guttman M., Amit I., Garber M., French C., Lin M.F., Feldser D. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 2009;458:223-227.

[11] Diederichs S. The four dimensions of noncoding RNA conservation. Trends Genet 2014;30:121-123.

[12] Zhen Y., Andolfatto P. Methods to detect selection on noncoding DNA. Methods Mol Biol 2012;856:141-159.

[13] Wong W.S.W., Nielsen R. Detecting selection in noncoding regions of nucleotide sequences. Genetics 2004;167:949-958.

[14] Parsch J., Novozhilov S., Saminadin-Peter S.S., Wong K.M., Andolfatto P. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol 2010;27:1226-1234.

[15] Hoffman M.M., Birney E. Estimating the neutral rate of nucleotide substitution using introns. Mol Biol Evol 2007;24:522-531.

[16] Bush E.C., Lahn B.T. A genome-wide screen for noncoding elements important in primate evolution. BMC Evol Biol 2008;8:17.

[17] Zhang Z., Li J., Zhao X.Q., Wang J., Wong G.S., Yu J. KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics 2006;4:259-263.

[18] Wang D., Zhang Y., Zhang Z., Zhu J., Yu J. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics 2010;8:77-80.

[19] Shabalina S.A., Spiridonov N.A., Kashina A. Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity. Nucleic Acids Res 2013;41:2073-2094.

[20] Hershberg R., Petrov D.A. Selection on codon bias. Annu Rev Genet 2008;42:287-299.

[21] Plotkin J.B., Kudla G. Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 2011;12:32-42.

[22] Li W.H., Wu C.I., Luo C.C. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol Biol Evol 1985;2:150-174.

[23] Nei M., Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 1986;3:418-426.

[24] Li W.H. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol 1993;36:96-99.

[25] Goldman N., Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 1994;11:725-736.

[26] Yang Z., Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 2000;17:32-43.

[27] Zhang Z., Li J., Yu J. Computing Ka and Ks with a consideration of unequal transitional substitutions. BMC Evol Biol 2006;6:44.

[28] Tzeng Y.H., Pan R., Li W.H. Comparison of three methods for estimating rates of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 2004;21:2290-2298.

[29] Pamilo P., Bianchi N.O. Evolution of the Zfx and Zfy genes: rates and interdependence between the genes. Mol Biol Evol 1993;10:271-281.

[30] Hasegawa M., Kishino H., Yano T.A. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 1985;22:160-174.

[31] Yang Z.H. Computational molecular evolution. London: Oxford University Press, 2006.

[32] CNCB-NGDC Members and Partners. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res 2022;50:D27-D38.

[33] Ma L., Cao J., Liu L., Du Q., Li Z., Zou D. et al. LncBook: a curated knowledgebase of human long non-coding RNAs. Nucleic Acids Res 2019;47:D128-D134.

[34] Li W., O’Neill K.R., Haft D.H., DiCuccio M., Chetvernin V., Badretdin A. et al. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucleic Acids Res 2021;49:D1020-D1028.

[35] Katoh K., Standley D.M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 2013;30:772-780.

[36] Tang Q., Hann S. HOTAIR: an oncogenic long non-coding RNA in human cancer. Cell Physiol Biochem 2018;47:893-913.

[37] He S., Liu S., Zhu H. The sequence, structure and evolutionary features of HOTAIR in mammals. BMC Evol Biol 2011;11:102.

[38] Gutschner T., Hämmerle M., Diederichs S. MALAT1 – a paradigm for long noncoding RNA function in cancer. J Mol Med (Berl) 2013;91:791-801.

[39] Meseure D., Vacher S., Lallemand F., Alsibai K.D., Hatem R., Chemlali W. et al. Prognostic value of a newly identified MALAT1 alternatively spliced transcript in breast cancer. Br J Cancer 2016;114:1395-1404.

[40] Zhang X., Hamblin M.H., Yin K.J. The long noncoding RNA Malat1: its physiological and pathophysiological functions. RNA Biol 2017;14:1705-1714.

[41] Smith M.A., Gesell T., Stadler P.F., Mattick J.S. Widespread purifying selection on RNA structure in mammals. Nucleic Acids Res 2013;41:8220-8236.

[42] Hurst L.D., Smith N.G. Molecular evolutionary evidence that H19 mRNA is functional. Trends Genet 1999;15:134-135.

[43] Juan V., Crain C., Wilson C. Evidence for evolutionarily conserved secondary structure in the H19 tumor suppressor RNA. Nucleic Acids Res 2000;28:1221-1227.

[44] Zhang Z., Xiao J., Wu J., Zhang H., Liu G., Wang X. et al. ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochem Biophys Res Commun 2012;419:779-781.

[45] Zhang Z., Yu J. Evaluation of six methods for estimating synonymous and nonsynonymous substitution rates. Genomics Proteomics Bioinformatics 2006;4:173-181.

[46] Li J., Zhang Z., Vang S., Yu J., Wong G.S., Wang J. Correlation between Ka/Ks and Ks is related to substitution model and evolutionary lineage. J Mol Evol 2009;68:414-423.

[47] Choi S.W., Kim H.W., Nam J.W. The small peptide world in long noncoding RNAs. Brief Bioinform 2019;20:1853-1864.

[48] Li Q., Li Z., Feng C., Jiang S., Zhang Z., Ma L. Multi-omics annotation of human long non-coding RNAs. Biochem Soc Trans 2020;48:1545-1556.

[49] Li Y., Zhou H., Chen X., Zheng Y., Kang Q., Hao D. et al. SmProt: a reliable repository with comprehensive annotation of small proteins identified from ribosome profiling. Genomics Proteomics Bioinformatics 2021;19:602-610.

This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. You are not required to obtain permission to reuse this article.