Prediction of siRNA knockdown efficiency using artificial neural network models

doi:10.1016/j.bbrc.2005.08.147

Biochemical and Biophysical Research Communications

Volume 336, Issue 2, 21 October 2005, Pages 723-728

https://doi.org/10.1016/j.bbrc.2005.08.147

Abstract

Selective knockdown of gene expression by short interfering RNAs (siRNAs) has allowed rapid validation of gene function and made possible a high-throughput, genome-scale approach to interrogating gene function. However, randomly designed siRNAs knock down their target genes with widely varying efficiency. Various prediction algorithms based on siRNA functionality have therefore been constructed to increase the likelihood of selecting effective siRNAs and thereby reduce experimental cost. Toward this end, we trained three back-propagation and Bayesian neural network models, not previously used in this context, to predict the knockdown efficiencies of 180 experimentally verified siRNAs on their corresponding target genes. Using an input coding based primarily on RNA structure thermodynamic parameters, together with cross-validation, we showed that our neural network models outperformed most other methods and are comparable to the best prediction algorithm published thus far. Furthermore, our neural network models correctly classified 74% of all siRNAs into different efficiency categories, with a correlation coefficient of 0.43 and a receiver operating characteristic curve score of 0.78, highlighting the potential utility of this method to complement other existing siRNA classification and prediction schemes.
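The three figures of merit quoted in the abstract (classification accuracy, correlation coefficient, and receiver operating characteristic score) can be illustrated with a minimal Python sketch. The function names and the toy values in the usage note are assumptions made for illustration; they are not the authors' code or data, and the ROC score is computed here with the standard pairwise-ranking definition of the area under the curve.

```python
from math import sqrt


def accuracy(predicted_classes, true_classes):
    """Fraction of items assigned to the correct efficiency category."""
    hits = sum(p == t for p, t in zip(predicted_classes, true_classes))
    return hits / len(true_classes)


def pearson(xs, ys):
    """Pearson correlation between predicted and measured efficiencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    effective item (label 1) scores higher than a randomly chosen
    ineffective one (label 0); ties count as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])` ranks three of the four effective/ineffective pairs correctly and evaluates to 0.75.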

Section snippets

Materials and methods

Neural network models. An artificial neural network is built from a set of interconnected neural units. It consists of an input layer, which takes the input values, and an output layer, which produces the final result. Some networks also have one or more hidden layers, which perform nonlinear modeling (Fig. 1).
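As a rough illustration of the layered structure described above, the following Python sketch computes the forward pass of a network with one hidden layer and a single output unit. The sigmoid activation, the layer sizes, and all names are assumptions made for this example; the snippet does not state which activation or topology the authors used.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate one input pattern through a one-hidden-layer network.

    x        : list of input values (input layer)
    w_hidden : one weight row per hidden unit, each of len(x)
    b_hidden : one bias per hidden unit
    w_out    : one weight per hidden unit (single output unit)
    b_out    : bias of the output unit
    Returns the hidden activations and the network output.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output
```

With all weights and biases set to zero, every unit outputs 0.5; the nonlinearity of the hidden layer only matters once the weights are trained away from zero.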

There are many different types of neural networks. Each differs from the others in network topology and/or learning algorithm. In this study, we introduce the back-propagation, general

Network parameter optimization

Several important parameters affect neural network configuration and performance. In a back-propagation neural network, these parameters include the training time, the number of units in the hidden layer, the learning rate, and the momentum. Training time is measured in epochs; one epoch is equivalent to presenting all patterns to the network once. A long training time increases the risk of over-fitting the training set: the error on the training set gets lower as the training time gets longer and
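The role of the four parameters named above can be sketched with a small back-propagation trainer in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid units, the per-pattern weight updates, the default values of `n_hidden`, `epochs`, `learning_rate`, and `momentum`, and the toy data are all hypothetical. The error on a held-out set is recorded after every epoch, since a validation error that stops falling while the training error keeps falling is the usual sign of over-fitting.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w_h, w_o):
    """Forward pass; the last weight in every row acts on a constant bias input."""
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
    y = sigmoid(sum(w * v for w, v in zip(w_o, h + [1.0])))
    return h, y


def mean_sq_error(patterns, w_h, w_o):
    return sum((t - predict(x, w_h, w_o)[1]) ** 2 for x, t in patterns) / len(patterns)


def train(train_set, val_set, n_hidden=3, epochs=500,
          learning_rate=0.5, momentum=0.5, seed=0):
    """Train by back-propagation with momentum; return per-epoch errors."""
    rng = random.Random(seed)
    n_in = len(train_set[0][0])
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    # Previous weight changes, reused by the momentum term.
    dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    dw_o = [0.0] * (n_hidden + 1)
    train_err, val_err = [], []
    for _ in range(epochs):            # one epoch = every pattern presented once
        for x, target in train_set:
            h, y = predict(x, w_h, w_o)
            xb, hb = list(x) + [1.0], h + [1.0]
            # Error signals, computed before any weight is changed.
            delta_o = (target - y) * y * (1.0 - y)
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden + 1):
                dw_o[j] = learning_rate * delta_o * hb[j] + momentum * dw_o[j]
                w_o[j] += dw_o[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    dw_h[j][i] = learning_rate * delta_h[j] * xb[i] + momentum * dw_h[j][i]
                    w_h[j][i] += dw_h[j][i]
        train_err.append(mean_sq_error(train_set, w_h, w_o))
        val_err.append(mean_sq_error(val_set, w_h, w_o))
    return train_err, val_err


# Hypothetical toy data (logical OR), used only to exercise the code.
toy = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
train_err, val_err = train(toy, toy, epochs=200)
best_epoch = val_err.index(min(val_err))  # candidate stopping point
```

In a real setting the validation set would be disjoint from the training set, and training would be stopped near `best_epoch` rather than run for a fixed number of epochs.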

Discussion

Two main encoding methods have been used in previous studies to facilitate the selection of effective siRNAs: energy-based [11] and sequence-feature-based [9], [10], [25] methods. Some studies combined both kinds of features [5], [6], [7]. Most of these studies attempted to discover correlations between the functionality of siRNAs and specific sequence motifs or base preferences. Due to differences in experimental settings, such as the target transcript sequence, and the relatively small datasets, different
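To make the notion of a sequence-feature-based encoding concrete, the Python sketch below turns an siRNA sequence into a position-specific one-hot vector and computes its GC content. This illustrates the general idea only: it is not the encoding of any cited study, it is not the authors' own input coding (which the abstract describes as based primarily on thermodynamic parameters), and the example sequences are hypothetical.

```python
ALPHABET = "ACGU"


def one_hot(sequence):
    """Encode each position as four 0/1 values, one per base (A, C, G, U).

    DNA-style input is accepted by mapping T to U.
    """
    seq = sequence.upper().replace("T", "U")
    vector = []
    for base in seq:
        if base not in ALPHABET:
            raise ValueError(f"unexpected base: {base!r}")
        vector.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return vector


def gc_content(sequence):
    """Fraction of G and C bases, a simple whole-sequence feature."""
    seq = sequence.upper()
    return sum(base in "GC" for base in seq) / len(seq)
```

A 19-nucleotide sequence yields a 76-element input vector under this scheme, which could be concatenated with whole-sequence features such as `gc_content` before being fed to a network.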

Acknowledgment

G.W.W. is supported by an NIH NRSA postdoctoral fellowship (5F32DK 067835-02).

References (26)

A. Nykanen et al.
ATP requirement and small interfering RNA structure in the RNA interference pathway
Cell
(2001)
M. Amarzguioui et al.
An algorithm for selection of functional siRNA sequences
Biochem. Biophys. Res. Commun.
(2004)
A.M. Chalk et al.
Improved and automated prediction of effective siRNA
Biochem. Biophys. Res. Commun.
(2004)
D. Specht
Probabilistic neural networks
Neural Netw.
(1990)
Y. Freund et al.
A decision-theoretic generalization of on-line learning and an application to boosting
J. Comput. System Sci.
(1997)
T. Vickers et al.
Efficient reduction of target RNAs by small interfering RNA and RNase H-dependent antisense agents
J. Biol. Chem.
(2003)
R. Teramoto et al.
Prediction of siRNA functionality using generalized string kernel and support vector machine
FEBS Lett.
(2005)
P.A. Sharp
RNA interference-2001
Genes Dev.
(2001)
A. Fire et al.
Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans
Nature
(1998)
K. Berns et al.
A large-scale RNAi screen in human cells identifies new components of the p53 pathway
Nature
(2004)
A. Reynolds et al.
Rational siRNA design for RNA interference
Nat. Biotechnol.
(2004)
K. Ui-Tei et al.
Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference
Nucleic Acids Res.
(2004)

Cited by (18)

Development of in silico methodology for siRNA lipid nanoparticle formulations
2022, Chemical Engineering Journal
Citation Excerpt :
The binary classification method was used in this research. The classification standard was 50% knockdown efficiency, which is a common acceptable value [22,23]. There are 46 effective formulations and 83 ineffective formulations for the in vitro model, while there are 113 effective formulations and 188 ineffective for the in vivo model on this assignment.
Small interfering RNA (siRNA) gene silencing therapy has great potential for treating multiple diseases. Lipid nanoparticle (LNP) technology for siRNA delivery has succeeded in clinical treatment. However, the formulation design of siRNA-LNP still faces enormous challenges. The current research aims to develop an integrated computer methodology for the rational design of siRNA-LNP formulations. The machine learning (ML) algorithm lightGBM was built to predict the knockdown efficiency of siRNA-LNP in vitro and in vivo delivery and reached good accuracy, with 80% and 78.89% in the validation set. Further siRNA experiments validated the ML model well. Moreover, molecular dynamic (MD) simulation was utilized to investigate the molecular structure of siRNA-LNP. In conclusion, a novel integrated computer methodology based on ML, experiments, and MD simulation was successfully developed for siRNA-LNP formulation design.
Selecting highly effective siRNAs by their modified entropies with mini-clusters
2012, Theoretical and Applied Fracture Mechanics
Citation Excerpt :
The results show that type-1 (or type-2) entropy with mini-clusters is reliable for identifying effective siRNAs. To compare our results to existing siRNA-based design tools, three different algorithms [17–19] are applied to the data set of siRNAs. In fact, more than 10% of effective siRNAs were not identified by any algorithm.
Synthetic sources of double-stranded RNA can enter the RNAi pathway at various points. The most basic approach involves transfection of siRNA duplexes that resemble Dicer products. The approaches to designing effective siRNAs are classified into two groups: score-based algorithms and machine learning classification algorithms. The first group, which focuses on finding the common features of effective siRNAs, initially and intuitively provides guidelines for siRNA design but is far from satisfactory because of low sensitivity and specificity. The second group, motivated by statistical learning theory, attempts to classify each siRNA as effective or ineffective. Although these two-class classifiers provide a promising way to screen potentially effective siRNAs, it is difficult to decide the boundary between the two classes.
A novel method of distinguishing effective siRNAs is reported that combines the advantages of the score-based algorithms and the machine learning classification algorithms. It uses the modified entropies of siRNAs as feature indicators and splits the siRNAs into many smaller effective or ineffective subclasses with a mini-clusters algorithm. When the modified entropies with mini-clusters algorithm is applied to experimental siRNA data, all effective siRNAs are identified correctly, and no more than 17% of ineffective siRNAs are misidentified as effective ones.
RFRCDB-siRNA: Improved design of siRNAs by random forest regression model coupled with database searching
2007, Computer Methods and Programs in Biomedicine
Although observations concerning the factors that influence siRNA efficacy give clues to the mechanism of RNAi, the quantitative prediction of siRNA efficacy is still a challenging task. In this paper, we introduced a novel non-linear regression method, random forest regression (RFR), to quantitatively estimate siRNA efficacy values. Compared with an alternative machine learning regression algorithm, support vector machine regression (SVR), and four other score-based algorithms [A. Reynolds, D. Leake, Q. Boese, S. Scaringe, W.S. Marshall, A. Khvorova, Rational siRNA design for RNA interference, Nat. Biotechnol. 22 (2004) 326–330; K. Ui-Tei, Y. Naito, F. Takahashi, T. Haraguchi, H. Ohki-Hamazaki, A. Juni, R. Ueda, K. Saigo, Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference, Nucleic Acids Res. 32 (2004) 936–948; A.C. Hsieh, R. Bo, J. Manola, F. Vazquez, O. Bare, A. Khvorova, S. Scaringe, W.R. Sellers, A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens, Nucleic Acids Res. 32 (2004) 893–901; M. Amarzguioui, H. Prydz, An algorithm for selection of functional siRNA sequences, Biochem. Biophys. Res. Commun. 316 (2004) 1050–1058], our RFR model achieved the best performance of all. A web server, RFRCDB-siRNA (http://www.bioinf.seu.edu.cn/siRNA/index.htm), has been developed. RFRCDB-siRNA consists of two modules: an siRNA-centric database and an RFR prediction system. RFRCDB-siRNA works as follows: (1) Instead of directly predicting the gene silencing activity of siRNAs, the service takes these siRNAs as queries to search against the siRNA-centric database. The matched sequences exceeding the user-defined functionality value threshold are kept. (2) The mismatched sequences are then passed to the RFR prediction system for further analysis.
MACHINE LEARNING MODELING OF siRNA STRUCTURE-POTENCY RELATIONSHIP WITH APPLICATIONS AGAINST SARS-CoV-2 SPIKE GENE
2024, arXiv
Integrating Artificial Intelligence and Nanotechnology for Precision Cancer Medicine
2020, Advanced Materials
What parameters to consider and which software tools to use for target selection and molecular design of small interfering RNAs
2013, Methods in Molecular Biology
