Data generation and network reconstruction strategies for single cell transcriptomic profiles of CRISPR-mediated gene perturbations

https://doi.org/10.1016/j.bbagrm.2019.194441

Highlights

  • Recent single-cell RNA technologies combined with CRISPR/Cas9 enable large-scale perturbation studies linked to transcriptional readouts.

  • Discussion of the key challenges of network reconstruction on these data and how to overcome them

  • Overview of current network reconstruction approaches, including their advantages and limitations

  • Curated collection of key data for the development of analysis strategies

Abstract

Recent advances in single-cell RNA-sequencing (scRNA-seq) in combination with CRISPR/Cas9 technologies have enabled the development of methods for large-scale perturbation studies with transcriptional readouts. These methods are highly scalable and have the potential to provide a wealth of information on the biological networks that underlie cellular response.

Here we discuss how to overcome several key challenges to generate and analyse data for the confident reconstruction of models of the underlying cellular network. Some challenges are generic, and apply to analysing any single-cell transcriptomic data, while others are specific to combined single-cell CRISPR/Cas9 data, in particular barcode swapping, knockdown efficiency, multiplicity of infection and potential confounding factors. We also provide a curated collection of published data sets to aid the development of analysis strategies.

Finally, we discuss several network reconstruction approaches, including co-expression networks and Bayesian networks, as well as their limitations, and highlight the potential of Nested Effects Models for network reconstruction from scRNA-seq data.

This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.

Introduction

Gene perturbation experiments have been fundamental to establishing the roles of individual genes in the cell. Perturbation studies in models from yeast [1] to mouse [2] build on a long tradition of probing gene function and cellular pathways with single gene knockouts [3,4] and combinatorial knockouts [5,6,7]. Many aspects of cancer research depend on mouse models with genetically engineered gene perturbations [8]. The development of RNAi [9,10,11] and more recently CRISPR/Cas9 screening [12,13,14,15] has enabled the generation of a wealth of data linking gene editing to outcome in a wide range of models. With the CRISPR/Cas9 system, edits to the genome can now be targeted to specific loci much more accurately, reliably and quickly than with older targeting technologies [16].

The richer the phenotypic readout of a perturbation, the more informative the experiment will be. Binary readouts like cell viability versus death only provide a crude summary picture of cellular pathways [17], while high-dimensional transcriptomic readouts provide a much more refined picture. With technological advances in sequencing, microarray readouts [18] were replaced with RNA-seq, and most recently it has become possible to use transcriptional profiles of single cells as a readout, increasing the resolution of transcriptomic phenotypes even further [19,20,21].

Combining single-cell sequencing methods with CRISPR/Cas9 [21,19,20] has led to the generation of large data sets that describe the transcriptional response to the knockout of large numbers of individual genes in individual cells. It is now feasible to conduct these studies at scale, i.e. targeting 1000s of genes, which in turn has allowed screening approaches previously limited to organisms such as S. cerevisiae to be applied to a much more diverse range of biological systems, including both mammalian cell lines and in vivo mouse models of disease [15,22]. As a result, how best to use perturbation data to understand how cellular networks are wired is an open challenge [3].

A simple analogy might make the underlying reasoning more approachable. Deriving functional cellular interaction networks from experimental perturbation is like trying to understand how the systems of a car function by removing individual parts (Fig. 1). Imagine removing brake fluid, pads, or cables from a car. These perturbations will give the same result: the car will no longer stop when the driver needs it to. On the other hand, if one were to remove the spark plugs or distributor, the car would fail to achieve the combustion needed to drive it. The relationships of these parts and the effect that each perturbation has would lead one to correctly infer two different systems in the car. Biological network reconstruction methods aim to achieve a similar result from data that describe the cellular state.
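The reasoning of the car analogy can be written down as a small sketch. It is only an illustration, assuming each perturbation produces exactly one cleanly observed symptom; the part names and symptom labels are hypothetical stand-ins for the examples in the text, and real expression data are far noisier than this.

```python
from collections import defaultdict

# Hypothetical observations: the symptom seen after removing each part.
observed_effect = {
    "brake fluid": "cannot stop",
    "brake pads": "cannot stop",
    "brake cables": "cannot stop",
    "spark plugs": "cannot start",
    "distributor": "cannot start",
}


def group_by_effect(effects):
    """Group perturbed parts that share the same observed symptom."""
    groups = defaultdict(list)
    for part, effect in effects.items():
        groups[effect].append(part)
    # Sort the members so the result does not depend on input order.
    return {effect: sorted(parts) for effect, parts in groups.items()}


systems = group_by_effect(observed_effect)
for effect, parts in systems.items():
    print(effect, "->", parts)
```

Parts that share a symptom fall into one group, which mirrors how the two systems of the car are inferred from the perturbation outcomes alone.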

The analogy in Fig. 1 assumed that the effects of perturbing one part, e.g. the cables, were observable at all other parts of the car. In biological systems, the data closest to such a comprehensive description of the cell are gene expression profiles. In response to a perturbation, expression profiles show which genes have changed their activity, which is the biological version of Fig. 1A. However, until recently, undertaking large-scale gene expression studies has been a costly endeavour. Using bulk sequencing technologies, one must undertake a knockdown experiment for every node within the network and include multiple biological replicates to ensure robustness of the model. This means a 20-node network with 5 replicates results in 100 individual samples. This has considerable cost implications, both for the sequencing and for the reagents required for RNA interference of the genes targeted for knockdown.
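The sample count quoted above follows from simple multiplication. The helper below is a hypothetical illustration, assuming one knockdown experiment per node and the same number of replicates for every node, with no additional control samples.

```python
def bulk_samples_required(n_nodes: int, n_replicates: int) -> int:
    """Number of bulk samples when every node is perturbed once per replicate."""
    if n_nodes < 0 or n_replicates < 0:
        raise ValueError("counts must be non-negative")
    return n_nodes * n_replicates


# The example from the text: a 20-node network with 5 replicates.
total = bulk_samples_required(20, 5)
print(total)
```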

The development of single-cell RNA-seq methods that capture perturbation information is making large-scale studies more affordable. By targeting panels of key genes for knockout and measuring the transcriptional readout in individual cells, one can generate functional data on a system much more rapidly and at lower cost than could be achieved with bulk sequencing. Network reconstruction methods that can process the resulting data can then generate models that also encode the causality of the relationships within the graph.

In this review, we discuss how to use computational network reconstruction methods to best leverage perturbation data for a refined understanding of gene function and cellular pathways. We build on our experience as a combined experimental and computational research group to provide the reader with a single starting point on how to go all the way from conception, through data generation and analysis, to network building. Some of the challenges in the data are generic to single-cell sequencing data (Section 2), others are specific to gene perturbations (Section 4). For network inference on these data, we will make the major distinction between (i) methods that assume that transcriptional effects of perturbations are visible at other pathway members, and (ii) methods that assume that effects are mostly observed downstream of a pathway. Examples of the first approach are regression methods and Bayesian networks, while the main representative of the second approach is the Nested Effect Model (Section 5).
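The intuition behind the second class of methods can be illustrated with a toy example. The sketch below assumes noise-free data in which each perturbed gene has a known set of affected downstream reporter genes, and it orders perturbed genes by strict subset relations between those sets. The names S1 to S3 and E1 to E3 are hypothetical, and this deterministic check is not the Nested Effect Model itself, which is generally formulated probabilistically to cope with noisy measurements.

```python
def infer_upstream_pairs(effects):
    """Return sorted (a, b) pairs where b's effects are a proper subset of a's.

    Under the nested-effects intuition, such a pair suggests that
    perturbed gene a acts upstream of perturbed gene b.
    """
    pairs = []
    for a, effects_a in effects.items():
        for b, effects_b in effects.items():
            # '<' on sets tests for a proper subset.
            if a != b and effects_b < effects_a:
                pairs.append((a, b))
    return sorted(pairs)


# Hypothetical effect sets: reporter genes that change after each knockout.
effects = {
    "S1": {"E1", "E2", "E3"},
    "S2": {"E2", "E3"},
    "S3": {"E3"},
}

pairs = infer_upstream_pairs(effects)
print(pairs)
```

Note that the returned pairs include indirect relations (here S1 above S3 via S2), so the output corresponds to a transitively closed ordering rather than to direct edges only.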

General challenges in the analysis of droplet-based scRNA-seq data

A detailed comparison of recent single-cell RNA-sequencing methods, as of 2018, is provided by AlJanahi et al. [23], while an in-depth review of best practices for scRNA-seq data analysis has previously been published by Luecken and Theis [24]. Therefore, for the purpose of this review we will focus on the droplet-based scRNA-seq methods which underlie current CRISPR/Cas9-coupled single cell methods.

The principle behind droplet-based single-cell sequencing technologies, including the 10x

Adapting CRISPR/Cas9 to single cell studies

The methods described in the last section represent a comprehensive toolbox to analyse and visualise transcriptomic profiles of single cells. In the following sections we will discuss how to use these transcriptomic profiles in perturbation studies based on genome editing by the CRISPR/Cas9 system.

Specific challenges of scRNA-seq coupled CRISPR/Cas9 mediated perturbation

Using single-cell transcriptomic profiles as a readout for the effect of CRISPR/Cas9-mediated perturbation brings with it all the general challenges of analysing scRNA-seq data (see Section 2) as well as additional specific challenges.

Network reconstruction methods

The last sections described how single cell perturbation data can be obtained. In the following sections we turn our attention to the network reconstruction methods needed to infer cellular wiring diagrams from these data. Network inference from gene expression data has a history of at least 20 years [68], starting with the analysis of microarray data, then bulk RNA-seq, and most recently, single-cell RNA-seq profiling. The basic underlying concepts of network reconstruction have stayed the

Conclusion

The growth of the single-cell field over the last few years has provided powerful methods for illuminating heterogeneity in cell populations and resulted in a better understanding of cellular biology. Coupling these methods with CRISPR genome editing has resulted in recent advances in the ability to rapidly and accurately undertake gene knockouts and generate the rich data necessary for network inference. The field continues to progress rapidly, with the more recent applications of dCas9-based

Transparency document

Acknowledgements

HVC undertook the package development and data analysis. ANH, HVC, FM wrote the manuscript. All authors read and approved the final manuscript.

Glossary

Batch effect
Technical sources of variation that arise from sample preparation.
CRISPR
Clustered regularly interspaced short palindromic repeats.
E-gene
In a Nested Effect Model an E-gene is a gene within the signalling pathway whose transcript shows an effect after the perturbation of an S-gene.
Gene Knockdown
The interference with a gene to reduce its expression, and thereby reduce levels of the protein.
Gene Knockout
The disruption of a gene so that it is no longer translated into protein.
