Gene editing techniques

Before the CRISPR technique appeared, tools such as RNA interference (RNAi), zinc finger nucleases (ZFNs), and transcription activator-like effector nucleases (TALENs) had already been developed and used to silence or edit genes.

RNA interference (RNAi)

RNA interference (RNAi) was the first of these tools to be developed and has become widely used in experimental work. The phenomenon itself was only described in 1998, when Andrew Fire and Craig Mello [1] showed that long double-stranded RNA molecules (dsRNA) containing part of a target gene could specifically inhibit the expression of that gene by reducing the levels of its mRNA. The small interfering RNAs (siRNAs) used in RNAi are small molecules that are cheap to synthesize, and transfection methods such as electroporation and lipofection deliver them quite efficiently; together, these characteristics extend the technique to a large number of cell types and organisms. Several researchers have also investigated the therapeutic potential of these molecules, the most prominent example being the use of RNAi to treat hereditary transthyretin-mediated (familial) amyloidosis [2]. Despite these advantages, RNAi also has drawbacks that have become apparent over the years. One of them was reported in 2007 by Krueger and colleagues [3], who observed that gene silencing was often incomplete, producing a knockdown rather than a knockout effect.

Zinc finger nucleases

Nucleases are another tool widely used in genomics, allowing the nucleotide sequence to be manipulated and altered at specific sites. Nucleases cause double-strand breaks in DNA, which trigger repair by NHEJ (non-homologous end joining) or HDR (homology-directed repair); these pathways enable the insertion or deletion of bases and homologous recombination with a donor DNA. ZFNs were the first nucleases able to introduce a controlled alteration at a predetermined location in the genome. Thanks to this, insertion of the IL2RG gene in models of X-linked severe combined immunodeficiency (SCID-X1) has already been performed successfully both in vitro [4] and in vivo [5]. However, ZFN technology requires complex engineering to design and assemble the zinc finger domains, and it also suffers from specificity problems. Not all ZFNs produced are able to cleave DNA effectively, and target sites containing guanine in the 5' region may have a higher success rate [6].

Transcription activator-like effector nucleases (TALENs)

The TALEN genomic tool was described in 2009 by several researchers [7, 8] and has several similarities with ZFNs; in particular, both use the same FokI nuclease domain to cleave the target site. TALENs bind DNA through their transcription activator-like effector (TALE) domains, which originate from plant-pathogenic bacteria of the genus Xanthomonas [9]. However, numerous studies have shown that the editing efficiency of TALENs is lower than that of CRISPR: approximately 1–60% for TALENs versus 2.3–79% for CRISPR [10, 11, 12, 13, 14].

Using CRISPR offers numerous advantages over the other genomic tools mentioned above. Its guide RNA (gRNA) is simple to design: only about 20 nucleotides of the gRNA need to be complementary to the region to be targeted [10]. In addition, CRISPR has a higher mutation efficiency and allows more than one gene to be manipulated at the same time, making it possible to generate multiple mutations [15, 16, 17], and, according to some studies, without the off-target effects associated with TALENs and ZFNs [18, 19].

Contextualization

The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) technique is currently widely discussed, both in the scientific community and in the media. This is because it is triggering a revolution in molecular genetics: among other functions, it allows genetic editing, that is, altering the nucleotide sequence. This alteration can be of different types: (I) deletion, insertion, or substitution of one or more nucleotides, (II) integration of genetic elements, or (III) deletion of genetic elements. CRISPR can also be used for purposes other than gene editing, for example: (I) DNA tagging, (II) regulation of gene expression, (III) RNA cleavage, (IV) gene mapping, or (V) RNA tracking.

According to Timlin et al. (2018), the CRISPR/Cas9 (CRISPR-associated protein 9) technique makes it unnecessary to resort to protein engineering to develop a nuclease specific for a given target DNA sequence. All that is needed is the synthesis of a new piece of RNA, which reduces the time required to design and implement gene editing.

History

In 1987, Yoshizumi Ishino and his collaborators identified a locus (region) of particular interest in the genome of the bacterium Escherichia coli. This locus had an unusual organization, with repeated sequences interspersed with spacer sequences of unknown function [21]. Later, in 1993 and 2000, other researchers [22, 23, 24] identified these sequences in the genomes of other bacteria and of Archaea. However, it was only in 2002 that the acronym CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) was coined to refer to such sequences [25]. Subsequently, a set of genes located close to the CRISPR locus was also identified and named Cas genes (CRISPR-associated genes) [25]; these genes would later be recognized as central elements in the functioning of the locus as a whole.

In 2005, three independent groups of researchers demonstrated that spacer sequences have an extrachromosomal origin, that is, they are derived from plasmids or viruses. It was also described that viruses are unable to infect bacteria carrying spacers whose sequences correspond to stretches of their genomes [26, 27, 28].

These findings led to the hypothesis that CRISPR-Cas is an adaptive immune system of prokaryotes, in which spacers serve as a “memory of previous invasions” [27]. RNA molecules produced from these spacers would thus be complementary to the (re)invading pathogen, making it possible to fight it in a sequence-specific manner. A series of subsequent studies confirmed this hypothesis. According to [29], this “memory” is used by Cas proteins acting as RNA-guided endonucleases, which scan the invading DNA and inactivate it by introducing double-stranded breaks (DSBs).

The biological function of CRISPR-Cas

The CRISPR-Cas system is an adaptive defense system that allows prokaryotes to protect themselves against (re)invasion by unwanted mobile genetic elements, such as bacteriophages, transposons, and plasmids [30]. In this system, immunity is mediated by Cas nucleases and small guide RNAs, the crRNAs (CRISPR-derived RNAs), which specify the cleavage site within the invading genome. The system has two main components: the Cas proteins, which act as catalysts, and the CRISPR locus, which functions as a genetic memory. The defense mechanism consists of three steps: (I) adaptation, (II) crRNA biogenesis, and (III) targeting of the invader, as represented in Fig. 1.

Fig. 1

(a) The three steps of the defense mechanism: (I) adaptation, (II) crRNA biogenesis, and (III) targeting. (b) The different types of CRISPR-Cas systems (Type I, Type II, and Type III) [36]

The first stage, adaptation, is characterized by the acquisition of a new spacer at the CRISPR locus [31]. This occurs, for example, when a bacterium is first infected by a virus. At this point, enzymes encoded by the Cas genes (Cas1 and Cas2) cleave the pathogen's DNA into small segments (24–48 base pairs) and integrate them into the CRISPR locus as new spacers (i.e., between the repeated sequences). From then on, the bacterium is immunized against future invasions by the same agent.
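The adaptation step can be pictured as inserting a new unit into an ordered array. The sketch below is a toy model, not a simulation of Cas1-Cas2 chemistry: the class name `CrisprArray`, the placeholder leader and repeat sequences, and the assumption that new spacers are added at the leader-proximal end of the array are all illustrative choices rather than details taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    """Toy CRISPR locus: a leader, then alternating repeats and spacers."""
    leader: str
    repeat: str
    spacers: list[str] = field(default_factory=list)

    def acquire(self, fragment: str) -> None:
        # Models the Cas1-Cas2 step: integrate a 24-48 bp invader fragment
        # as a new spacer. Assumption: new spacers go next to the leader,
        # so the most recent spacer is listed first.
        if not 24 <= len(fragment) <= 48:
            raise ValueError("fragment outside the 24-48 bp range")
        self.spacers.insert(0, fragment.upper())

    def sequence(self) -> str:
        # Leader + (repeat + spacer) units + a final repeat.
        units = "".join(self.repeat + s for s in self.spacers)
        return self.leader + units + self.repeat


# Hypothetical placeholder sequences, chosen only for readability.
locus = CrisprArray(leader="ATATATATAT", repeat="GTTTTAGAGC")
locus.acquire("A" * 30)  # first infection
locus.acquire("C" * 30)  # a later, different invader
print(locus.spacers[0] == "C" * 30)  # True: newest spacer is leader-proximal
```

Keeping spacers as an ordered list makes the "memory" explicit: each entry is one past invader, and the rendered sequence always keeps a repeat between consecutive spacers.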

The second step, crRNA biogenesis, consists of the continuous transcription of the CRISPR locus. This process is driven by the leader sequence (L), an adenine-thymine-rich region that serves as a promoter. Transcription of the locus generates a “CRISPR RNA precursor” (pre-crRNA), a single long RNA containing multiple repeats and multiple spacers. This pre-crRNA is then processed into several smaller RNAs, the crRNAs, each corresponding to a different spacer [31].
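The transcription-then-processing logic can be sketched as a string transformation. This is a simplified illustration under stated assumptions: the function names are hypothetical, cutting is modelled as splitting the pre-crRNA on the repeat, and each returned "crRNA" contains only the spacer, whereas real mature crRNAs retain part of the flanking repeat.

```python
from __future__ import annotations


def transcribe(dna: str) -> str:
    """Coding-strand DNA to RNA: T becomes U (toy model of transcription)."""
    return dna.upper().replace("T", "U")


def process_pre_crrna(pre_crrna: str, repeat_rna: str) -> list[str]:
    """Cut a pre-crRNA at every repeat and return one crRNA per spacer.

    Simplification: each returned crRNA is only the spacer portion.
    """
    pieces = pre_crrna.split(repeat_rna)
    # pieces[0] is the leader-derived 5' end; the piece after the last
    # repeat is empty, so dropping empty strings leaves just the spacers.
    return [p for p in pieces[1:] if p]


# Hypothetical locus: leader, repeat, spacer 1, repeat, spacer 2, repeat.
leader, repeat = "AAAA", "GTTTTAG"
locus = leader + repeat + "CCCAAACCC" + repeat + "ACACACACA" + repeat
crrnas = process_pre_crrna(transcribe(locus), transcribe(repeat))
print(crrnas)  # ['CCCAAACCC', 'ACACACACA']
```

One long transcript in, one short guide per spacer out: this is the "one precursor, many crRNAs" relationship described above.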

In the third step, targeting, mature crRNAs form complexes with Cas proteins that recognize the exogenous genetic sequence (plasmid, transposon, or virus) and destroy it [31] by introducing DSBs [29]. This process bears some similarity to the RNA interference (RNAi) mechanism observed in eukaryotes [32, 33].
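Sequence-specific recognition can be illustrated by searching an invader genome for each stored spacer on both strands. This is a deliberately simplified sketch: the function names and sequences are hypothetical, recognition is reduced to an exact string match, and the PAM requirement and mismatch tolerance of real systems are not modelled.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_targets(invader: str, spacers: list[str]) -> list[tuple[int, int, str]]:
    """Return (spacer_index, position, strand) for each exact spacer match.

    Toy rule: a spacer recognizes the invader if it, or its reverse
    complement, occurs verbatim in the invader sequence.
    """
    genome = invader.upper()
    hits = []
    for i, spacer in enumerate(spacers):
        for strand, probe in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
            pos = genome.find(probe)
            while pos != -1:
                hits.append((i, pos, strand))
                pos = genome.find(probe, pos + 1)
    return hits


phage = "GGGGAACCGGTTACGGGG" + "GTCAAATGGG"  # hypothetical invader sequence
print(find_targets(phage, ["AACCGGTTAC", "CCCATTTGAC"]))  # [(0, 4, '+'), (1, 18, '-')]
```

Checking both strands matters because the spacer can match either strand of the double-stranded invader.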

The elements of the system vary between species in occurrence, gene composition, sequence, number, and size [34]. The composition of spacers, for example, can vary even within the same species, since it depends on each strain's previous contact with different pathogens. Research has revealed the importance of this mechanism for applications such as bacterial resistance to phages, control of gene dissemination, strain genotyping (based on spacer hypervariability), and the study of microbial population dynamics [35]. In addition, several studies already use this machinery as an effective form of genetic engineering, not only in microorganisms but also in plants of economic interest [36].

Three main types of CRISPR-Cas system (Type I, Type II, and Type III) have been identified [36]; they were grouped according to the conservation of cas genes and operon organization [37] (Fig. 1(b)). Type I and Type III systems use sets of Cas proteins. The Type I system uses a multiprotein crRNA complex known as Cascade, which recognizes the target DNA; the target is then cleaved by Cas3. In the Type III system, Cas10 is assembled into a Cascade-like complex that both recognizes and cleaves the target [36, 38]. The Type II system, in turn, is defined by the presence of the crRNA-guided endonuclease Cas9. In this system, Cas9 is the only Cas protein needed to fight the invading DNA. During an infection, the Cas9 complex destroys the viral genome, relying also on a protospacer adjacent motif (PAM) sequence next to the target. The Cas9 enzyme contains two domains with nuclease activity (RuvC and HNH), both of which participate in immunity [31, 39].

CRISPR/CAS 9

According to [35], the CRISPR-Cas system occurs in nature as a prokaryotic immune system based on the capture and insertion of small DNA fragments from invading viruses or plasmids. These fragments are incorporated into the bacterium's genome, and the bacterium thereby acquires resistance to the invaders they came from. Taking advantage of this natural biological process, researchers developed the CRISPR-Cas9 technique, which uses a guide RNA (hence the reference to CRISPR) and only one of the Cas proteins, the endonuclease Cas9. For simplicity, many investigators refer to the technique simply as CRISPR. It is important to note, however, that the endonuclease used is not always Cas9; other endonucleases, such as Cpf1, have recently been used as well (CRISPR-Cpf1) [40].

The endonuclease most frequently used to edit DNA in eukaryotic cells is Cas9 from the bacterial species Streptococcus pyogenes (SpCas9). Three-dimensional analysis of this enzyme revealed a bilobed structure: a recognition lobe (REC) and a lobe with nuclease activity (NUC). The NUC lobe contains the catalytic domains RuvC and HNH, in addition to the PAM-interacting (PI) domain [41].

To perform its function, Cas9 must be activated and guided to its target. In natural bacterial immunity, these steps are aided by two cooperatively acting RNA molecules: crRNA and tracrRNA (trans-activating crRNA), as shown in Fig. 2(a). To make the process as simple as possible for laboratory use, researchers developed the single guide RNA (sgRNA or gRNA): a chimeric molecule resulting from the “fusion” of crRNA and tracrRNA, designed to combine their two functions, which depend heavily on their structures [41]. In essence, the CRISPR technique involves three molecules: a nuclease (usually wild-type Cas9 from S. pyogenes), a guide RNA (the single guide RNA), and the target (often DNA), which simplifies the experimental procedure.

The sgRNA folds into a hairpin structure consisting of the target recognition sequence, also known as the guide sequence (~20 nt at the 5' end, specific for each target), followed by a universal sequence (the scaffold, ~80 nt at the 3' end) that preserves the base-pairing interactions of the original crRNA:tracrRNA duplex [41] (see Fig. 2(b)). The sgRNA was first tested in prokaryotes [41, 42], but it has since been used extensively to edit mammalian genomes [42].
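The modular layout of the sgRNA (a target-specific guide at the 5' end, a shared scaffold at the 3' end) can be expressed as simple concatenation. In this sketch, `build_sgrna` is a hypothetical helper and the scaffold is a placeholder of 80 `N` characters, not a real sequence; the actual scaffold depends on the nuclease and expression vector.

```python
# Hypothetical placeholder: the real ~80-nt scaffold depends on the nuclease
# and expression vector, so no actual scaffold sequence is reproduced here.
SCAFFOLD_PLACEHOLDER = "N" * 80


def build_sgrna(guide_dna: str, scaffold: str = SCAFFOLD_PLACEHOLDER) -> str:
    """Return a 5'->3' sgRNA string: 20-nt guide (as RNA) + scaffold."""
    guide = guide_dna.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("expected a 20-nt guide made of A, C, G, T")
    # The guide is written as RNA (T -> U) at the 5' end; the scaffold follows.
    return guide.replace("T", "U") + scaffold


sgrna = build_sgrna("GACGCATAAAGATGAGACGC")  # hypothetical guide sequence
print(sgrna[:20], len(sgrna))  # GACGCAUAAAGAUGAGACGC 100
```

Only the first 20 nt change from one experiment to the next, which is why retargeting CRISPR requires synthesizing a new RNA rather than engineering a new protein.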

The DNA region to be cleaved by Cas9 has two elements: (I) the target itself and (II) a PAM sequence [43, 44]. The Cas9/sgRNA complex interacts with the target only if a PAM is present next to it on the non-target strand (the strand not paired with the sgRNA). PAMs are short sequences, usually 2–6 nt long (e.g., 5'-NGG-3' for SpCas9 and 5'-NNGRRT-3' for SaCas9), and are essential for anchoring the nuclease at the cleavage site.
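The two-element rule (a 20-nt target immediately followed by an NGG PAM) translates directly into a sequence scan. The sketch below is illustrative only: `find_spcas9_sites` is a hypothetical helper, it scans a single strand, and it assumes the blunt cut falls exactly 3 nt upstream of the PAM, the commonly reported position within the 3–4 bp range given in this text.

```python
from __future__ import annotations

import re


def find_spcas9_sites(dna: str, guide_len: int = 20) -> list[tuple[str, str, int]]:
    """Return (protospacer, pam, cut_index) for every NGG PAM on this strand.

    cut_index is where the blunt cut falls, assumed here to be 3 nt
    upstream (5') of the PAM: the cut splits dna[:cut_index] and dna[cut_index:].
    """
    seq = dna.upper()
    sites = []
    # Zero-width lookahead so overlapping PAMs (e.g. in "AGGG") are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough sequence 5' of the PAM for a full protospacer
        protospacer = seq[pam_start - guide_len:pam_start]
        sites.append((protospacer, match.group(1), pam_start - 3))
    return sites


target = "GACGCATAAAGATGAGACGC" + "TGG" + "CATT"  # hypothetical sequence
print(find_spcas9_sites(target))  # [('GACGCATAAAGATGAGACGC', 'TGG', 17)]
```

To search the other strand, the same function can be run on the reverse complement of the sequence.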

Fig. 2

(a) In natural bacterial immunity, Cas9 is activated and guided by two cooperatively acting RNA molecules: crRNA (CRISPR-derived RNA) and tracrRNA (trans-activating crRNA). (b) The single guide RNA (sgRNA or gRNA): a chimeric molecule resulting from the “fusion” of crRNA and tracrRNA, designed to combine the two functions [20]

Cas9/sgRNA mechanism of action

Double-cut cleavage

Crystallographic investigations of the SpCas9 enzyme have highlighted some of the main steps in the cleavage process.

Initially, binding of the sgRNA induces a change in the structure of Cas9. This change facilitates the subsequent interaction between the PAM-interacting (PI) domain and the PAM sequence. In this configuration, PI acts on both strands of the target DNA simultaneously, separating them.

This occurs as follows: conserved arginine residues in PI bind the “GG” dinucleotide of the PAM (5'-NGG-3'), spatially restricting the movement of the non-target strand. Simultaneously, another region of PI interacts with a phosphodiester group in the target strand, “pulling it away” from the non-target strand. This results in local separation of the target DNA strands (strand opening) immediately upstream of the PAM [45]. The local opening allows the guide sequence of the sgRNA to begin pairing with the target itself, just upstream of the PAM [45, 46]. This initial step involves a key region of the guide sequence called the seed, ~8 nt long [47].

Pairing between the sgRNA and the target starts from the seed. Mismatches between the seed and a candidate target are not tolerated, which prevents Cas9/sgRNA from stabilizing on that molecule. Conversely, when the seed pairs perfectly with the target, strand separation proceeds, resulting in a stable interaction between Cas9/sgRNA and the target. Finally, the negatively charged sgRNA/target-DNA heteroduplex is accommodated in the positively charged groove formed at the interface between the REC and NUC lobes of Cas9 [45]. From that moment on, Cas9 is able to cleave both strands of the target through its catalytic domains HNH (which cleaves the target strand, complementary to the guide) and RuvC (which cleaves the other strand). Cleavage occurs within the target, about 3–4 bp upstream of the PAM, generating blunt ends [41].
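The seed rule can be written as a check on where mismatches fall. This is a toy rule under explicit assumptions: the function names are hypothetical, the seed is taken as the 8 nt at the 3' (PAM-proximal) end of the guide, and PAM-distal mismatches are simply treated as allowed, which says nothing about how many of them real Cas9 tolerates.

```python
from __future__ import annotations

SEED_LEN = 8  # ~8 nt, as stated in the text


def mismatch_positions(guide: str, protospacer: str) -> list[int]:
    """0-based positions (5'->3' along the guide) where the two sequences differ.

    Both arguments are DNA strings in the same orientation: the protospacer is
    read from the non-target strand, so a perfect match means identical letters.
    """
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must have the same length")
    return [i for i, (g, p) in enumerate(zip(guide.upper(), protospacer.upper())) if g != p]


def seed_matches(guide: str, protospacer: str, seed_len: int = SEED_LEN) -> bool:
    """Toy rule: any mismatch in the PAM-proximal seed (3' end of the guide)
    blocks stable binding; mismatches outside the seed are ignored here."""
    seed_start = len(guide) - seed_len
    return all(i < seed_start for i in mismatch_positions(guide, protospacer))


guide = "GACGCATAAAGATGAGACGC"        # hypothetical guide
distal_mismatch = "T" + guide[1:]     # mismatch at the PAM-distal 5' end
seed_mismatch = guide[:-1] + "A"      # mismatch right next to the PAM
print(seed_matches(guide, distal_mismatch), seed_matches(guide, seed_mismatch))  # True False
```

The same single-base change has opposite consequences depending on whether it falls inside or outside the seed, which is the asymmetry the text describes.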

Single-cut cleavage

In addition to double-cut cleavage, it is worth mentioning that a variant of the wild-type (i.e., unmodified) Cas9 enzyme, called Cas9 nickase (Cas9n), has recently been developed [10]. This variant cuts only one of the two strands of the target DNA because one of its catalytic domains (RuvC or HNH) has been inactivated [41, 38, 49].
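The relationship between active domains and the resulting cut can be tabulated in a few lines. The sketch relies on the domain-to-strand assignment given earlier in the text (HNH cuts the target strand, RuvC the non-target strand); the function names are hypothetical, and the D10A and H840A labels are the commonly cited SpCas9 point mutations that inactivate RuvC and HNH, respectively, added here for orientation rather than taken from the text.

```python
from __future__ import annotations


def strands_cut(ruvc_active: bool, hnh_active: bool) -> set[str]:
    """Which strands a Cas9 variant cuts, given which nuclease domains work.

    HNH cuts the target strand (complementary to the guide);
    RuvC cuts the other (non-target) strand.
    """
    cut = set()
    if hnh_active:
        cut.add("target")
    if ruvc_active:
        cut.add("non-target")
    return cut


def outcome(ruvc_active: bool, hnh_active: bool) -> str:
    n = len(strands_cut(ruvc_active, hnh_active))
    return {2: "double-strand break", 1: "nick", 0: "no cut"}[n]


# The point mutations below are the commonly cited SpCas9 examples.
variants = {
    "wild-type Cas9": (True, True),
    "Cas9n, RuvC inactivated (D10A)": (False, True),
    "Cas9n, HNH inactivated (H840A)": (True, False),
}
for name, (ruvc, hnh) in variants.items():
    print(f"{name}: {outcome(ruvc, hnh)}, cuts {sorted(strands_cut(ruvc, hnh))}")
```

Because each domain handles exactly one strand, silencing either domain turns the double-strand break into a nick on the opposite strand.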

SpCas9, SaCas9, and Cpf1 nucleases

To overcome the limitations of SpCas9, such as its large size and its G-rich PAM sequence, other CRISPR enzymes have been proposed as alternatives, the most frequently mentioned being Cpf1 and SaCas9.

Despite its versatility, the size of S. pyogenes Cas9 (SpCas9) limits its usefulness for basic research and therapeutic applications that rely on the adeno-associated virus (AAV), a highly versatile delivery vehicle. In 2015, Ran et al. [50] characterized six smaller Cas9 orthologs and showed that Staphylococcus aureus Cas9 (SaCas9) can edit the genome with efficiencies similar to those of SpCas9, despite being more than 1 kilobase shorter. The same group packaged SaCas9 and its single guide RNA expression cassette into a single AAV vector and targeted the cholesterol-regulating gene Pcsk9 in mouse liver. One week after injection, they observed > 40% gene modification, accompanied by significant reductions in serum Pcsk9 and total cholesterol levels [50].

A new CRISPR nuclease, Cpf1, was characterized in late 2015 by Feng Zhang's group at MIT [51].

Cpf1 opens up new targeting possibilities because it recognizes T-rich PAM sites: while Cas9 may be preferred for targeting G-rich regions, Cpf1 can be used to target T-rich regions. Cpf1 also requires a shorter guide RNA. While Cas9 requires a tracrRNA to process its crRNA, Cpf1 can process pre-crRNA by itself [25]. This is of particular interest for biotechnology: compared with the ~100 nt tracrRNA/crRNA hybrids used with Cas9, Cpf1 can be targeted with a crRNA of only ~42 nt. This reduces the size of the guide RNA by more than half, while simplifying the methods and lowering the costs associated with synthesis and (if desired) chemical modification [52].
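Because the Cpf1 PAM is T-rich and sits on the 5' side of the protospacer (the opposite side from Cas9's NGG), a site scan reads the protospacer downstream of the PAM instead. This sketch assumes TTTV (V = A, C, or G), the PAM commonly reported for Cpf1 enzymes such as AsCpf1, since the text only says "T-rich"; the helper name and the default `guide_len=20` are also assumptions.

```python
from __future__ import annotations

import re


def find_cpf1_sites(dna: str, guide_len: int = 20) -> list[tuple[str, str, int]]:
    """Return (pam, protospacer, pam_start) for each TTTV PAM on this strand.

    Unlike Cas9's 3' NGG, the Cpf1 PAM sits 5' of the protospacer, so the
    protospacer is read from the bases immediately downstream of the PAM.
    """
    seq = dna.upper()
    sites = []
    # Lookahead to catch overlapping PAMs; V means A, C or G (not T).
    for match in re.finditer(r"(?=(TTT[ACG]))", seq):
        pam_end = match.start() + 4
        protospacer = seq[pam_end:pam_end + guide_len]
        if len(protospacer) == guide_len:  # skip PAMs too close to the 3' end
            sites.append((match.group(1), protospacer, match.start()))
    return sites


print(find_cpf1_sites("TTTA" + "GACGCATAAAGATGAGACGC"))  # [('TTTA', 'GACGCATAAAGATGAGACGC', 0)]
```

Running this scan alongside an NGG scan on the same sequence shows how the two enzymes complement each other on G-rich versus T-rich regions.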

Another notable difference is that Cas9 generates blunt ends after cleavage, whereas Cpf1 makes a staggered cut that leaves 5' overhangs (sticky ends), which can be used for directional cloning.

Cpf1 cleaves DNA 18–23 bp downstream of the PAM site, so NHEJ repair of the double-strand break does not disrupt the recognition sequence. As a result, Cpf1 allows multiple rounds of DNA cleavage, giving a greater opportunity for the desired genomic edit to occur [53].
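The staggered cut and its consequence for re-cleavage can be checked with simple coordinate arithmetic. This sketch assumes that the non-target strand is cut 18 nt and the target strand 23 nt downstream of the PAM, one reading of the 18–23 bp range in the text that yields a 5-nt overhang; all helper names and the example sequence are hypothetical.

```python
from __future__ import annotations


def cpf1_cut_positions(pam_end: int, nontarget_offset: int = 18,
                       target_offset: int = 23) -> tuple[int, int]:
    """Cut positions, in non-target-strand coordinates, downstream of the PAM.

    Assumption: non-target strand cut 18 nt and target strand cut 23 nt
    after the end of the PAM.
    """
    return pam_end + nontarget_offset, pam_end + target_offset


def overhang(dna: str, pam_end: int) -> str:
    """Bases left single-stranded between the two staggered cuts."""
    nt_cut, t_cut = cpf1_cut_positions(pam_end)
    return dna[nt_cut:t_cut]


def pam_intact(pam_start: int, pam_len: int, del_start: int, del_len: int) -> bool:
    """True if a deletion covering [del_start, del_start + del_len) misses the PAM."""
    pam_end = pam_start + pam_len
    return del_start >= pam_end or del_start + del_len <= pam_start


dna = "TTTA" + "ACGT" * 10   # hypothetical: PAM at positions 0-3
print(overhang(dna, pam_end=4))                   # GTACG (5-nt overhang)
print(pam_intact(0, 4, del_start=21, del_len=3))  # True: small indel at the cut
```

Since a small NHEJ indel at the cut lies about 20 nt away from the PAM, the PAM and the PAM-proximal part of the target survive, leaving the site available for another round of cleavage.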

DNA repair: NHEJ and HDR

The cuts made by Cas9, whether double-strand breaks with blunt ends (both strands cut) or nicks (only one strand cut), must be repaired by the cell. Two mechanisms can come into play: (a) NHEJ (non-homologous end joining) or (b) HDR (homology-directed repair) [20], as represented in Fig. 3.

Fig. 3

The two mechanisms of DNA repair: (a) NHEJ (non-homologous end joining) and (b) HDR (homology-directed repair) [20]

NHEJ

In simplified terms, this mechanism rejoins the two DNA ends through events that do not depend on homology between them. However, this pathway is intrinsically error-prone and generates indel mutations (insertion or deletion of one or a few nucleotides) at or near the junction site [42]. These mutations, in turn, can compromise the function of the target gene's end product; that is, the combined action of Cas9 and NHEJ often results in gene knockout. Gene knockout is of great interest in genetics, cell biology, biomedicine, and several other areas, with many applications.
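Why small indels so often knock a gene out comes down to the reading frame: any indel whose length is not a multiple of 3 shifts every downstream codon. The sketch below illustrates this with hypothetical helpers and a made-up coding sequence; the final line additionally assumes, purely for illustration, that indels from -10 to +10 nt are equally likely, which is not a claim about real NHEJ outcome distributions.

```python
from __future__ import annotations


def apply_indel(seq: str, cut: int, delta: int, filler: str = "A") -> str:
    """Toy NHEJ outcome at `cut`: insert `delta` nt if delta > 0,
    delete -delta nt immediately 3' of the cut if delta < 0."""
    if delta >= 0:
        return seq[:cut] + filler * delta + seq[cut:]
    return seq[:cut] + seq[cut - delta:]


def causes_frameshift(delta: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return delta % 3 != 0


def codons(seq: str) -> list[str]:
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


cds = "ATGAAACCCGGGTTT"                 # hypothetical coding sequence
print(codons(cds))                      # ['ATG', 'AAA', 'CCC', 'GGG', 'TTT']
print(codons(apply_indel(cds, 6, -1)))  # ['ATG', 'AAA', 'CCG', 'GGT']
# If indels from -10 to +10 nt (excluding 0) were equally likely:
sizes = [d for d in range(-10, 11) if d != 0]
print(sum(causes_frameshift(d) for d in sizes) / len(sizes))  # 0.7
```

A single deleted base leaves the codons before the cut untouched but changes every codon after it, which is how an imprecise repair can abolish the function of the encoded protein.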

HDR

When the sequences flanking the cut share homology with another DNA molecule, a second repair pathway can be recruited: HDR, which is based on homologous recombination. Cleavage of the target DNA by Cas9 often does not produce ends with such a homologous partner. Even in these situations, however, HDR can be triggered by introducing a donor DNA molecule with homology to both ends resulting from the cut.

HDR repair is essential when the goal is to generate “knock-in” cells or organisms or to perform an allelic replacement. Knock-in generally refers to the introduction of a new DNA sequence into the genome, such as a transgene [54]. Allelic replacement, on the other hand, refers to exchanging a nucleotide sequence for another that is very similar but distinct (i.e., one allele for another) [55]. In both cases, the CRISPR experimental procedure is modified so that three exogenous elements are introduced into the cell: Cas9 nuclease, sgRNA, and donor DNA (the trigger of HDR).
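The donor's role in a knock-in can be shown as string assembly: homology arms copied from either side of the cut flank the new sequence, and repair copies that sequence in only when both arms match. This is a toy model; the function names, the 4-nt arm length, and the sequences are hypothetical, and real homology arms are much longer.

```python
from __future__ import annotations


def design_donor(genome: str, cut: int, insert: str, arm_len: int) -> str:
    """Donor = left homology arm + new sequence + right homology arm."""
    if arm_len < 1 or cut - arm_len < 0 or cut + arm_len > len(genome):
        raise ValueError("homology arms must fit inside the sequence")
    return genome[cut - arm_len:cut] + insert + genome[cut:cut + arm_len]


def hdr_knock_in(genome: str, cut: int, donor: str, arm_len: int) -> str:
    """Toy HDR: if both arms match the sequence flanking the cut, copy the
    donor's central sequence into the genome at the cut."""
    if arm_len < 1 or cut - arm_len < 0 or cut + arm_len > len(genome):
        raise ValueError("homology arms must fit inside the sequence")
    left, right = donor[:arm_len], donor[-arm_len:]
    if genome[cut - arm_len:cut] != left or genome[cut:cut + arm_len] != right:
        raise ValueError("donor arms are not homologous to the cut site")
    return genome[:cut] + donor[arm_len:-arm_len] + genome[cut:]


genome = "AAAACCCCGGGGTTTT"   # hypothetical locus; cut between C and G
donor = design_donor(genome, cut=8, insert="TAG", arm_len=4)
print(donor)                              # CCCCTAGGGGG
print(hdr_knock_in(genome, 8, donor, 4))  # AAAACCCCTAGGGGGTTTT
```

The same structure covers allelic replacement: the central part of the donor would then carry the alternative allele instead of an extra sequence.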

Future trends in Cas nucleases

Wang JY et al. [56] found that in some CRISPR systems, a fusion of reverse transcriptase (RT) with the Cas1 integrase and the Cas6 maturase creates a single protein that enables sequence-matched integration and crRNA production. To elucidate how the RT-integrase organizes these distinct enzymatic activities, they determined the cryo-EM structure of a Cas6-RT-Cas1-Cas2 CRISPR integrase complex. The structure reveals a heterohexamer in which the RT directly contacts the integrase and maturase domains, suggesting functional coordination among all three active sites. These findings highlight an expanded ability of some CRISPR systems to acquire diverse sequences that drive CRISPR-mediated interference [56].

Liu et al. [57] revealed the underlying mechanisms of a distinct, third RNA-guided genome editing platform called CRISPR-CasX, which uses unique structures for programmable double-stranded DNA binding and cleavage. Using biochemical and in vivo data, they demonstrated that CasX is active for genome modification in Escherichia coli and in human cells. They also showed that CasX activity arose through convergent evolution, establishing a family of enzymes that is functionally distinct from Cas9 and Cas12a [57].

Pausch et al. [58] described a minimal functional CRISPR-Cas system comprising a single ~70 kilodalton protein, CasΦ, and a CRISPR array, encoded exclusively in the genomes of huge bacteriophages. CasΦ uses a single active site both for CRISPR RNA processing and for crRNA-guided DNA cutting to target foreign nucleic acids. According to these authors, this hypercompact system is active in vitro and in human and plant cells, with expanded target recognition capabilities compared with other CRISPR-Cas proteins. Useful for genome editing and DNA detection, and with half the molecular weight of the Cas9 and Cas12a genome editing enzymes, CasΦ offers advantages for cellular delivery that expand the genome editing toolbox [58].

Type VI CRISPR-Cas systems contain Cas13, a programmable single-effector RNA-guided RNase [59]. CRISPR-Cas13 can be used as a flexible platform to study RNA in mammalian cells and for therapeutic development [60].

The diagnostic tool SHERLOCK (Specific High-sensitivity Enzymatic Reporter unLOCKing), developed by Sherlock Biosciences, uses synthetic reporter RNA strands that produce a signal when cut: after Cas13 recognizes and cuts its original target, it also cuts these reporter RNAs, releasing the signaling molecule.
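The readout logic (no target, no activation; target present, reporters cut and signal released) can be captured in a toy function. Everything quantitative here is hypothetical: `sherlock_signal`, the reporter count, the cuts per step, and the number of steps are illustrative stand-ins, not parameters of the actual assay, and target recognition is reduced to an exact substring match.

```python
def sherlock_signal(sample_rna: str, target_rna: str, reporters: int = 1000,
                    cuts_per_step: int = 100, steps: int = 5) -> int:
    """Toy model of Cas13 reporter readout; all numbers are hypothetical.

    Cas13 is activated only if its crRNA finds the target in the sample;
    once active, it also cuts reporter RNAs, each of which releases signal.
    Returns the number of reporters cut (a stand-in for fluorescence).
    """
    if target_rna.upper() not in sample_rna.upper():
        return 0  # no target, no activation, reporters stay intact
    return min(reporters, cuts_per_step * steps)


print(sherlock_signal("AAGGAUCCAA", "GGAUCC"))  # 500
print(sherlock_signal("AAAAAAAAAA", "GGAUCC"))  # 0
```

The key design point is that one target-recognition event gates many reporter cuts, so the signal is amplified relative to the amount of target.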

In 2020, the US Food and Drug Administration (FDA) granted emergency use authorization to a CRISPR-based test that can diagnose COVID-19 in about an hour. The test uses the Cas13a enzyme to identify an RNA sequence unique to SARS-CoV-2.

Conclusions

As early as 1985, Oliver Smithies and Mario Capecchi carried out work on integrating a donor DNA via homologous recombination [61], marking the beginning of gene editing; however, only three decades later did the term gene editing gain the attention of the general public. Much of this attention was due to the more widespread use of nucleases that cleave at a target site (e.g., ZFNs [62], TALENs [63], and CRISPR/Cas9), generating DSBs and activating the repair systems, which dramatically increases the efficiency of the process. The great success of CRISPR is mainly due to the simplicity of gRNA design (only about 20 nucleotides of the guide RNA need to be complementary to the target region) [10]. In addition, CRISPR has a higher mutation efficiency and allows more than one gene to be manipulated at the same time, making it possible to generate multiple mutations [15, 16, 17], and, according to some studies, without the off-target effects associated with TALENs and ZFNs [18, 19].