Transcript and gene numbers from single cell RNA-seq and cell hashing are similar
To study the effect of sample multiplexing by cell hashing on the transcriptome, we used PBMCs from two patients with AML4. The cells of each patient were labeled with an antibody specific to a protein expressed on the surface of all cells and coupled to a hashtag oligonucleotide (HTO); the labeled samples were then mixed and sequenced together.
Using 10X Genomics scRNA-seq technology, we sequenced each sample individually, obtaining 4,390 cells from the UPN23 Single Cell (UPN23 SC) sample and 16,645 cells from the UPN29 SC sample. With the cell hashing technique, we sequenced 5,601 cells and 6,480 cells in the UPN23 SC_m (for single cell_multiplexed) and UPN29 SC_m samples, respectively. We detected approximately 1,660 genes per cell in the UPN23 SC sample (median 1603; 1st-3rd quartile, 2248-2972) versus approximately 978 genes per cell in the UPN23 SC_m sample (median 949; 1st-3rd quartile, 1510-2007). For UPN29, we detected approximately 2,191 genes per cell in UPN29 SC (median 1373; 1st-3rd quartile, 2209-2759) versus 1,763 genes per cell in UPN29 SC_m (median 1356; 1st-3rd quartile, 1772-2153) (Fig. 1a, Table 2). Unique molecular identifiers (UMIs) are attached to each sequenced read to identify unique transcripts and to avoid inflated counts due to PCR amplification during library preparation. The number of captured UMIs was higher in the samples sequenced alone (median of 4,015 for UPN23 SC and 7,765 for UPN29 SC) than in the multiplexed samples (median of 3,978 in UPN23 SC_m and UPN29 SC_m) (Fig. 1b). We then compared the percentages of mitochondrial genes in each patient and found them to be consistent between the two approaches in both patients (Fig. 1c). Overall, this quality control analysis shows that the sequencing results obtained with the two approaches are consistent.
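The three quality control metrics discussed above (genes per cell, UMIs per cell and percentage of mitochondrial counts) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `qc_metrics`, the toy count matrix and the `MT-` prefix used to recognize mitochondrial genes are assumptions made for the example.

```python
from statistics import median


def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Return one (n_genes, n_umis, pct_mito) tuple per cell.

    counts is a cells x genes matrix of UMI counts (list of lists).
    The "MT-" prefix for mitochondrial genes is an assumption of this sketch.
    """
    mito_idx = [i for i, g in enumerate(gene_names) if g.startswith(mito_prefix)]
    metrics = []
    for cell in counts:
        n_umis = sum(cell)                       # total UMIs captured in the cell
        n_genes = sum(1 for c in cell if c > 0)  # genes with at least one UMI
        mito = sum(cell[i] for i in mito_idx)    # UMIs from mitochondrial genes
        pct_mito = 100.0 * mito / n_umis if n_umis else 0.0
        metrics.append((n_genes, n_umis, pct_mito))
    return metrics


# Toy data (hypothetical): 3 cells x 4 genes.
gene_names = ["CD34", "MPO", "MT-CO1", "MT-ND1"]
counts = [
    [3, 0, 1, 0],
    [0, 5, 2, 3],
    [1, 1, 0, 0],
]

metrics = qc_metrics(counts, gene_names)
median_genes = median(m[0] for m in metrics)
median_umis = median(m[1] for m in metrics)
print(metrics, median_genes, median_umis)
```

Computing the same per-cell metrics for a sample sequenced alone and for its multiplexed counterpart gives the kind of side-by-side comparison summarized in Fig. 1a-c.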
Table 2: Sequencing statistics
Sample | Estimated number of cells | Total read number | Mean reads per cell | Median genes per cell
UPN23 SC | 4,390 | 171,943,491 | 39,167 | 1,660
UPN23 SC_m | 5,601 | 207,650,556 | 37,100 | 978
UPN29 SC | 16,645 | 711,710,869 | 42,758 | 2,191
UPN29 SC_m | 6,480 | 341,065,053 | 52,633 | 1,763
UPN: Unique Patient Number; SC: Single Cell RNA-seq; SC_m: single cell_multiplexed
After filtering the cells from each sample, we retained 2,579 cells in UPN23 SC versus 3,176 cells in UPN23 SC_m, and 14,722 cells in UPN29 SC versus 6,347 cells in UPN29 SC_m (Fig. 1d and 1e). Next, we grouped the cells of each sample into clusters using the Seurat software. We found 12 clusters in UPN23 SC (clusters 0-11) versus 11 clusters in UPN23 SC_m (clusters 0-10; Fig. 1d). For UPN29, we found 16 clusters in UPN29 SC (clusters 0-15) and 13 clusters in UPN29 SC_m (clusters 0-12; Fig. 1e).
Gene expression levels correlate between scRNA-seq and cell hashing
To complete this preliminary analysis, we examined the correlation between the mean expression of all genes in each sample sequenced alone and in the same sample sequenced with the cell hashing technique. Figure 1f-g shows that the gene expression levels obtained with the two approaches were highly correlated for both samples (r = 0.996 for UPN23 and r = 0.998 for UPN29).
Overall, our data suggest that the cell hashing approach faithfully preserves the transcriptomes of individual cells for the study of gene expression.
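The correlation reported above can be illustrated with a minimal sketch: average each gene over the cells of each approach, then compute the Pearson coefficient between the two vectors of means. The source does not state how expression values were normalized or which correlation coefficient was used, so the Pearson coefficient, the function names and the toy matrices below are assumptions.

```python
from math import sqrt


def mean_expression(counts):
    """Mean expression of each gene across cells (counts is cells x genes)."""
    n_cells = len(counts)
    n_genes = len(counts[0])
    return [sum(cell[g] for cell in counts) / n_cells for g in range(n_genes)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Toy data (hypothetical): the same 3 genes measured with both approaches.
sc_counts = [[2, 0, 4], [4, 2, 8]]    # sample sequenced alone
scm_counts = [[1, 0, 3], [3, 1, 5]]   # same sample, multiplexed

mean_sc = mean_expression(sc_counts)
mean_scm = mean_expression(scm_counts)
r = pearson_r(mean_sc, mean_scm)
print(mean_sc, mean_scm, round(r, 3))
```

A coefficient close to 1 means that the multiplexed sample has lower absolute values but the same relative gene expression profile as the sample sequenced alone.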
After quality control of the two samples sequenced with the two approaches, we analyzed the genes used at diagnosis as markers of leukemic blasts (Table 1). We first analyzed the leukemic blast markers of patient UPN23, sequenced alone or with the cell hashing technique, namely CD117, CD34, CD33, CD13, CD36 and CD64 (Fig. 2a); the expression of these genes was correlated between UPN23 SC and UPN23 SC_m (Fig. 2b). We then performed the same analysis for UPN29, using the leukemic blast markers CD117, CD34, CD33 and CD11 (Fig. 2c), and again found that the expression of these genes was correlated between UPN29 SC and UPN29 SC_m (Fig. 2d). These results show that the expression of leukemic blast markers is correlated between the two approaches, even though the number of genes detected and the variability of gene expression are reduced with cell hashing compared with the classical single cell approach.
After the analysis of the leukemic markers, we looked at the genes specific to each cluster in the two samples. To do this, we analyzed the differentially expressed genes for each sample and approach. The results are presented as a heatmap (supplementary figure 1) showing, for each cluster, the top overexpressed genes in each condition.
This analysis shows no difference between the genes expressed in UPN23 SC and those expressed in UPN23 SC_m (supplementary figure 1a and b). For example, we found the same expression of markers of normal hematopoietic cells (B lymphocytes, T lymphocytes and natural killer (NK) cells), such as MS4A1, CD79A, CD79B, CD3D, NKG7, GZMA and GNLY, and also of genes involved in the leukemic process, such as MPO and SOX4. We found the same results in UPN29 SC and UPN29 SC_m (supplementary figure 1c and d), namely the expression of markers of B, T and NK cells (CD79A, NKG7 and GNLY) and of genes involved in leukemogenesis (MPO and SOX4).
Analysis of the top 10 marker genes of each cluster shows that 93 of these genes are shared between UPN23 SC and UPN23 SC_m (Figure 3a) and 104 between UPN29 SC and UPN29 SC_m (Figure 3b). This analysis shows good overlap between the two approaches, which was confirmed by the Gene Ontology analysis of molecular function performed on the top 10 most expressed genes of each cluster in UPN23 (Figure 3c) and UPN29 (Figure 3d). The genes fall into 6 categories: transporter activity, structural molecule activity, molecular transducer activity, molecular function, catalytic activity, and binding. Together, these results show concordance between the two approaches and indicate that the cell hashing technique gives results comparable to those of the classical single cell RNA-seq technique.
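The overlap of marker genes described above can be sketched as a set intersection. The source does not detail how the shared genes were counted; this sketch assumes that the top marker genes of all clusters are pooled for each approach and that the two pools are then intersected. The gene symbols come from the text, but their assignment to clusters and the function name `marker_union` are invented for the example.

```python
def marker_union(top_markers_by_cluster):
    """Pool the top marker genes of all clusters into one set."""
    genes = set()
    for markers in top_markers_by_cluster.values():
        genes.update(markers)
    return genes


# Toy data (hypothetical cluster assignments, shortened marker lists).
sc_markers = {
    0: ["MS4A1", "CD79A", "CD79B"],
    1: ["CD3D", "NKG7", "GZMA"],
    2: ["MPO", "SOX4"],
}
scm_markers = {
    0: ["MS4A1", "CD79A"],
    1: ["NKG7", "GNLY", "GZMA"],
}

sc_genes = marker_union(sc_markers)
scm_genes = marker_union(scm_markers)
shared = sc_genes & scm_genes   # marker genes found with both approaches
print(sorted(shared), len(shared))
```

Pooling before intersecting makes the comparison independent of cluster numbering, which differs between the two approaches (for example, 12 clusters in UPN23 SC versus 11 in UPN23 SC_m).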
In conclusion, this comparative study of the two single cell RNA sequencing approaches shows that the cell hashing technique works relatively well and induces little loss of genetic information or transcriptomic modification. Indeed, the expression of genes and cellular markers was correlated with that obtained with the classical single cell approach. These results also support the use of this approach to sequence many samples in the same run, thanks to the labeling of the cells with HTOs, which facilitates the comparison between different samples and conditions while limiting batch effects. However, this approach must be used with care. To highlight rare cell populations, for example, the number of samples analyzed together must be reduced, because each additional sample decreases the number of cells sequenced per sample and therefore the chance of detecting rare cell populations. Considering the analysis capacity of each experiment (about 30,000 cells), we consider that in most cases the use of 8 simultaneous samples is optimal. This allows the analysis of relatively rare populations in each sample, since a population representing 1% of the cells would still include about 40 cells with this approach. For smaller populations, it would be necessary to return to the traditional single cell analysis.
Multiplexed single cell transcriptomic analysis has been shown to determine the sensitivity of leukemic cells to chemotherapy while clarifying the mechanism of action [29]. Its use in clinical "routine" could be of interest in AML: one could imagine rapidly testing different chemotherapies in a single experiment. With a result in less than one week (a delay considered reasonable in view of the adaptation of the protocol to targeted therapies, apart from emergency situations such as disseminated intravascular coagulation or leukostasis syndrome), the therapeutic protocol could be adapted according to the single cell analysis. This analysis would also have the merit of anticipating tumor escape by identifying chemoresistant cells (stem cells versus more differentiated populations), their escape mechanisms and the means of bypassing them (resensitization strategies).
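The capacity estimate given above (about 30,000 cells per experiment, 8 samples, a population at 1%) can be checked with a short calculation. This is a minimal sketch that assumes the cells are split evenly between samples; the function name `expected_rare_cells` is hypothetical.

```python
def expected_rare_cells(total_cells, n_samples, rare_fraction):
    """Cells per sample and expected size of a rare population in each sample.

    Assumes an even split of the run capacity between samples.
    """
    cells_per_sample = total_cells / n_samples
    return cells_per_sample, cells_per_sample * rare_fraction


# Values taken from the text: ~30,000 cells, 8 samples, 1% population.
per_sample, rare = expected_rare_cells(30_000, 8, 0.01)
print(per_sample, rare)

# Effect of the number of multiplexed samples on a 1% population.
for n in (2, 4, 8, 16):
    print(n, expected_rare_cells(30_000, n, 0.01)[1])
```

With 8 samples, each sample contributes 3,750 cells and a 1% population contains 37.5 cells on average, in line with the figure of about 40 cells; doubling the number of samples halves that count.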