Introduction

In contrast to conventional methods of molecular biology where the expression of a single or few genes can be evaluated in a single assay, transcriptomics approach enables gene expression profiling of all human coding-genes in a single experiment (Lowe et al. 2017). Therefore, transcriptomic approach saves time and effort; and most importantly provides an almost comprehensive picture of the whole transcriptome (the sum of all RNA transcripts) of a sample. This has made transcriptomics to be commonly utilized for studying the molecular bases of human diseases, especially tumors (Iacobas 2020; Uhlen et al. 2017). Transcriptomics-based findings have heavily contributed to a better understanding of the molecular pathology of human cancer; and facilitated the discovery of therapeutic targets and biomarkers (Alsagaby 2019a; Rodon et al. 2019; Supplitt et al. 2021; Wang et al. 2017).

There are two main approaches of transcriptomics; DNA-based hybridization, where a large number of oligonucleotides (probes) that correspond to all human coding-genes are attached on a surface of a microscopic slide (Barbulovic-Nad et al. 2006). The location of each probe on the microarray slide is annotated; hence each probe location corresponds to a specific gene. When fluorescent-labeled cDNA converted from total RNA of a sample is added to the microarray slide, the fluorescent-labeled cDNAs hybridize specifically with probes. Next, the intensity of the fluorescent signal at a probe location on the microarray slide is used to indicate the relative quantity of a transcript (Barbulovic-Nad et al. 2006). The other approach of transcriptomics is RNA sequencing (RNASeq), in which the total RNA expressed in a sample is sequenced and the number of sequenced RNA copies from a gene is used as quantitative data of the gene expression (Ozsolak et al. 2011). Each approach has its strengths and weaknesses, but they are both well established and widely used (Tachibana 2015).

Good quality of high-throughput technology-based findings, such as those produced by “omics” approaches, is dependent on high quality samples (Alsagaby 2019b; Horgan et al. 2011; Lowe et al. 2017). Preserved integrity of RNA sample is greatly needed to truly reflect the status of in vivo transcriptome (Husseini et al. 2021). RNA degradation is a physiological process that is tightly regulated to determine the fate of RNA (Houseley et al. 2009). Sample storage and in vitro handling negatively impact RNA integrity (Hatzis et al. 2011; Koh et al. 2021a). Therefore, the assessment of RNA integrity is an essential quality control check prior to performing a transcriptomics analysis. The RNA integrity number (RIN) that is calculated on the basis of an entire electrophoretic trace of RNA using Agilent’s Bioanalyzer provides a simple and reproducible method for determining RNA quality (Mueller et al. 2004; Schroeder et al. 2006). As a result, RIN is widely accepted as quality control checker of RNA quality in transcriptomics studies (Dong et al. 2013). Although there has been no consensus on a RIN value to be used as determinant for sample inclusion, RIN value equal or above 6 appeared to be most commonly reported in transcriptomics studies (Alsagaby et al. 2020; Hoffman 2004; Letzkus et al. 2014; Vartanian et al. 2009).

In clinical settings, where sample storage or RNA isolation and processing cannot be made immediately, RNA decoying is unavoidable (Botling et al. 2009; Chung et al. 2008b; Hentze et al. 2019; Koh et al. 2021b; Ohashi et al. 2004). Moreover, in cancer patients, chemotherapy disrupts RNA of malignant cells, reducing their RIN value (Narendrula et al. 2016). In fact, low RIN value of cancer cells was proposed as a biomarker of good response to chemotherapy (Parissenti et al. 2015). Therefore, in these situations investigators are left with no other option but to use samples with compromised RNA integrity. In the present study, the goal was to investigate the impact of RNA degradation on the complete transcriptome of leukemia cells.

Materials and methods

Cell culture

The methodology workflow of the current study is represented in Fig. 1. To make the present study relevant to different types of leukemia, four human leukemia cell lines (representing different type of leukemia) were obtained from the American Type Culture Collection (ATCC) and used in this study; THP-1 (cell line of acute monocytic leukemia), HL60 (cell line of acute promyelocytic leukemia), K562 (cell line of chronic myelogenous leukemia) and NALM-1 (cell line of acute lymphoblastic leukemia). RPMI 1640 medium supplemented with 10% fetal calf serum (FCS), 100 U/mL penicillin, 40 μg/mL gentamycin, 100 μg/mL streptomycin sulphate, 4.5 mg/mL glucose and 2 mg/mL sodium bicarbonate was used to cultivate the cancer cells at 37 °C under an atmosphere of humidified air containing 5% CO2.

Fig. 1
figure 1

Workflow of the major steps of the study’s methodology. Four leukemia cell lines were cultured. Next, RNA was extracted from the leukemia cell lines and 300 ng RNA from each leukemia cell line was transferred into one tube to make a pooled RNA sample (1200 ng). Then, 100 ng from the pooled RNA sample was transferred into nine tubes (100 ng each). Four tubes were kept in ice for 30 min and the other five tubes were incubated at 60 °C for 30 min. Post incubation, RIN was measured in the nine tubes. Next, transcriptomics analyses were conducted on the high RIN samples (n = 4) and the low RIN samples (n = 5). Finally, bioinformatics and data analysis were conducted on the transcriptomics profiles. RIN RNA integrity number

Extraction of RNA and measurement of RNA integrity number

RNeasy Mini Kits from QIAGEN were used to extract total RNA from the leukemia cells. Next, NanoDrop ND-1000 spectro-photometer (Thermo Fisher Scientific, Inc.) was employed to evaluate the purity and measure the quantity of the isolated RNA. Samples that were admitted for further analysis were those that possessed a good purity of the extracted RNA (260 nm/280 nm > 2).

Induction of RNA degradation and evaluation of RNA integrity number

Equal amount (300 ng) of the extracted RNA from each one of the leukemia cell lines (THP-1, HL60, K562 and NALM-1) was transferred into a single tube and mixed to create a pool sample (RNA = 1200 ng). Next, 100 ng of the pooled RNA was transferred into nine tubes (each tube contained 100 ng). Of these tubes, four were kept in ice for 30 min, whereas the other five were incubated in water bath at 60 °C for 30 min. The RNA integrity number (RIN) in each of the nine tubes was then assessed using the materials and instructions of the Agilent Nano 6000 chip kit (capillary electrophoresis assay) and the Agilent 2100 Bioanalyzer. Post running the RNA samples in the Agilent Nano 6000 chips, the Agilent 2100 Bioanalyzer produces a gel image with an electropherogram of the analyzed RNA and calculate the RIN.

Transcriptomics profiling

The Agilent procedure of “one-color microarray-based gene expression analysis” was used to conduct transcriptomic profiling of the leukemia cells (nine samples: four samples with RIN ≥ 7.9 and five samples with RIN ≤ 3.8). Briefly, the low input quick amp labeling (one-color) kit (Agilent) was applied to 80 ng of the extracted RNA and the RNA spikes from RNA Spike-In Kit, One-Color (Agilent) in order to convert them to cDNA. Next, using the same kit, the cDNA was transcribed and amplified to fluorescent-labeled cRNA. The materials and protocol of RNeasy Mini Kits (QIAGEN) was used to purify the labeled cRNA. The quality and quantity of the labeled cRNA was assessed using NanoDrop ND-1000 spectro-photometer (Thermo Fisher Scientific, Inc.). Samples that were further processed were those that exhibited the following characteristics: with 260 nm/280 nm > 2, cRNA yield ≥ 825 ng and specific activity (pmol Cy3/μg cRNA) ≥ 6. To prepare the hybridization samples, 600 ng of the labeled cRNA with 5 μL of 10 × Gene Expression Blocking Agent (Agilent) and 1 μL of 25 × Fragmentation Buffer (Agilent) were mixed (sample volume = 25 μL). Incubation for 30 min at 60 °C was conducted to induce fragmentation of the labeled cRNA in hybridization samples. Post cooling of the hybridization samples, the cleaving if the cRNA was stopped by adding 25 μL of 2 × Hi-RPM hybridization Buffer (Agilent) (sample volume = 50 μL). Next, 40 μL of each hybridization sample was transferred onto a gasket that was assembled to SurePrint G3 Human Gene Expression 8 × 60 K Microarray slide (Agilent) and incubated at 65 °C for 17 h. Post incubation, the Gene Expression Washing Buffer 1 and 2 with 0.005% Triton X-102 (Agilent) were used to the microarray slides. The scanning of the microarray slide was performed using Agilent SureScan Microarray Scanner with the AgilentG3_GX_1color protocol. Agilent Feature Extraction Software (version 11.0.1.1; Agilent Technologies, Inc.) was employed to extract the quantitative data of probe features from the scanned microarray slide. The quantitative data were saved for further analysis.

Data analysis and filter criteria

The quantitative data extracted earlier from the microarray slides (see previous section) were loaded into the GeneSpring GX version 12.1 software package (Agilent Technologies, Inc.). Next, the raw signal data of the probes were log2 transformed and normalized using the percentile shift algorithm with percentile target = 75. The analysis was restricted to probes with a flag call of either “detected” or “not detected”. The study included two of conditions: RIN ≥ 7.90 (four samples) and RIN ≤ 3.80 (five samples). The flag “detected” in all samples in one of the two conditions was used to filter the probes included in the analysis. Differentially expressed genes (DEGs) in the two conditions were identified on according to fold change (FC) that set at twofold or greater with corrected p value ≤ 0.05. Cluster analyses and heatmap presentations of DEGs were conducted using hierarchical algorithm and were based on probes (normalized signal value of probes) and conditions with Euclidean method for similarity measure and Wards for linkage rule. gProfiler (Raudvere et al. 2019) was employed for the gene ontology (GO) enrichment analysis; the statistical domain scope was set to be “only annotated genes” and corrected p value cut-off was set at ≤ 0.05.

Statistical analysis

The p values were calculated on the basis of Student’s t-test. For the corrected p values, Benjamini–Hochberg false discovery rate (FDR) was used. GeneSpring GX version 12.1 software package (Agilent Technologies, Inc.) was employed for calculation of p values and corrected p values of the DEGs. The p value of the other analyses was calculated using excel software. Column graphs were also constructed using Excel software.

Results

Evaluation of RNA integrity number

The RNA integrity number (RIN) of RNA samples (n = 4) that were incubated in ice for 30 min and the RNA samples (n = 5) that were incubated at 60 °C for 30 min was measured (Fig. 2). RIN ranged from 7.90 to 8.10 (average = 8) for the samples that were kept in ice for 30 min (these samples termed high RIN) and from 3.40 to 3.80 (average = 3.60) for the samples that were incubated at 60 °C for 30 min (these samples denoted low RIN). As expected, heat induced a significant drop in the RIN (Fig. 2; p = 3 × 10–10). The high RIN samples (n = 4) and low RIN samples (n = 5) were used for the subsequent transcriptomics analysis.

Fig. 2
figure 2

Heat-induced degradation of RNA. The RNA integrity number (RIN) was determined using Nano 6000 chips and the Agilent 2100 Bioanalyzer in two group of samples: RNA samples incubated in ice for 30 min (n = 4) and RNA samples incubated at 60 °C for 30 min (n = 5). Representative examples of electrophoretic trace of RNA samples are shown (A, B). Heat induced a significant decrease in the RIN values (C). The p values were calculated on this basis Student’s t-test using Excel software. Asterisk symbol: **** is p value ≤ 0.0001

Impact of RNA degradation on transcriptome profile

In attempt to investigate the impact of RNA degradation on the global gene expression profiling, DNA-microarray-based transcriptomics analysis was conducted for each one of the high RIN samples (n = 4) and the low RIN samples (n = 5). Then, the transcriptomic profiles of the high RIN samples were compared with that of the low RIN samples. The comparison was limited to genes (probes) that were detected in all samples of either condition; high RIN or low RIN (number of probe IDs = 29,230). Principle component analysis (PCA) showed that compromised RNA integrity induced noticeable changes in the transcriptomes of the samples; the low RIN samples were clustered together and distinctly separated from the high RIN samples (Fig. 3A). Next, fold change (≥ two-fold) with p value ≤ 0.03 and corrected p value ≤ 0.05 were used to identify alteration in the relative quantitation of gene expression in the low RIN samples versus the high RIN samples. A total of 1945 genes (probe IDs), representing 6.7% of the studied transcriptomes, showed significant changes in their expression in the low RIN samples compared with the high RIN samples (Online Resource 1). In contrast, 27,285 genes (probe IDs), which represent 93.3% of the examined transcriptomes, showed stable expression in two groups of samples, as they did not meet the criteria made to defined altered relative gene expression; these genes were termed “stable genes”. Figure 3B shows a volcano plot of the evaluated transcriptomes of the two groups of samples. Of the 1945 genes, 1146 (59%) showed reduced quantitative data in the low RIN samples compared with high RIN samples (denoted down-represented genes). These genes were down-represented probably because they were more degradable than the rest of the transcripts. In contrast, 781 genes (41%) of the 1945 exhibited increased quantitative data in the low RIN samples as opposed to the high RIN samples (termed over-represented genes). These genes were over-represented perhaps because they were more stable (less degradable) compared with rest of the transcripts. Similar to PCA, cluster analysis with heatmap presentation grouped the low RIN samples closely together and high RIN samples closely together; and identified genes that were impacted by the heat-induced RNA decoy (Fig. 3C, FC ≥ two-fold, p value ≤ 0.03, corrected p value ≤ 0.05).

Fig. 3
figure 3

Heat-induced degradation of RNA alters the transcriptome of human cancer cells. Principle component analysis (A) shows that low RIN samples (RIN ≤ 3.8) were grouped together and distinctly separated from the high RIN samples (RIN ≥ 7.9). Volcano plot (B) compares the gene expression in the two groups of samples. Cluster analysis with heatmap presentation (C) also grouped the low RIN samples together and the high RIN samples together; and identifies genes with altered quantification (FC ≥ two-fold, p value ≤ 0.03, corrected p value ≤ 0.05)

Type of RNA enrichment

Next, investigation on the type of RNA among the three groups of genes (the stable genes, over-represented genes and down-represented genes) was conducted to determine whether a type of RNA was enriched in either one of the three groups (Fig. 4). Protein coding RNA appeared to dominate the three groups of genes; the stable genes showed the largest proportion (58%) followed by the down-represented genes (53%) and the over-represented genes (46%). The recorded percentage of other types of RNA was below 20% in the three groups of genes. Interestingly, long non-coding RNA (lncRNA) showed the least proportion in the down-represented genes (3%) versus 10% and 12% for the stable genes and the over-represented genes, respectively. In contrast, retained introns showed bigger percentage (17%) in the down-represented genes compared with the stable genes (11%) and the over-represented genes (11%).

Fig. 4
figure 4

Comparison of the types of RNA in the three groups of genes. Ensemble database was interrogated to identify the type of RNA transcribed from the stable genes, the over-represented genes and the down-represented genes. lncRNA long noncoding RNA

Sequence characteristics

Next, analyses were conducted on the three groups of genes with respect to their sequence features. The analysis reported no significant differences between the three groups of genes in the context of 5’ UTR length (Fig. 5A). In contrast, the over-represented genes (p ≤ 0.01) and the down-represented genes (p ≤ 0.0001) possessed significantly shorter 3’UTR compared with the stable genes (Fig. 5B). Transcript length and gene length appeared to be the longest in the over-represented genes (p ≤ 0.0001) and the shortest in the down-represented genes (p ≤ 0.0001, Fig. 5C, D). The percentage of GC was found to be higher in the stable genes (p ≤ 0.0001) and down-represented genes (p ≤ 0.0001) compared with over-represented genes (Fig. 5E). The distance between a transcript complementary sequence to its corresponding probe and the 5’ end of the same transcript was the longest in the over-represented genes (p ≤ 0.0001) and the shortest in the down-represented genes (p ≤ 0.0001, Fig. 5F).

Fig. 5
figure 5

Sequence characteristics of the three groups of genes. Sequence features were extracted from Ensemble database and NCBI reference RNA sequence database. 5' UTR length of transcripts (A); 3' UTR length of transcripts (B); transcript length (C); gene length (D); percentage of GC content in transcripts (E); distance between probe positions and the 5’ end of transcripts (F). UTR untranslated region. The p values were calculated on this basis Student’s t-test using Excel software. Asterisk symbols: * is p value ≤ 0.05; ** is p value ≤ 0.01; *** is p value ≤ 0.001; **** is p value ≤ 0.0001

Gene ontology enrichment

Gene ontology (GO) analysis using gProfiler was performed on the over-represented and the down-represented genes in order to determine whether significant enrichments of GO terms can be found in the two groups of genes (Tables 1 and 2). Protein binding appeared to be a significantly enriched molecular function term in both groups of genes. Nevertheless, catalytic activity (p = 0007) and enzyme binding (p = 0007) were specifically enriched in the down-represented genes. On the other hand, nucleic acid binding (p = 0.04), DNA binding (p = 0.05) and chromatin banding (p = 0.05) were enriched only in the over-represented genes. While metabolic process was commonly enriched as a biological process term in the two groups of genes, metabolic process of RNA (p = 3.97 × 10–5) and nucleic acid (p = 8.93 × 10–5) appeared to be specifically enriched in the over-represented genes. Furthermore, the over-represented genes uniquely enriched for biological processes, such as regulation of gene expression (p = 3.97 × 10–5), positive regulation of gene expression (p = 0.0002), positive regulation of RNA biosynthetic process (p = 0.0002) and mRNA transport (p = 0.0003). Nucleus, nuclear lumen and nucleoplasm were commonly enriched in both groups of genes. However, cytoplasm (p = 2.69 × 10–19), vesicle (p = 1.47 × 10–18), Golgi-associated vesicle (p = 0.001) and endoplasmic reticulum (p = 0.001) were enriched specifically in the down-represented genes.

Table 1 Gene ontology enrichment analysis of the down-represented genes
Table 2 Gene ontology enrichment analysis of the over-represented genes

Discussion

In global gene expression studies, preserving RNA integrity is a very essential step to insure true reflection of cellular transcriptome (Lightfoot et al. 2005; Lowe et al. 2017). However, saving RNA from decoying in clinical samples is not always possible despite the big effort made in the optimization and standardization of protocols that deal with the extraction, handling and processing of RNA (Chung et al. 2008a; Debey et al. 2006; Farina et al. 2014; Hatzis et al. 2011; Hewitt et al. 2008). As a result, working on clinical samples with compromised RNA integrity is inevitable in some cases. In such cases, knowledge of: (i) to what extent low integrity of RNA influences overall transcriptomic profile; (ii) what are the name of transcripts or what are the characteristics of transcripts whose relative quantification does not show true measurement of gene expression due to RNA degradation, may provide a mean through which transcripts with RNA decoy-dependent altered expression can be excluded from the transcriptomic profile. This may give tolerance for working on low quality RNA samples without compromising the credibility of the results. Following the same concept and attempting to contribute to such knowledge, the present work studied the impact of heat-induced RNA degradation on the transcriptome of human leukemic cells.

The results of the PCA, the cluster analysis and the volcano plot showed RNA degradation-dependent changes in the transcriptomes of the human leukemic cells. However, these alterations appeared to have happened to a small proportion of the studied transcriptomes. In fact, only 6.7% (1945 transcripts; FC ≥ 2, p value ≤ 0.03, corrected p value ≤ 0.05) of the leukemic cells transcriptome was affected by RNA degradation. In contrast, 93.3% (27,285 transcripts) of the transcriptomes showed stable expression. This finding may give rationale that working on precious samples from leukemia patients with low RIN values might be justifiable, as the vast majority of the transcripts studied here appeared to be stable. This idea becomes more sensible when pre-existing knowledge of the transcripts that are likely to be affected by RNA degradation is available for researchers. This way, investigators can rule out these transcripts (whose relative quantification is prone to alteration by RNA degradation) from the rest of the transcriptomic profile.

Normalization of the relative quantitative data of DNA microarray experiments is a main step that has to be performed in order to reduce the impact of technical (handling) variations and make the transcriptomic profiles comparable (Do et al. 2006). The normalization done on this study may explain the over-representation of the 781 transcripts (FC ≥ 2, p value ≤ 0.03, corrected p value ≤ 0.05). These genes are the least transcripts affected by the thermal degradation of RNA compared with the rest of the transcripts. In contrast, the down-representation of the 1146 transcripts (FC ≥ 2, p value ≤ 0.03, corrected p value ≤ 0.05) indicates that these transcripts are the most affected by the thermal decoy of RNA compared with the rest of the transcripts. Taken together, the over and down-represented transcripts are of interest for researchers working on the transcriptome of clinical leukemic samples with low RIN value, as altered expression of these transcripts may highlight the poor integrity of RNA samples more than variation in the biology of the disease between samples.

Transcript length appeared in the present study to be a determinant factor of whether a transcript is sensitive to thermal degradation; long transcripts (over-represented genes) were the least affected by the thermal-induced degradation, whereas short transcripts (down-represented genes) were the most sensitive to heat-induced decoy. Similar findings were reported in colorectal cancer cells, where lost RNA integrity had the least impact on long transcripts and the most effect on short RNA (Opitz et al. 2010). Another stability marker of transcript against thermal-induced degradation was the distance between a probe complementary sequence and 5’ end of a transcript. This distance was found to be the longest in the over-represented transcripts and shortest in the down-represented transcripts. Different mechanisms were described for RNA degradation in eukaryotic cells (Parker et al. 2004; Tatosyan et al. 2020), one of which was shown to take place in the direction from 5’ to 3’ (Cannistraro et al. 1985). The fact that the conversion of RNA to cDNA in Agilent DNA microarray was conducted using a universal primer that base pairs with the poly A tail (3’ end); and the direction of RNA digestion from 5’ to 3’ made the majority of Agilent probes to be complementary to sequences close to the 3’ end of transcripts. This may explain why long transcripts and those with long distance between their 5’ end the sequence complementary to their corresponding probes were the least affected by thermal-induced RNA degradation.

The over and down-represented genes significantly enriched for GO terms. Transcripts involved in catalytic activity and enzymatic binding appeared to be most affected by RNA degradation. On the other hand, transcripts that play roles in nucleic acid binding, regulation of RNA biosynthesis, and mRNA transport were the least sensitive to RNA decoy. The enrichment of GO terms here was related to the sequence features of the down-represented and over-represented genes. For example, in contrast to the down-represented genes that enriched for catalytic activity and enzymatic binding, the over-represented genes that enriched for nucleic acid binding, regulation of RNA biosynthesis, and mRNA transport had longer gene and transcript length and longer distance between the probe position and their 5’end. As expected from the experimental design of this study, where the thermal treatment was applied directly on isolated RNA rather on living cells, GO terms (or pathways) like heat shock response, apoptosis, necrosis, DNA fragmentation, or RNA disruption, which would have been activated if the thermal treatment was applied directly to the cell (Dinh et al. 2001; Furusawa et al. 2011; Ren et al. 2019), were not enriched.

Overall, the present work provided valuable insights into the impact of RNA degradation on the transcriptomes of leukemia cells, and highlighted the genes and the characteristics of genes whose relative quantification is influenced by RNA decoy. Therefore, this study has the potential to help investigators to distinguish between RNA decoy-dependent changes in genes expression from pathology-dependent alteration of gene expression when low quality RNA samples are the only available material to use. However, cautions should be taken when relating the current finings to clinical settings because of two reasons: (i) the present work was conducted on human leukemic cell lines rather than clinical samples from leukemic patients, (ii) the RNA degradation induced by thermal treatment was a simulation (not necessarily identical) to RNA decoy of patient samples in clinical settings. Further evaluation of the relative expression of some of the over-represented and under-represented using quantitative real-time polymerase chain reaction (qRT-PCR) may add validity to the transcriptomics data. However, limitations in the project budget and in the logistic support, which has been caused by the COVID19 pandemic, did not allow performing additional experimental work for the assessment of gene expressing using qRT-PCR. Our planned future work will apply the same concept of the present work on leukemia samples isolated from leukemia patients to identify genes whose transcripts’ relative quantitation is sensitive to RNA degradation in clinical settings.