Introduction

Human body fluids such as blood and saliva are the most common sources of biological trace material found at crime scenes. Reliable tissue identification in forensic casework is important because it provides crucial insights for crime scene reconstruction and can thus contribute to solving crimes. Blood stains are routinely tested in forensic practice with various methods, including the tetrabase (4,4′-bis(dimethylamino)diphenylmethane) test [1], the Kastle–Meyer phenolphthalein test, the tetramethylbenzidine test [2], the orthotolidine test [3], and the luminol (3-aminophthalhydrazide) chemiluminescence test [4], the latter being especially suitable for detecting blood stains after cleaning attempts [2, 5]. All these presumptive tests, which are indicative but not confirmatory, exploit the peroxidase-like activity of the heme group of hemoglobin in human blood. Consequently, false-positive results can be caused by strong oxidants, such as chlorine-containing detergents, or by true peroxidases (e.g., from plants) [6].

Saliva stains are usually detected in forensic practice with an enzymatic amylase test using Phadebas [7] or with a recently developed enzyme-linked immunosorbent assay-based method [8]. However, because amylase degrades, the time window for successfully performing such tests can be limited [9]. Furthermore, no amylase assay can distinguish salivary amylase from amylases of other tissues (pancreatic, urinary, etc.); therefore, the tests for saliva identification are only presumptive, like the existing blood identification tests.

On the other hand, methods for identifying and quantifying mRNA are well established, although mostly outside the forensic field. Among many other applications, these methods make massively multiplexed gene expression profiling possible, which can be used to discover tissue-specific mRNA markers. The major concern about using mRNA markers for forensic applications is their assumed high susceptibility to degradation. However, recent studies using a few selected genes demonstrated that total RNA of sufficient quality and quantity can be isolated from biological stains that are several months or even years old [10–12]. It has also been suggested, although with limited evidence so far, that different mRNAs degrade at different rates [13]. The degradation of mRNA is thought to be influenced by many external and internal factors, including structural features such as AU-rich elements (ARE motifs), protein binding properties, and cellular localization [14, 15]. However, detailed knowledge of the molecular reasons for differences in degradation between different types of RNA, and between the mRNAs of different genes, is currently lacking, and further investigation is sorely needed.

Although a small number of mRNA markers have been tested for tissue identification in forensic science [16–19], no systematic study has yet been performed. In addition, candidate markers in previous studies were identified by a mixed literature and database search, apparently without strict selection criteria, considering only a limited number of genes and tissues, and without taking RNA degradation into account. Furthermore, the expressed sequence tag databases used previously, such as that of the Cancer Genome Anatomy Project [18], are expected to provide heavily biased information on candidate genes because clone libraries are not randomly represented in them.

To find stable mRNA markers for body fluid identification in forensic practice, we performed a systematic, comprehensive whole-genome gene expression analysis of blood and saliva stains degraded over time, using the Affymetrix U133 Plus 2.0 GeneChip. This expression array contains >54,000 mRNA probe sets, covering most, if not all, known and predicted human genes. Tissue-specific expression patterns of the most promising candidate genes from the array analyses were further confirmed with the GNF SymAtlas expression database [20], which covers about 100 human tissues, and finally verified by quantitative real-time polymerase chain reaction (PCR) in blood and saliva as well as in other body fluids relevant to forensic casework, i.e., semen and vaginal secretion.

Materials and methods

Sample collection

Aliquots of 5 ml of whole blood and saliva were collected from each of five healthy volunteers (four men and one woman) of western European genetic origin, who gave informed consent before inclusion in the study. Native blood was collected without anticoagulant to avoid effects of anticoagulation reagents on gene expression. Into each sample, 75 cotton swabs were immersed; special care was taken to minimize the time between collection and swab absorption to avoid blood coagulation. After complete absorption of the fluids, swabs were left to dry on a bench top at room temperature. When dry, the swabs were stored in dust-free, nonhumid conditions (but exposed to normal daylight) for different time intervals. Swabs were visually inspected and sorted to ensure similar liquid content between individual swabs. After 0, 1, 3, 7, 14, 21, 57, and 180 days of storage, swabs were transferred to −80°C until RNA isolation; for the 0-day time point, samples were frozen immediately after drying. Semen and vaginal secretion samples, collected from one male and one female individual, respectively, were absorbed onto cotton swabs and dried overnight before RNA isolation.

RNA isolation

RNA was isolated with the Qiagen RNeasy kit (Qiagen Benelux B.V.) according to the manufacturer’s instructions with minor modifications: the cotton swab was cut into 1 × 1-mm pieces, which were soaked in RLT buffer for 1 h at 4°C before extraction. Trial experiments extending this incubation to up to 24 h did not improve RNA quantity or quality (data not shown).

Microarray hybridization and gene expression data analysis

Before hybridization to Affymetrix U133 Plus 2.0 GeneChip arrays (Affymetrix, Santa Clara, CA), RNA isolated from blood and saliva stains was amplified using the Ambion MEGAscript T7 two-cycle amplification kit (Applied Biosystems, The Netherlands). Amplification, labeling, hybridization, washing, and scanning were performed by the microarray core facility of the Erasmus MC Center for Biomics according to Affymetrix specifications. Background subtraction and probe signal summarization were performed with the robust multiarray analysis algorithm [21] using the R Bioconductor software [22]; the resulting log2 signal values were back-transformed to linear scale. Presence/absence calls for individual probe sets were calculated with the mas5calls function of the Bioconductor mas package. Because the constant global mean assumption does not hold for arrays hybridized with differentially degraded RNA samples, signal intensities were normalized between samples using the nonhuman control genes present on Affymetrix arrays (spiked-in probes). Normalization factors for each array were inferred from the average signal intensities of the bioB, bioC, bioD, and Cre control probe sets. Differential gene expression was analyzed with the significance analysis of microarrays (SAM) algorithm [23] implemented in the TM4 software [24]. In the saliva dataset, we selected only genes with signal intensities above 50 (which is below the background threshold usually applied in expression array experiments) and below 50 in the blood dataset. Blood-targeted genes were selected in a similar manner but with a different criterion: the lower intensity limit in blood was set to 1,000 to keep the number of candidates reasonable.
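The spike-in normalization and the intensity filters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: each array is a plain dict of linear-scale probe-set signals, the keys `bioB`, `bioC`, `bioD`, and `cre` are hypothetical stand-ins for the Affymetrix control probe sets, each array is scaled so that its mean spike-in signal matches the mean across all arrays (one plausible way to infer normalization factors from the controls), and the upper limit of 50 in saliva for blood-targeted genes is assumed by analogy with the saliva criterion. The toy values are invented.

```python
from statistics import mean

# Hypothetical keys standing in for the bioB, bioC, bioD and Cre control probe sets.
SPIKE_INS = ("bioB", "bioC", "bioD", "cre")


def spike_in_factors(arrays):
    """One scaling factor per array, chosen so that every array's mean
    spike-in signal equals the mean spike-in signal across all arrays."""
    spike_means = {name: mean(sig[p] for p in SPIKE_INS) for name, sig in arrays.items()}
    target = mean(spike_means.values())
    return {name: target / m for name, m in spike_means.items()}


def normalize(arrays):
    """Multiply every probe set on each array by that array's factor."""
    factors = spike_in_factors(arrays)
    return {
        name: {probe: value * factors[name] for probe, value in sig.items()}
        for name, sig in arrays.items()
    }


def select_targeted(target, other, min_target, max_other=50.0):
    """Probe sets above min_target in the target fluid and below max_other in
    the other fluid; the spike-in controls themselves are never selected."""
    return sorted(
        p for p, value in target.items()
        if p not in SPIKE_INS and value > min_target and other.get(p, 0.0) < max_other
    )


# Toy linear-scale signals (invented) for one blood and one saliva array.
arrays = {
    "blood_0d": {"bioB": 200.0, "bioC": 400.0, "bioD": 800.0, "cre": 1600.0,
                 "AQP9": 1500.0, "KRT4": 10.0},
    "saliva_0d": {"bioB": 100.0, "bioC": 200.0, "bioD": 400.0, "cre": 800.0,
                  "AQP9": 12.0, "KRT4": 40.0},
}
norm = normalize(arrays)
blood_targeted = select_targeted(norm["blood_0d"], norm["saliva_0d"], min_target=1000.0)
saliva_targeted = select_targeted(norm["saliva_0d"], norm["blood_0d"], min_target=50.0)
print(blood_targeted, saliva_targeted)  # ['AQP9'] ['KRT4']
```

Scaling to the spike-ins rather than to the global mean means that a heavily degraded array keeps its low overall signal instead of being inflated to match fresher samples.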

Real-time PCR

First-strand cDNAs were synthesized from total RNA with the SuperScript® III RTS First-Strand cDNA Synthesis Kit (Invitrogen BV, The Netherlands). Primers were designed with Primer3 software [25] so that the forward and reverse primers were complementary to different exons of the respective gene and located as close as possible to the 3′ end of the corresponding RefSeq cDNA (Electronic Supplementary Material Table S1). Real-time PCR reactions with the SuperScript® III Platinum® SYBR® Green One-Step qPCR Kit (Invitrogen BV) were performed on an ABI 7300 PCR machine (Applied Biosystems, The Netherlands) with the following parameters: initial denaturation at 94°C for 10 min, followed by 45 cycles of denaturation at 94°C for 15 s and annealing/elongation at 60°C for 30 s. Melting-curve analysis and agarose gel electrophoresis were used to confirm primer specificity and the absence of DNA contamination. The amplified cDNA yield in comparative blood and saliva PCRs was quantified by the standard curve method, and PCR experiments with semen and vaginal secretion were quantified by the delta Ct (dCt) method. In both cases, the GAPDH gene served as an endogenous control to normalize the amplification signal between samples from different tissues and individuals. Time points, by contrast, were compared with each other without normalization: because all RNA molecules are assumed to degrade over time, no internal control gene could be used, and the only proper way to normalize RT-PCR signals was to use the same amount of template in each reaction. This requirement held in our experiments, as GAPDH expression varied relatively little between samples from the same tissue (CV <25%, data not shown), probably because approximately the same amount of blood or saliva was absorbed by each cotton swab during collection.
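The standard curve method and the GAPDH variability check can be illustrated as follows. This sketch assumes a serial dilution with known template amounts, fits Ct against log10(amount) by ordinary least squares, and inverts the fit to estimate unknown samples; it also derives the amplification efficiency from the slope and computes the coefficient of variation that underlies the <25% criterion. The function names and all numeric values are hypothetical.

```python
from math import log10
from statistics import mean, stdev


def fit_standard_curve(quantities, cts):
    """Ordinary least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [log10(q) for q in quantities]
    x_bar, y_bar = mean(xs), mean(cts)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting template amount."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """E = 10**(-1/slope) - 1; a slope near -3.32 gives E close to 1 (100%)."""
    return 10 ** (-1 / slope) - 1


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    return stdev(values) / mean(values)


# Hypothetical 10-fold dilution series (arbitrary units) and measured Ct values.
slope, intercept = fit_standard_curve([1, 10, 100, 1000], [30.0, 26.68, 23.36, 20.04])
unknown = quantity_from_ct(25.02, slope, intercept)
efficiency = amplification_efficiency(slope)
gapdh_cv = coefficient_of_variation([100.0, 110.0, 90.0])  # hypothetical GAPDH amounts
print(round(slope, 2), round(unknown, 1), round(efficiency, 3), round(gapdh_cv, 2))
```

A CV computed this way on GAPDH amounts from swabs of the same tissue is one simple way to check the equal-template assumption that the time-point comparison relies on.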

Results and discussion

Microarray expression data

As expected, hybridization signals varied considerably between individuals; however, the most striking differences were observed between tissues. Signal intensities in blood samples were on average more than six times higher than in saliva (174.2 ± 1.9 vs 26.9 ± 0.7; Wilcoxon rank-sum test, p < 0.001). In addition, at time point zero, the proportion of probe sets called present by the Affymetrix algorithm was, on average, more than three times higher in blood than in saliva (30.2% ± 0.9 vs 9.3% ± 0.6; t test, p < 0.001). The SAM test with stringent parameters (false discovery rate set to 0%) showed that, in both the blood and the saliva experiments, no genes had significant expression differences over 0–57 days of stain storage. Only a few genes (37 for saliva and 10 for blood) appeared to be differentially expressed at 180 days compared with the other time points. This suggests that mRNA molecules in dried blood and saliva remain relatively stable for a long period. Recent studies by Heinrich et al. [26] also revealed a poor correlation between RNA degradation and postmortem interval.
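For readers who want to reproduce this kind of two-group comparison, the sketch below implements a two-sided Wilcoxon rank-sum test with the large-sample normal approximation, using only the Python standard library. It omits the tie and continuity corrections that statistical packages usually apply, so its p-values can differ slightly from theirs, and the example intensities are hypothetical rather than the study's data.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test with the normal approximation
    (no tie or continuity correction). Returns (W, z, p), W = rank sum of x."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical mean array intensities for five blood and five saliva samples.
blood = [171.0, 173.5, 174.0, 175.2, 177.3]
saliva = [25.9, 26.4, 26.9, 27.3, 28.0]
w, z, p = rank_sum_test(blood, saliva)
print(w, round(z, 2), round(p, 4))
```

With only five samples per group, complete separation of the two groups gives p of roughly 0.009 under this approximation; smaller p-values need more observations.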

Selection of tissue-specific markers

The initial selection of tissue-specific genes was based on the normalized signal intensities of the microarray hybridizations, averaged across the five biological replicates at time point zero. About 500 apparent saliva-specific and 1,000 apparent blood-specific candidate genes were selected. These candidate sets were further refined by checking the selected genes against the GNF SymAtlas tissue database [20], after excluding all cell lines from the database so that only tissues and organs were retained. Genes were kept only if the database showed them to be highly and exclusively expressed in the target tissue(s). For blood, the target tissue in the database was defined as whole blood; for saliva, the target tissues were salivary gland, tongue, trachea, and tonsils. The selection criteria were high expression (signal intensity >1,000) in the target tissue and low expression (signal intensity <200) in nontarget tissues. Using these criteria and combining the expression array data with the GNF SymAtlas verification, we identified six saliva-targeted genes and 15 blood-targeted genes that were highly expressed in the target tissues (or respective organs) but not, or hardly, in nontarget tissues (Electronic Supplementary Material Figure S1a, S1b, S1c).
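The GNF SymAtlas filter can be expressed as a simple predicate over a tissue-to-signal table. In this sketch the tissue labels and values are hypothetical, cell lines are removed first as described above, and because the text does not say whether a gene must exceed 1,000 in every target tissue or in at least one, that choice is exposed as an assumed `require_all` flag.

```python
SALIVA_TARGETS = {"SalivaryGland", "Tongue", "Trachea", "Tonsil"}  # hypothetical labels
BLOOD_TARGETS = {"WholeBlood"}


def drop_cell_lines(expression, cell_lines):
    """Keep only tissues and organs by removing the listed cell-line entries."""
    return {tissue: v for tissue, v in expression.items() if tissue not in cell_lines}


def is_tissue_specific(expression, targets, high=1000.0, low=200.0, require_all=False):
    """High (> high) in the target tissue(s) and low (< low) in every other tissue.
    require_all=True demands high expression in every target tissue; the default
    accepts at least one, since the text does not specify which was used."""
    target_values = [expression[t] for t in targets if t in expression]
    if not target_values:
        return False
    passes = all if require_all else any
    high_in_target = passes(v > high for v in target_values)
    low_elsewhere = all(v < low for t, v in expression.items() if t not in targets)
    return high_in_target and low_elsewhere


# Hypothetical atlas profiles (signal intensities per tissue).
blood_gene = {"WholeBlood": 5200.0, "Liver": 150.0, "Tongue": 90.0, "Raji_cell_line": 3000.0}
saliva_gene = {"SalivaryGland": 2500.0, "Tongue": 8000.0, "Trachea": 600.0,
               "Tonsil": 1200.0, "Liver": 50.0}
print(is_tissue_specific(drop_cell_lines(blood_gene, {"Raji_cell_line"}), BLOOD_TARGETS))
print(is_tissue_specific(saliva_gene, SALIVA_TARGETS, require_all=True))
```

The example shows why excluding cell lines matters: the hypothetical blood gene fails the filter while the cell-line entry is present and passes once only tissues remain.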

RT-PCR confirmation of tissue-specific markers

To confirm the microarray results, real-time PCR assays, which also provide a detection method suitable for forensic applications, were designed for the 21 best candidate markers selected from the array hybridizations and database searches and were performed on RNA extracted from aged blood and saliva stains. In agreement with the array results, all but one of the 21 markers showed good amplification in the target tissue but no, or only marginally detectable, amplification in the nontarget tissue (Fig. 1a–c); the exception, the saliva-targeted PPL gene, showed significant expression overlap with blood and was therefore excluded from further experiments. Irrespective of stain storage time, sufficient RT-PCR amplification was observed in all samples, even in those stored for the longest time tested (180 days), indicating that the markers remain stable over long storage periods. The only exception was CCR2, for which no amplification was detected in blood stains stored for 180 days; this gene was therefore excluded from further analyses. A plausible explanation is the location of the PCR-amplified region, which lies more than 1 kb from the 3′ end of the mRNA because CCR2 has a very long untranslated region. Apparently, after 180 days of storage the CCR2 mRNA was too degraded to allow efficient reverse transcription by oligo(dT) priming, which targets the 3′ end of the molecule. This observation highlights the need to place PCR primers in the most 3′-proximal part of the mRNA for successful amplification of cDNA fragments from degraded samples.
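The 3′-proximity rule suggested by the CCR2 result can be turned into a quick design check. The sketch below is hypothetical: it assumes amplicon coordinates are given as 1-based positions on the mRNA sequence, measures how far the amplicon ends from the transcript's 3′ end, and flags assays beyond a 1,000-nt threshold chosen only because the failed CCR2 amplicon lay more than 1 kb upstream; it is not a validated cutoff, and the gene names and coordinates are invented.

```python
def distance_to_3prime_end(transcript_length, amplicon_end):
    """Nucleotides between the amplicon's 3'-most base (1-based position on the
    mRNA) and the last base of the transcript, poly(A) tail not included."""
    if not 1 <= amplicon_end <= transcript_length:
        raise ValueError("amplicon_end must lie within the transcript")
    return transcript_length - amplicon_end


def flag_distal_amplicons(assays, max_distance=1000):
    """Names of assays whose amplicon ends more than max_distance nt upstream of
    the 3' end; max_distance is an illustrative threshold, not a validated one."""
    return sorted(
        name for name, (length, end) in assays.items()
        if distance_to_3prime_end(length, end) > max_distance
    )


# Hypothetical assays: (transcript length, amplicon end position) in nucleotides.
assays = {"GENE_A": (2400, 2250), "GENE_B": (3600, 1900)}
print(flag_distal_amplicons(assays))  # ['GENE_B']
```

Because oligo(dT)-primed reverse transcription starts at the poly(A) tail, every break between the tail and the amplicon prevents that amplicon from being copied, which is why the distance to the 3′ end, not the amplicon length alone, is the quantity checked here.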

Fig. 1

a, b RT-PCR results for blood-targeted genes in blood and saliva stains. c RT-PCR results for saliva-targeted genes in saliva and blood stains. Genes were selected based on the expression microarray results and the GNF SymAtlas database. Expression values for each time point were averaged across three male and three female RNA samples; no gender-specific expression differences were detected (t test, p > 0.05). B, blood; S, saliva. Samples were processed after complete drying of blood and saliva and storage for 0, 21, 57, and 180 days.

Expression of the candidate markers in other body fluids

For additional confirmation of tissue specificity, we used RT-PCR to test the expression patterns of our candidate RNA markers in other body fluids that might be encountered in forensic casework, i.e., vaginal secretion and semen. According to the GNF SymAtlas database, none of our blood- and saliva-targeted markers is expressed in testis or uterus. In agreement, our RT-PCR experiments revealed that two of the saliva-targeted mRNA markers (SPRR3 and SPRR1A) show no detectable expression in semen (after 50 RT-PCR cycles) and that the remaining three (KRT4, KRT6A, and KRT13) show vastly higher expression in saliva than in semen (ddCt > 15, Fig. 2a), in keeping with the high saliva specificity of the five proposed mRNA markers. SPRR1A and SPRR3 both encode cornified envelope precursor proteins and are predominantly expressed in oral and esophageal epithelia, where their expression is strictly linked to keratinocyte terminal differentiation [27]. Keratins 4, 6A, and 13 are among the major structural proteins of the oral mucosa [28, 29].
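The ddCt criterion can be made concrete. Using the dCt definition from the Fig. 2 legend (dCt = Ct(candidate gene) − Ct(GAPDH)), ddCt is the dCt in the nontarget fluid minus the dCt in the target fluid, and under the common assumption of 100% amplification efficiency the fold difference is 2^ddCt, so ddCt > 15 corresponds to more than 2^15 = 32,768-fold higher relative expression in the target fluid. The sketch below additionally assumes that an undetected target is assigned the final cycle number (50), which yields only a lower bound; the Ct values are hypothetical.

```python
from typing import Optional

MAX_CYCLES = 50  # cycles run before a target was called "not detected"


def delta_ct(ct_gene: Optional[float], ct_gapdh: float) -> float:
    """dCt = Ct(candidate gene) - Ct(GAPDH). An undetected target (None) is
    assigned MAX_CYCLES, so the resulting dCt is only a lower bound."""
    ct = MAX_CYCLES if ct_gene is None else ct_gene
    return ct - ct_gapdh


def delta_delta_ct(dct_nontarget: float, dct_target: float) -> float:
    """ddCt = dCt(nontarget fluid) - dCt(target fluid); positive values mean
    higher relative expression in the target fluid."""
    return dct_nontarget - dct_target


def fold_difference(ddct: float, efficiency: float = 1.0) -> float:
    """Relative expression ratio (1 + E) ** ddCt, assuming equal efficiency E
    for the candidate gene and GAPDH (E = 1.0 means perfect doubling)."""
    return (1 + efficiency) ** ddct


# Hypothetical Ct values for a saliva-targeted marker in saliva and semen.
dct_saliva = delta_ct(22.0, 24.0)   # -2.0
dct_semen = delta_ct(40.0, 26.0)    # 14.0
ddct = delta_delta_ct(dct_semen, dct_saliva)
print(ddct, fold_difference(ddct))  # 16.0 65536.0
```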

Fig. 2

a RT-PCR for saliva- and blood-targeted genes in semen stains. b RT-PCR for saliva- and blood-targeted genes in vaginal secretion stains. Delta Ct (dCt) values were calculated as dCt = Ct (candidate gene) − Ct (endogenous control, GAPDH gene). Low dCt values correspond to high expression levels of the specific mRNA. Gray bars correspond to samples from the target tissues of the selected genes (either blood or saliva); black bars correspond to samples from nontarget tissues (either vaginal secretion or semen). Dotted bars represent cases where amplification was not detected after 50 cycles; in these cases, the expression values were arbitrarily set to a Ct value of 25 (plot maximum).

Of the 14 blood-targeted genes, nine (CASP1, AMICA1, C1QR1, ALOX5AP, AQP9, C5R1, NCF2, MNDA, ARHGAP26) showed no detectable amplification in semen, in keeping with high blood specificity of the respective mRNA markers. These genes encode proteins with important functions in different types of blood cells. They are known to be highly or even specifically expressed in peripheral leukocytes (AQP9, NCF2, CASP1, C5R1, C1QR1, ALOX5AP [30–35]) and in myelocytes or hematopoietic cells (MNDA, ARHGAP26, AMICA1 [36–38]). Thus, our microarray-based genome-wide approach to finding tissue-specific mRNA markers identified genes that are functionally relevant to the target tissues. However, five genes (CD36, CCR1, PF4, BIN2, and ALOX5) showed only slightly different or even comparable expression in blood and semen, which was not expected from the GNF database, and were therefore excluded from the final list of blood-specific markers.

Furthermore, and not surprisingly, RT-PCR of all saliva- and blood-targeted markers in vaginal secretion samples revealed expression levels comparable to those in blood and saliva samples (Fig. 2b). The natural occurrence of blood cells in vaginal secretion most likely explains the expression of our blood-targeted markers there, whereas the high biochemical and histological similarity of oral and vaginal epithelia [40] makes similar gene expression patterns in the two tissues plausible. It should be pointed out that mRNA markers previously claimed to be useful for identifying vaginal secretion, such as HBD-1 [18] and MUC4 [18, 19], are also known to be abundant in oral epithelial cells and in the salivary transcriptome [41–43]. Furthermore, Nussbaumer et al. [19] ruled out the possibility of differentiating saliva from vaginal secretion using MUC4 alone. Our results, together with previous findings, suggest that establishing mRNA markers expressed exclusively in vaginal secretion could be a challenging, if not impossible, task.

Comparison with previously suggested mRNA markers

Interestingly, the tissue-specific genes identified here do not overlap with those previously suggested for blood and saliva stain identification [18, 19]. This could be explained by the experimental setup and the systematic (rather than ad hoc) approach of this study, namely, the degraded biological material analysed and the Affymetrix microarray platform applied. In contrast to previous studies, we restricted marker ascertainment to genes whose mRNA retained structural integrity during stain dry-out as well as during subsequent long-term storage of 180 days. This should allow these markers to be detected in forensic stains of unknown age, at least up to 6 months old and presumably older. Furthermore, our saliva-specific candidate genes derive from mouth and pharynx epithelial cells, unlike the previously suggested STATH and HTN3 genes, which are expressed in the salivary gland [18]. Secreted mRNAs that are abundant in fresh saliva are more prone to rapid degradation by extracellular RNases [39]; they are therefore not expected to be present in dried stains, which explains why they were not detected by the relatively insensitive microarray hybridization method used in this study. The SPTB and PBGD genes, previously proposed as blood-specific markers [18], show no overexpression in whole blood relative to other tissues according to the GNF SymAtlas database (data not shown).

Conclusions

In summary, whole-genome expression analysis of blood and saliva stains degraded over time, combined with RT-PCR verification in various forensically relevant body fluids, has identified stable tissue-specific mRNA markers from five genes for saliva (SPRR3, SPRR1A, KRT4, KRT6A, and KRT13) and nine genes for whole blood (CASP1, AMICA1, C1QR1, ALOX5AP, AQP9, C5R1, NCF2, MNDA, and ARHGAP26). For the first time, mRNA markers were ascertained considering almost the entire human transcriptome, based on experimental genome-wide gene expression data, and taking the degradation stability of mRNAs into account. We demonstrated that the candidate genes identified here provide informative mRNA markers for identifying blood and saliva in stains up to 180 days old. We propose their application in forensic casework for stains up to at least 6 months old, with the potential practical limitation of coamplification in vaginal secretion. Moreover, we expect the proposed mRNA markers to identify older blood and saliva stains successfully (the respective experiments are in progress). Finally, we would like to remark that tissue identification in forensics should be performed reciprocally, so that a tissue is identified by the presence of markers specific to that tissue together with the absence of markers specific to all other tissues in question. Clearly, more research should be dedicated to finding the most suitable markers for tissue identification in forensics.
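The reciprocal identification principle can be written as a small decision rule. This is a sketch under assumptions not specified in the study: each marker is reduced to a binary detected/not-detected call, a fluid is reported only if a chosen fraction of its own markers (all, by default) is detected and no marker of the other panel is detected, and the `identify` function and its `min_fraction` parameter are hypothetical. The marker panels are the five saliva and nine blood genes listed above.

```python
SALIVA = {"SPRR3", "SPRR1A", "KRT4", "KRT6A", "KRT13"}
BLOOD = {"CASP1", "AMICA1", "C1QR1", "ALOX5AP", "AQP9",
         "C5R1", "NCF2", "MNDA", "ARHGAP26"}
PANELS = {"saliva": SALIVA, "blood": BLOOD}


def identify(detected, panels=PANELS, min_fraction=1.0):
    """Reciprocal call: report a fluid only if at least min_fraction of its own
    markers are detected AND no marker of any other panel is detected."""
    detected = set(detected)
    calls = []
    for fluid, markers in panels.items():
        own_present = len(markers & detected) / len(markers) >= min_fraction
        other_markers = set().union(*(m for f, m in panels.items() if f != fluid))
        others_absent = not (other_markers & detected)
        if own_present and others_absent:
            calls.append(fluid)
    return calls


print(identify(SALIVA))                                        # ['saliva']
print(identify(BLOOD | {"KRT4"}))                              # []
print(identify({"KRT4", "KRT13", "SPRR3"}, min_fraction=0.6))  # ['saliva']
```

Under this rule a stain that coamplifies both panels, as vaginal secretion did here, or a genuine blood–saliva mixture, receives no call; handling mixtures would require a different rule.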