
Detection of Cell Separation-Induced Gene Expression Through a Penalized Deconvolution Approach

Statistics in Biosciences

Abstract

Interest in studying genomics and transcriptomics at the single-cell level has been increasing. One of the keys to single-cell studies is the development of cell-sorting technology that separates cells according to their type. However, the process of cell isolation changes the cell microenvironment, which in turn affects gene activity, and the resulting change in gene expression can affect the conclusions of a single-cell study. To address this, we propose a novel PEnalized deconvolution Analysis for Cell separation-induced Heterogeneity (PEACH). By adopting a Bayesian variable selection scheme, PEACH can simultaneously decompose cell-type-specific expression from bulk tissue and identify cell separation-induced differential expression (CSI-DE) genes. We validated PEACH by using four benchmark datasets and one in silico mixture dataset. In the real application, we used PEACH to analyze an immune-related disease dataset, a blood dataset, and a skin dataset, and we consistently identified immediate-early genes, ribosomal protein genes, and mitochondrial genes across the three datasets. Our study illustrates that genes sensitive to the cell-sorting process are biologically meaningful and nonnegligible, and it may provide new insights into single-cell studies for transcriptomic analysis. The model has been implemented in the R package “PEACH,” and the algorithm is available for download.
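To make the idea of deconvolution with gene-specific shifts concrete, the following toy sketch may help. It is not the PEACH model: it assumes only two cell types with known reference profiles, estimates a single mixing proportion by least squares, and flags genes whose residual exceeds a hand-picked threshold. The function names, the reference values, and the threshold are all hypothetical and chosen for illustration.

```python
def estimate_proportion(bulk, ref_a, ref_b):
    """Least-squares estimate of p in bulk ~ p*ref_a + (1-p)*ref_b, clipped to [0, 1]."""
    num = sum((y - b) * (a - b) for y, a, b in zip(bulk, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles are identical; p is not identifiable")
    return min(1.0, max(0.0, num / den))


def flag_shifted_genes(bulk, ref_a, ref_b, p, threshold):
    """Return indices of genes whose absolute residual exceeds the threshold."""
    flagged = []
    for g, (y, a, b) in enumerate(zip(bulk, ref_a, ref_b)):
        residual = y - (p * a + (1.0 - p) * b)
        if abs(residual) > threshold:
            flagged.append(g)
    return flagged


# Hypothetical reference profiles measured on sorted cells (5 genes, 2 cell types).
ref_a = [10.0, 0.0, 5.0, 8.0, 2.0]
ref_b = [0.0, 10.0, 5.0, 2.0, 8.0]

# Simulated bulk sample: 30% type A, 70% type B.
true_p = 0.3
bulk = [true_p * a + (1.0 - true_p) * b for a, b in zip(ref_a, ref_b)]

# Gene 2 differs between bulk tissue and sorted cells by 4 units,
# mimicking a separation-induced expression change.
bulk[2] += 4.0

p_hat = estimate_proportion(bulk, ref_a, ref_b)
shifted = flag_shifted_genes(bulk, ref_a, ref_b, p_hat, threshold=1.0)
print(p_hat, shifted)
```

In this toy setup the shifted gene has the same reference value in both cell types, so it does not bias the proportion estimate. In general a shifted gene does bias a plain least-squares fit, which is one reason to estimate proportions and shifts jointly under a sparsity-inducing penalty or prior rather than in two separate steps.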




Data Availability

The methods reported in this article are implemented in the R package, “PEACH,” which is publicly available to perform the deconvolution analysis (https://github.com/AshTai/PEACH). The benchmark datasets are available under GEO (GSE11058, GSE19830, and GSE60424). The datasets in the real application were downloaded from GSE115898, GSE51984, and GSE60424.

References

  1. Ozsolak F, Milos PM (2011) RNA sequencing: advances, challenges and opportunities. Nat Rev Genet 12:87

  2. Stegle O, Teichmann SA, Marioni JC (2015) Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet 16:133

  3. Bacher R, Kendziorski C (2016) Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol 17:63

  4. Brennecke P et al (2013) Accounting for technical noise in single-cell RNA-seq experiments. Nat Methods 10:1093

  5. Finak G et al (2015) MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol 16:278

  6. Pierson E, Yau C (2015) ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol 16:241

  7. Kharchenko PV, Silberstein L, Scadden DT (2014) Bayesian approach to single-cell differential expression analysis. Nat Methods 11:740

  8. Vallejos CA, Richardson S, Marioni JC (2016) Beyond comparisons of means: understanding changes in gene expression at the single-cell level. Genome Biol 17:70

  9. Hafemeister C, Satija R (2019) Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20:1–15

  10. Vallejos CA, Risso D, Scialdone A, Dudoit S, Marioni JC (2017) Normalizing single-cell RNA sequencing data: challenges and opportunities. Nat Methods 14:565

  11. Richardson GM, Lannigan J, Macara IG (2015) Does FACS perturb gene expression? Cytometry A 87:166–175

  12. van den Brink SC et al (2017) Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations. Nat Methods 14:935

  13. Ziegenhain C, Vieth B, Parekh S, Hellmann I, Enard W (2018) Quantitative single-cell transcriptomics. Brief Funct Genomics 17:220–232

  14. Lacar B et al (2016) Nuclear RNA-seq of single neurons reveals molecular signatures of activation. Nat Commun 7:1–13

  15. Wu YE, Pan L, Zuo Y, Li X, Hong W (2017) Detecting activated cell populations using single-cell RNA-seq. Neuron 96:313–329.e6

  16. Zhu L, Lei J, Devlin B, Roeder K (2018) A unified statistical framework for single cell and bulk RNA sequencing data. Ann Appl Stat 12:609

  17. Poulin J-F, Tasic B, Hjerling-Leffler J, Trimarchi JM, Awatramani R (2016) Disentangling neural cell diversity using single-cell transcriptomics. Nat Neurosci 19:1131

  18. Wang X, Park J, Susztak K, Zhang NR, Li M (2019) Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat Commun 10:1–9

  19. Abbas AR, Wolslegel K, Seshasayee D, Modrusan Z, Clark HF (2009) Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus. PLoS ONE 4:e6098

  20. Du R, Carey V, Weiss ST (2019) deconvSeq: deconvolution of cell mixture distribution in sequencing data. Bioinformatics 35:5095–5102

  21. Gong T, Szustakowski JD (2013) DeconRNASeq: a statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data. Bioinformatics 29:1083–1085

  22. Racle J, de Jonge K, Baumgaertner P, Speiser DE, Gfeller D (2017) Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. Elife 6:e26476

  23. Newman AM et al (2015) Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 12:453

  24. Ogundijo OE, Wang X (2017) A sequential Monte Carlo approach to gene expression deconvolution. PLoS ONE 12:e0186167

  25. Tai A-S, Tseng GC, Hsieh W-P (2021) BayICE: a Bayesian hierarchical model for semireference-based deconvolution of bulk transcriptomic data. Ann Appl Stat 15:391–411

  26. She Y, Owen AB (2011) Outlier detection using nonconvex penalized regression. J Am Stat Assoc 106:626–639

  27. Linsley PS, Speake C, Whalen E, Chaussabel D (2014) Copy number loss of the interferon gene cluster in melanomas is linked to reduced T cell infiltrate and poor patient prognosis. PLoS ONE 9:e109760

  28. Ahn RS et al (2017) Transcriptional landscape of epithelial and immune cell populations revealed through FACS-seq of healthy human skin. Sci Rep 7:1–9

  29. Pabst C et al (2016) GPR56 identifies primary human acute myeloid leukemia cells with high repopulating potential in vivo. Blood J Am Soc Hematol 127:2018–2027

  30. Jin H, Wan Y-W, Liu Z (2017) Comprehensive evaluation of RNA-seq quantification methods for linearity. BMC Bioinform 18:117

  31. Zhong Y, Liu Z (2012) Gene expression deconvolution in linear space. Nat Methods 9:8

  32. Fridman WH, Pagès F, Sautes-Fridman C, Galon J (2012) The immune contexture in human tumours: impact on clinical outcome. Nat Rev Cancer 12:298–306

  33. Tai A-S, Peng C-H, Peng S-C, Hsieh W-P (2018) Decomposing the subclonal structure of tumors with two-way mixture models on copy number aberrations. PLoS ONE 13:e0206579

  34. Shen-Orr SS et al (2010) Cell type–specific gene expression differences in complex tissues. Nat Methods 7:287–289

  35. Ali AT, Boehme L, Carbajosa G, Seitan VC, Small KS, Hodgkinson A (2019) Nuclear genetic regulation of the human mitochondrial transcriptome. Elife 8:e41927

  36. Genuth NR, Barna M (2018) The discovery of ribosome heterogeneity and its implications for gene regulation and organismal life. Mol Cell 71:364–374

  37. Guimaraes JC, Zavolan M (2016) Patterns of ribosomal protein expression specify normal and malignant human cells. Genome Biol 17:236

  38. Ilicic T, Kim JK, Kolodziejczyk AA, Bagger FO, McCarthy DJ, Marioni JC, Teichmann SA (2016) Classification of low quality cells from single-cell RNA-seq data. Genome Biol 17:29

  39. Petukhov V, Guo J, Baryawno N, Severe N, Scadden DT, Samsonova MG, Kharchenko PV (2018) dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments. Genome Biol 19:78

  40. Martano G et al (2019) Metabolism of stem and progenitor cells: proper methods to answer specific questions. Front Mol Neurosci 12:151

  41. Akaishi T, Takahashi T, Nakashima I (2018) Peripheral blood monocyte count at onset may affect the prognosis in multiple sclerosis. J Neuroimmunol 319:37–40

  42. Roep BO (2003) The role of T-cells in the pathogenesis of Type 1 diabetes: from cause to cure. Diabetologia 46:305–321

  43. Delong T et al (2016) Pathogenic CD4 T cells in type 1 diabetes recognize epitopes formed by peptide fusion. Science 351:711–714

  44. Shen XF, Cao K, Jiang JP, Guan WX, Du JF (2017) Neutrophil dysregulation during sepsis: an overview and update. J Cell Mol Med 21:1687–1697

Acknowledgements

This work was supported by the Ministry of Science and Technology [MOST 107-2118-M-007-001]. This manuscript was edited by Wallace Academic Editing.

Author information

Corresponding author

Correspondence to Wen-Ping Hsieh.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Appendices

Appendix 1

Simulation

See Fig. 9.

Fig. 9

Boxplots of the results under imbalanced cellular proportions. The x-axis is the parameter of the Dirichlet distribution used to generate the cell proportions of each mixture, and the y-axis represents the bias
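As a rough illustration of how a Dirichlet parameter controls how imbalanced the simulated cell proportions are, the sketch below draws proportion vectors by normalizing independent gamma variates. It assumes a symmetric Dirichlet with one shared concentration value per setting, which the caption does not state explicitly; the function name, the number of cell types, and the parameter values are hypothetical and are not taken from the article's simulation.

```python
import random


def sample_proportions(alpha, rng):
    """Draw one proportion vector from Dirichlet(alpha) by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(2023)  # fixed seed so the sketch is reproducible
n_cell_types = 4           # hypothetical number of cell types

# Small concentration values tend to give imbalanced mixtures dominated by
# one cell type; large values tend to give nearly equal proportions.
for concentration in (0.5, 1.0, 10.0):
    props = sample_proportions([concentration] * n_cell_types, rng)
    print(concentration, [round(p, 3) for p in props])
```

Repeating the draw many times per concentration value and comparing estimated with true proportions would give the kind of per-parameter bias distribution that a boxplot summarizes.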

Validation

See Figs. 10, 11, and 12.

Fig. 10

Scatter plot of PEACH, EPIC, and CIBERSORT for validation on GSE11058. The x-axis is the estimated proportion of a particular cell type in a sample, and the y-axis represents the corresponding cell percentage obtained by cell sorting

Fig. 11

Scatter plot of PEACH, EPIC, and CIBERSORT for validation on GSE19830. The x-axis is the estimated proportion of a particular tissue type in a sample, and the y-axis represents the corresponding cell percentage obtained by cell sorting

Fig. 12

Scatter plot of PEACH, EPIC, and CIBERSORT for validation on an in silico mixture dataset simulated by Gong and Szustakowski [21]. The x-axis is the estimated proportion of a particular cell type in a sample, and the y-axis represents the corresponding cell percentage obtained by cell sorting

Appendix 2

See Tables 2, 3, and 4.

Table 2 Sample size of the immune-related dataset
Table 3 Sample distribution of the skin dataset
Table 4 Sample distribution of the blood dataset

Normalization

The read count data described above were normalized using the weighted trimmed mean of M-values (TMM) procedure implemented in the edgeR package.
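For readers unfamiliar with the weighted trimmed mean of M-values, the following simplified Python sketch shows the core computation for one sample against one reference sample. It is an assumption-laden illustration, not edgeR's implementation: edgeR's calcNormFactors additionally selects the reference sample automatically and rescales the factors across samples, both of which are omitted here. The function name and the example counts are hypothetical; the trimming fractions of 30% on M-values and 5% on A-values follow the commonly used defaults.

```python
import math


def tmm_factor(sample, ref, trim_m=0.30, trim_a=0.05):
    """Simplified TMM scaling factor of `sample` relative to `ref` (raw counts)."""
    n_s, n_r = sum(sample), sum(ref)
    rows = []  # (M-value, A-value, precision weight) per usable gene
    for y_s, y_r in zip(sample, ref):
        if y_s <= 0 or y_r <= 0:
            continue  # log ratios are undefined for zero counts
        p_s, p_r = y_s / n_s, y_r / n_r
        m = math.log2(p_s / p_r)        # log fold change
        a = 0.5 * math.log2(p_s * p_r)  # average log abundance
        # Approximate variance of the log ratio (delta method).
        var = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
        if var <= 0:
            continue
        rows.append((m, a, 1.0 / var))
    n = len(rows)
    if n == 0:
        return 1.0
    # Trim the extremes of the M and A distributions by rank.
    by_m = sorted(range(n), key=lambda i: rows[i][0])
    by_a = sorted(range(n), key=lambda i: rows[i][1])
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        return 1.0
    # Precision-weighted mean of the surviving M-values, back on the linear scale.
    num = sum(rows[i][2] * rows[i][0] for i in keep)
    den = sum(rows[i][2] for i in keep)
    return 2.0 ** (num / den)


# Hypothetical counts: one highly expressed gene inflates the library size of
# the sample, so the remaining genes look under-expressed until a factor
# below 1 corrects for it.
reference = [100] * 10
sample = [100] * 9 + [1000]
print(tmm_factor(sample, reference))
```

The trimming is what makes the factor robust: the single inflated gene falls in the trimmed tail of the M-values, so the factor is driven by the genes that did not change.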

Appendix 3

See Figs. 13 and 14.

Fig. 13

Boxplot of estimated cell proportions of skin tissue. Each box represents the distribution of cell proportions estimated by PEACH under different conditions for different cell types

Fig. 14

Boxplot of estimated cell proportions of normal blood tissue. Each box represents the distribution of cell proportions estimated by PEACH under different conditions for different cell types


About this article


Cite this article

Tai, AS., Wang, CC. & Hsieh, WP. Detection of Cell Separation-Induced Gene Expression Through a Penalized Deconvolution Approach. Stat Biosci 15, 692–718 (2023). https://doi.org/10.1007/s12561-022-09344-8


  • DOI: https://doi.org/10.1007/s12561-022-09344-8
