iScience
Volume 24, Issue 8, 20 August 2021, 102855

Article
A reference-free approach for cell type classification with scRNA-seq

https://doi.org/10.1016/j.isci.2021.102855
Open access under a Creative Commons license

Highlights

  • Compressed k-mer groups (CKGs) are used to classify cell types without references

  • CKGs are competitive with gene expression features for cell type classification

  • CKGs are associated with genes sharing gene-specific k-mers

Summary

Single-cell RNA sequencing (scRNA-seq) has become a revolutionary technology for characterizing cells under different biological conditions. Unlike bulk RNA-seq, gene expression from scRNA-seq is highly sparse because sequencing depth per cell is limited. This sparsity is worsened when a significant portion of reads that cannot be attributed to genes is discarded during gene quantification. To overcome data sparsity and make full use of the original reads, we propose scSimClassify, a reference-free and alignment-free approach that classifies cell types with k-mer-level features. The compressed k-mer groups (CKGs), identified by the simhash method, contain k-mers with similar abundance profiles and serve as the cells’ features. Our experiments demonstrate that CKG features yield higher scRNA-seq classification accuracy than gene expression features in the majority of experimental cases. Because CKGs are derived from raw reads without alignment to a reference genome, scSimClassify offers an effective alternative to existing methods, especially when the reference genome is incomplete or insufficient to represent the subject genomes.
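The summary describes CKGs only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not scSimClassify's implementation: k-mers are counted directly from each cell's raw reads, each k-mer's abundance profile across cells is hashed with random-hyperplane simhash (one sign bit per random projection), and k-mers that share a signature are pooled into one group whose summed abundance becomes a per-cell feature. The function names (`count_kmers`, `simhash_signature`, `compressed_kmer_groups`), the value of k, the number of signature bits, the use of cross-cell profiles, and pooling by summation are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical names and parameters; an illustrative sketch, not scSimClassify's code.


def count_kmers(reads, k):
    """Count every length-k substring across one cell's raw reads (no alignment)."""
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return dict(counts)


def simhash_signature(profile, hyperplanes):
    """Random-hyperplane simhash: one bit per hyperplane, set to 1 when the
    profile has a non-negative dot product with that hyperplane. Profiles that
    point in similar directions (high cosine similarity) tend to share bits."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * w for p, w in zip(profile, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def compressed_kmer_groups(cells, k=4, n_bits=8, seed=0):
    """Assumed CKG construction: k-mers whose cross-cell abundance profiles get
    the same simhash signature form one group; a cell's feature for a group is
    the summed abundance of that group's k-mers in the cell."""
    per_cell = [count_kmers(reads, k) for reads in cells]
    kmers = sorted(set().union(*per_cell))
    rng = random.Random(seed)
    # One random Gaussian hyperplane per signature bit, in "cell space".
    hyperplanes = [[rng.gauss(0.0, 1.0) for _ in cells] for _ in range(n_bits)]

    groups = defaultdict(list)  # signature -> k-mers sharing it
    for kmer in kmers:
        profile = [counts.get(kmer, 0) for counts in per_cell]
        groups[simhash_signature(profile, hyperplanes)].append(kmer)

    signatures = sorted(groups)
    features = [
        [sum(counts.get(kmer, 0) for kmer in groups[sig]) for sig in signatures]
        for counts in per_cell
    ]
    return features, {sig: groups[sig] for sig in signatures}


# Usage with toy reads (one list of reads per cell).
cells = [
    ["ACGTACGTAA", "TTGACGTACG"],
    ["ACGTACGTAA", "GGGCCCATAT"],
    ["CCCGGGTTTA"],
]
features, groups = compressed_kmer_groups(cells, k=4, n_bits=6)
print(len(features), "cells x", len(groups), "compressed k-mer groups")
```

Because each signature bit depends only on the direction of a profile, k-mers whose abundances rise and fall together across cells tend to land in the same group. This is how a large k-mer vocabulary can be compressed into far fewer features. Using more bits yields finer, smaller groups. The resulting feature matrix could then be passed to any standard classifier.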

Subject areas

bioinformatics
transcriptomics
algorithms
