Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany.
Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, Aachen, Germany.
Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia.
German Cancer Research Center (DKFZ), Heidelberg, Germany.
European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany.
Department of Biological Engineering, MIT, Cambridge, MA, USA.
CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain.
Koch Institute for Integrative Cancer Biology, MIT, Cambridge, MA, USA.
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
Universitat Pompeu Fabra (UPF), Barcelona, Spain.
Faculty of Medicine, Department of Physiology, Semmelweis University, Budapest, Hungary.
Background: Many functional analysis tools have been developed to extract functional and mechanistic insight from bulk transcriptome data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to perform such analyses on single cells. However, scRNA-seq data are characterized by drop-out events and low library sizes. It is therefore unclear whether functional transcription factor (TF) and pathway analysis tools established for bulk sequencing can be applied to scRNA-seq data in a meaningful way.

Results: To address this question, we perform benchmark studies on simulated and real scRNA-seq data. We include the bulk RNA tools PROGENy and GO enrichment, which estimate pathway activities, and DoRothEA, which estimates TF activities, and compare them against SCENIC/AUCell and metaVIPER, tools designed for scRNA-seq. For the in silico study, we simulate single cells from TF/pathway perturbation bulk RNA-seq experiments. We complement the simulated data with real scRNA-seq data obtained after CRISPR-mediated knock-out. Our benchmarks on simulated and real data reveal performance comparable to that on the original bulk data. Additionally, by analyzing a mixture sample sequenced with 13 scRNA-seq protocols, we show that TF and pathway activities preserve cell type-specific variability. We also provide the benchmark data for further use by the community.

Conclusions: Our analyses suggest that bulk-based functional analysis tools that use manually curated footprint gene sets can be applied to scRNA-seq data, partially outperforming dedicated single-cell tools. Furthermore, we find that the performance of functional analysis tools is more sensitive to the gene sets than to the statistic used.
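To make the in silico idea concrete, the following is a minimal sketch of how single cells might be simulated from a bulk profile and then scored with a weighted footprint gene set. It is an illustration under stated assumptions, not the procedure used in the study: the multinomial sampling scheme, the library size of 2000 reads, the toy expression values, the log-normalization, and the function names `simulate_cells` and `footprint_score` are all hypothetical choices made for this example.

```python
import numpy as np


def simulate_cells(bulk_counts, n_cells, library_size, rng):
    """Draw synthetic single-cell profiles from one bulk profile.

    Assumption: each cell is a multinomial sample of `library_size` reads
    whose gene probabilities are the bulk sample's relative abundances.
    Small library sizes yield many zero counts, mimicking drop-outs.
    """
    bulk_counts = np.asarray(bulk_counts, dtype=float)
    probs = bulk_counts / bulk_counts.sum()
    return rng.multinomial(library_size, probs, size=n_cells)


def footprint_score(cells, weights):
    """Weighted sum of log-normalized expression over a footprint gene set."""
    cells = np.asarray(cells, dtype=float)
    library = cells.sum(axis=1, keepdims=True)
    normalized = np.log1p(cells / library * 1e4)
    return normalized @ np.asarray(weights, dtype=float)


rng = np.random.default_rng(0)
n_genes = 200

# Toy bulk profiles: a control sample and a perturbed sample in which the
# first 20 genes (a hypothetical footprint) are five-fold up-regulated.
control = rng.integers(50, 500, size=n_genes)
perturbed = control.copy()
perturbed[:20] *= 5

# Hypothetical footprint weights: +1 for responsive genes, 0 otherwise.
weights = np.zeros(n_genes)
weights[:20] = 1.0

ctrl_cells = simulate_cells(control, n_cells=100, library_size=2000, rng=rng)
pert_cells = simulate_cells(perturbed, n_cells=100, library_size=2000, rng=rng)

ctrl_scores = footprint_score(ctrl_cells, weights)
pert_scores = footprint_score(pert_cells, weights)
dropout_rate = (ctrl_cells == 0).mean()  # fraction of zero entries

print(f"mean score (control):   {ctrl_scores.mean():.2f}")
print(f"mean score (perturbed): {pert_scores.mean():.2f}")
print(f"zero fraction (control cells): {dropout_rate:.3f}")
```

In this toy setting the perturbed cells receive higher footprint scores than the control cells even though each cell carries only 2000 reads, which is the kind of signal recovery a benchmark of this type would quantify.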