相关文章推荐
不羁的面包  ·  Spring ...·  6 月前    · 
文章通过R语言的Seurat库处理PBMC3K数据集,为了减少计算时间,从2700个细胞中人工选取300个进行分析。接着,使用SCENIC工具进行基因共表达网络构建和调控模块识别。在处理过程中,进行了基因过滤、相关性计算、基因模块化分析等步骤,最终进行细胞类型聚类和可视化。 摘要由CSDN通过智能技术生成

由于pmbc3样本量太大,目前设备跑完需要大约14个小时,检测代码是否能够正常运行的时间成本较高,因此采用将pmbc3样本中的2700个细胞,人工选择300个进行测试。

运行代码如下。

setwd("D:/R/project/project1")
rm(list = ls()) 
library(Seurat) 
######-----使用不同的数据-----######
#安装SeuratData包
#install.packages("remotes")
#remotes::install_github("satijalab/seurat-data")
library(SeuratData)
#AvailableData()#SeuratData数据库中包含的数据集
#第一次一般要安装pbmc3k数据:InstallData("pbmc3k"),再引用
#InstallData("pbmc3k") 
data("pbmc3k") 
exprMat  <-  as.matrix(pbmc3k@assays$RNA@data)
####-----人工筛选的300细胞(直接选择矩阵前300列)-----####
exprMat <- exprMat[,1:300]
dim(exprMat)
exprMat[1:4,1:4] 
cellInfo <-  pbmc3k@meta.data[,c(4,2,3)]
#cellInfo变量也需要进行修改
cellInfo <- cellInfo[1:300,]
dim(cellInfo)
colnames(cellInfo)=c('CellType', 'nGene' ,'nUMI')
head(cellInfo)
table(cellInfo$CellType) 
####-----下面的部分和第一部分的相同-----####
### Initialize settings
library(SCENIC)
#报错:“object 'motifAnnotations_hgnc' not found“
data(list="motifAnnotations_hgnc_v9", package="RcisTarget")
motifAnnotations_hgnc <- motifAnnotations_hgnc_v9
# 保证cisTarget_databases 文件夹下面有下载好2个1G的文件,注意这个时候使用的是人的hg19的数据
scenicOptions <- initializeScenic(org="hgnc", dbDir="D:/R/project/project1/cisTarget_databases_hg19", nCores=1) 
saveRDS(scenicOptions, file="D:/R/project/project1/int/scenicOptions.Rds") 
### Co-expression network
genesKept <- geneFiltering(exprMat, scenicOptions)
exprMat_filtered <- exprMat[genesKept, ]
exprMat_filtered[1:4,1:4]
dim(exprMat_filtered)
runCorrelation(exprMat_filtered, scenicOptions)
exprMat_filtered_log <- log2(exprMat_filtered+1) 
runGenie3(exprMat_filtered_log, scenicOptions)
library(foreach)
library(iterators)
library(parallel)
library(doParallel)
scenicOptions@settings$dbs <- scenicOptions@settings$dbs["10kb"] # Toy run settings
scenicOptions <- runSCENIC_1_coexNetwork2modules(scenicOptions)
scenicOptions <- runSCENIC_2_createRegulons(scenicOptions, coexMethod=c("top5perTarget")) # Toy run settings
scenicOptions <- runSCENIC_3_scoreCells(scenicOptions, exprMat_filtered_log)
scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
tsneAUC(scenicOptions, aucType="AUC") # choose settings

运行log(可能存在一定出入,但关键点没有问题)如下:

> source("D:/R/project/project1/300cells.R", echo=TRUE)
> setwd("D:/R/project/project1")
> rm(list = ls()) 
> library(Seurat) 
The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
     (status 2 uses the sf package in place of rgdal)
Attaching SeuratObject
> ######-----使用不同的数据-----######
> #安装SeuratData包
> #install.packages("remotes")
> #remotes::install_github("satijalab/seurat-data")
> library(SeuratDa .... [TRUNCATED] 
> #AvailableData()#SeuratData数据库中包含的数据集
> #第一次一般要安装pbmc3k数据:InstallData("pbmc3k"),再引用
> #InstallData("pbmc3k") 
> data("pbmc3k") 
> exprMat  <-  as.matrix(pbmc3k@assays$RNA@data)
> exprMat <- exprMat[,1:300]
> dim(exprMat)
[1] 13714   300
> exprMat[1:4,1:4] 
              AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG
AL627309.1                 0              0              0              0
AP006222.2                 0              0              0              0
RP11-206L10.2              0              0              0              0
RP11-206L10.9              0              0              0              0
> cellInfo <-  pbmc3k@meta.data[,c(4,2,3)]
> cellInfo <- cellInfo[1:300,]
> dim(cellInfo)
[1] 300   3
> colnames(cellInfo)=c('CellType', 'nGene' ,'nUMI')
> head(cellInfo)
                   CellType nGene nUMI
AAACATACAACCAC Memory CD4 T  2419  779
AAACATTGAGCTAC            B  4903 1352
AAACATTGATCAGC Memory CD4 T  3147 1129
AAACCGTGCTTCCG   CD14+ Mono  2639  960
AAACCGTGTATGCG           NK   980  521
AAACGCACTGGTAC Memory CD4 T  2163  781
> table(cellInfo$CellType)
 Naive CD4 T Memory CD4 T   CD14+ Mono            B        CD8 T FCGR3A+ Mono           NK           DC     Platelet 
          79           42           50           43           28           20           21            5            3 
> #b <- c()
> #i <- 1
> #for(i in 1:length(a))
> # if(a[i] == 0)
> #    b <- append(b,i)
> #exprMat <- exprMat[-b,]
>  .... [TRUNCATED] 
> #报错:“object 'motifAnnotations_hgnc' not found“
> data(list="motifAnnotations_hgnc_v9", package="RcisTarget")
> motifAnnotations_hgnc <- motifAnnotations_hgnc_v9
> # 保证cisTarget_databases 文件夹下面有下载好2个1G的文件,注意这个时候使用的是人的hg19的数据
> scenicOptions <- initializeScenic(org="hgnc", dbDir="D:/R/project/project1/cisTarget_ ..." ... [TRUNCATED] 
Motif databases selected: 
  hg19-500bp-upstream-7species.mc9nr.feather 
  hg19-tss-centered-10kb-7species.mc9nr.feather
The tzdb package is not installed. Timezones will not be available to Arrow compute functions.
Using the column 'features' as feature index for the ranking database.
Using the column 'features' as feature index for the ranking database.
> saveRDS(scenicOptions, file="D:/R/project/project1/int/scenicOptions.Rds") 
> ### Co-expression network
> genesKept <- geneFiltering(exprMat, scenicOptions)
Maximum value in the expression matrix: 419
Ratio of detected vs non-detected: 0.066
Number of counts (in the dataset units) per gene:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
    0.00     2.00     6.00    52.77    20.00 19117.00 
Number of cells in which each gene is detected:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    2.00    6.00   18.57   18.00  300.00 
Number of genes left after applying the following filters (sequential):
	5590	genes with counts per gene > 9
	5568	genes detected in more than 3 cells
Using the column 'features' as feature index for the ranking database.
	5141	genes available in RcisTarget database
Gene list saved in int/1.1_genesKept.Rds
> exprMat_filtered <- exprMat[genesKept, ]
> exprMat_filtered[1:4,1:4]
         AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG
NOC2L                 0              0              0              0
HES4                  0              0              0              0
ISG15                 0              0              1              9
TNFRSF18              0              2              0              0
> dim(exprMat_filtered)
[1] 5141  300
> runCorrelation(exprMat_filtered, scenicOptions)
> exprMat_filtered_log <- log2(exprMat_filtered+1) 
> runGenie3(exprMat_filtered_log, scenicOptions)
Using 461 TFs as potential regulators...
Running GENIE3 part 1
Running GENIE3 part 10
Running GENIE3 part 2
Running GENIE3 part 3
Running GENIE3 part 4
Running GENIE3 part 5
Running GENIE3 part 6
Running GENIE3 part 7
Running GENIE3 part 8
Running GENIE3 part 9
Finished running GENIE3.
> library(foreach)
> library(iterators)
> library(parallel)
> library(doParallel)
> #runSCENIC_1_coexNetwork2modules(scenicOptions)
> #runSCENIC_2_createRegulons(scenicOptions, coexMethod=c("top5perTarget"))
> #runSCENIC_3_scoreC .... [TRUNCATED] 
> scenicOptions <- runSCENIC_1_coexNetwork2modules(scenicOptions)
19:56	Creating TF modules
Number of links between TFs and targets (weight>=0.001): 1252165
nTFs          461
nTargets     5141
nGeneSets    3688
nLinks    2011914
> scenicOptions <- runSCENIC_2_createRegulons(scenicOptions, coexMethod=c("top5perTarget")) # Toy run settings
19:56	Step 2. Identifying regulons
载入程辑包:‘AUCell’
The following object is masked from ‘package:SCENIC’:
    plotEmb_rgb
Using the column 'features' as feature index for the ranking database.
tfModulesSummary:
top5perTarget  314
19:56	RcisTarget: Calculating AUC
Using the column 'features' as feature index for the ranking database.
Scoring database:  [Source file: hg19-tss-centered-10kb-7species.mc9nr.feather]
20:03	RcisTarget: Adding motif annotation
Number of motifs in the initial enrichment: 91794
Number of motifs annotated to the matching TF: 1105
20:03	RcisTarget: Pruning targets
Using the column 'features' as feature index for the ranking database.
20:11	Number of motifs that support the regulons: 1105
	Preview of motif enrichment saved as: output/Step2_MotifEnrichment_preview.html
> scenicOptions <- runSCENIC_3_scoreCells(scenicOptions, exprMat_filtered_log)
20:12	Step 3. Analyzing the network activity in each individual cell
	Number of regulons to evaluate on cells: 53
Biggest (non-extended) regulons: 
	 SPI1 (320g)
	 FOS (108g)
	 IRF7 (51g)
	 SPIB (49g)
	 TAF1 (44g)
	 XBP1 (31g)
	 CEBPD (28g)
	 RXRA (21g)
	 ELF2 (19g)
	 STAT1 (18g)
Quantiles for the number of genes detected by cell: 
(Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
    min      1%      5%     10%     50%    100% 
 256.00  287.94  358.80  429.90  704.00 1834.00 
Warning in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores,  :
  Using only the first 287.94 genes (aucMaxRank) to calculate the AUC.
20:12	Finished running AUCell.
20:12	Plotting heatmap...
20:12	Plotting t-SNEs...
> scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
Binary regulon activity: 36 TF regulons x 290 cells.
(51 regulons including 'extended' versions)
36 regulons are active in more than 1% (2.9) cells.
> tsneAUC(scenicOptions, aucType="AUC") # choose settings
[1] "int/tSNE_AUC_50pcs_30perpl.Rds"
Warning messages:
1: In runGenie3(exprMat_filtered_log, scenicOptions) :
  Only 461 (25%) of the 1839 TFs in the database were found in the dataset. Do they use the same gene IDs?
2: In RcisTarget::addLogo(tableSubset, motifCol = motifCol, dbVersion = dbVersion,  :
  There is no annotation version attribute in the input table (it has probably been loaded with an older version of the package).'v9' will be used as it was the old default,but we recommend to re-load the annotations and/or re-run the enrichment to make sure everything is consistent.
SCENIC(单细胞重组网络推断和聚类)是一种从单细胞RNA序列数据推断基因调控网络和细胞类型的计算方法。 该方法的描述和一些使用示例可在《。 当前在R(此存储库)和Python中有SCENIC的实现。 如果您不太喜欢使用R,我们建议您检查一下SCENIC(其中包含Nextflow工作流程)和Python / Jupyter笔记本,以轻松运行SCENIC (强烈建议您批量运行SCENIC或更大的数据集)。 然后,可以在R,Python或SCope(Web界面)中浏览任何实现的输出。 有关在R运行SCENIC的更多详细信息和安装说明,请参阅以下教程: 这些示例的输出位于: : 常见问题: 2021/03/26: 2020/06/26: 该SCENICprotocol包括Nextflow工作流程,并pySCENIC笔记本现在正式发布。 有关详细信息
关于测试遇到的initializationerror:method initializationerror not found,根据报错分几种情况 第一种是因为junit及相关包缺失或者是代码本身的问题 可以参考这篇文章. 这里建议大家下载依赖包.可以来这里 https://mvnrepository.com/ 下载依赖包的教程可以看看这里:教程. https://www.cnblogs.com/minixiong/p/11394007.html 第二种是我自己遇到的 Caused by: java.la
非常感谢博主的及时回复,解答了我几天的困惑。我们都在尝试为不同的机器学习计算SHAP值。 其次,这是个非常有价值的解决方法。如果不是博主的解答,许多人大概率会和我一样做许多弯路: 如果使用网上的shapper包,发现会报错(原理似乎是和Python有关,因为它会提示你配置py 的shap包,没错就是所有分析的源头),总之我想复现R help 上面的例子还是同样的错误————Error in as.data.frame.default(new_data) : RuntimeError: cannot coerce class ‘c("numpy.ndarray", "python.builtin.object")’ to a data.frame 我能力有限还处理不了。 最后博主提示我安装 R ibreakdown包之后,就顺利解决了。我个人猜测,这个包也含有shap函数。 我还发现DALEX 包也有个函数shap_ 能提供一些统计值,但不完全一样,给大家参考。 最后,博主解决了GPT毫无办法的问题,感谢感谢~表情包 SCENIC分析(三) fern01: 您好,想请教一下,我用的hg38,运行initializeScenic这一步的时候报错,说hg19的文件找不到,怎么解决呀 SCENIC分析(三) CSDN-Ada助手: 恭喜用户持续创作,并发布了第三篇博客!标题为“SCENIC分析(三)”确实引人入胜。您在前两篇博客中对SCENIC分析进行了深入解读,我很期待能够阅读到这篇的内容。 在下一步的创作中,或许您可以考虑进一步探索SCENIC分析的实际应用领域,或者分享一些与该方法相关的案例研究。这样一来,读者们可以更好地理解并应用这一分析方法。当然,这只是一个建议,希望能对您的创作有所帮助。再次恭喜您,期待您后续的博客! CSDN 正在通过评论红包奖励优秀博客,请看红包流:https://bbs.csdn.net/?type=4&header=0&utm_source=csdn_ai_ada_blog_reply3