#pyscenic 的3个步骤之 grn
pyscenic grn \
--num_workers 20 \
--output adj.sample.tsv \
--method grnboost2 \
sample.loom \
allTFs_mm.txt #转录因子文件,1839 个基因的名字列表
step2在Linux命令行中,不需要进入python软件内环境
conda activate pscenic_git
cd ~/silicosis/fibroblast_myofibroblast/pyscenic_analysis/
指定48个核心
pyscenic ctx adj.sample.tsv /home/data/t040413/datasets/pyscenic_datasets_v10/cistarget_feather/mm10/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather --annotations_fname /home/data/t040413/datasets/pyscenic_datasets_v10/motif2annotations/mm10/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl.txt --expression_mtx_fname sample.loom --mode "dask_multiprocessing" --output reg.csv --num_workers 48 --mask_dropouts
有时候\换行往往会出错 不知道为啥
pyscenic ctx \
adj.sample.tsv \
/home/data/t040413/datasets/pyscenic_datasets_v10/cistarget_feather/mm10/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
--annotations_fname /home/data/t040413/datasets/pyscenic_datasets_v10/motif2annotations/mm10/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl.txt \
--expression_mtx_fname sample.loom \
--mode "dask_multiprocessing" \
--output reg.csv \
--num_workers 3 \
--mask_dropouts
#下面代码成功!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!单行输出
pyscenic ctx adj.sample.tsv /home/data/t040413/datasets/pyscenic_datasets_v10/cistarget_feather/mm10/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather --annotations_fname /home/data/t040413/datasets/pyscenic_datasets_v10/motif2annotations/mm10/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl.txt --expression_mtx_fname sample.loom --mode "dask_multiprocessing" --output reg.csv --num_workers 3 --mask_dropouts
step3
pip install pandas==1.5.3 -i https://pypi.douban.com/simple/
conda activate pscenic_git
cd ~/silicosis/fibroblast_myofibroblast/pyscenic_analysis/
pyscenic aucell \
sample.loom \
reg.csv \
--output sample_SCENIC.loom \
--num_workers 3
usage: pyscenic ctx [-h] [-o OUTPUT] [-n] [--chunk_size CHUNK_SIZE]
[--mode {custom_multiprocessing,dask_multiprocessing,dask_cluster}] [-a] [-t]
[--rank_threshold RANK_THRESHOLD] [--auc_threshold AUC_THRESHOLD]
[--nes_threshold NES_THRESHOLD] [--min_orthologous_identity MIN_ORTHOLOGOUS_IDENTITY]
[--max_similarity_fdr MAX_SIMILARITY_FDR] --annotations_fname ANNOTATIONS_FNAME
[--num_workers NUM_WORKERS] [--client_or_address CLIENT_OR_ADDRESS]
[--thresholds THRESHOLDS [THRESHOLDS ...]]
[--top_n_targets TOP_N_TARGETS [TOP_N_TARGETS ...]]
[--top_n_regulators TOP_N_REGULATORS [TOP_N_REGULATORS ...]] [--min_genes MIN_GENES]
[--expression_mtx_fname EXPRESSION_MTX_FNAME] [--mask_dropouts]
[--cell_id_attribute CELL_ID_ATTRIBUTE] [--gene_attribute GENE_ATTRIBUTE] [--sparse]
module_fname database_fname [database_fname ...]
positional arguments:
module_fname The name of the file that contains the signature or the co-expression modules. The
following formats are supported: CSV or TSV (adjacencies), YAML, GMT and DAT
(modules)
database_fname The name(s) of the regulatory feature databases. Two file formats are supported:
feather or db (legacy).
options:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Output file/stream, i.e. a table of enriched motifs and target genes (csv, tsv) or
collection of regulons (yaml, gmt, dat, json).
-n, --no_pruning Do not perform pruning, i.e. find enriched motifs.
--chunk_size CHUNK_SIZE
The size of the module chunks assigned to a node in the dask graph (default: 100).
--mode {custom_multiprocessing,dask_multiprocessing,dask_cluster}
The mode to be used for computing (default: custom_multiprocessing).
-a, --all_modules Included positive and negative regulons in the analysis (default: no, i.e. only
positive).
-t, --transpose Transpose the expression matrix (rows=genes x columns=cells).
motif enrichment arguments:
--rank_threshold RANK_THRESHOLD
The rank threshold used for deriving the target genes of an enriched motif
(default: 5000).
--auc_threshold AUC_THRESHOLD
The threshold used for calculating the AUC of a feature as fraction of ranked genes
(default: 0.05).
--nes_threshold NES_THRESHOLD
The Normalized Enrichment Score (NES) threshold for finding enriched features
(default: 3.0).
motif annotation arguments:
--min_orthologous_identity MIN_ORTHOLOGOUS_IDENTITY
Minimum orthologous identity to use when annotating enriched motifs (default: 0.0).
--max_similarity_fdr MAX_SIMILARITY_FDR
Maximum FDR in motif similarity to use when annotating enriched motifs (default:
0.001).
--annotations_fname ANNOTATIONS_FNAME
The name of the file that contains the motif annotations to use.
computation arguments:
--num_workers NUM_WORKERS
The number of workers to use. Only valid if using dask_multiprocessing,
custom_multiprocessing or local as mode. (default: 144).
--client_or_address CLIENT_OR_ADDRESS
The client or the IP address of the dask scheduler to use. (Only required of
dask_cluster is selected as mode)
module generation arguments:
--thresholds THRESHOLDS [THRESHOLDS ...]
The first method to create the TF-modules based on the best targets for each
transcription factor (default: 0.75 0.90).
--top_n_targets TOP_N_TARGETS [TOP_N_TARGETS ...]
The second method is to select the top targets for a given TF. (default: 50)
--top_n_regulators TOP_N_REGULATORS [TOP_N_REGULATORS ...]
The alternative way to create the TF-modules is to select the best regulators for
each gene. (default: 5 10 50)
--min_genes MIN_GENES
The minimum number of genes in a module (default: 20).
--expression_mtx_fname EXPRESSION_MTX_FNAME
The name of the file that contains the expression matrix for the single cell
experiment. Two file formats are supported: csv (rows=cells x columns=genes) or
loom (rows=genes x columns=cells). (Only required if modules need to be generated)
--mask_dropouts If modules need to be generated, this controls whether cell dropouts (cells in
which expression of either TF or target gene is 0) are masked when calculating the
correlation between a TF-target pair. This affects which target genes are included
in the initial modules, and the final pruned regulon (by default only positive
regulons are kept (see --all_modules option)). The default value in pySCENIC 0.9.16
and previous versions was to mask dropouts when calculating the correlation;
however, all cells are now kept by default, to match the R version.
loom file arguments:
--cell_id_attribute CELL_ID_ATTRIBUTE
The name of the column attribute that specifies the identifiers of the cells in the
loom file.
--gene_attribute GENE_ATTRIBUTE
The name of the row attribute that specifies the gene symbols in the loom file.
--sparse If set, load the expression data as a sparse matrix. Currently applies to the grn
inference step only.
#https://mp.weixin.qq.com/s/ncSW8EXrpzqD-3b7uXy5Mg
getwd()
setwd("/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/")
getwd()
.libPaths(c("/home/data/t040413/R/yll/usr/local/lib/R/site-library", "/home/data/t040413/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/library"))
#数据太大转到Linux服务器处理
.libPaths(c("/home/data/refdir/Rlib/", "/home/data/t040413/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/library"))
library(Seurat)
#############for pyscenic https://mp.weixin.qq.com/s/ncSW8EXrpzqD-3b7uXy5Mg
load("/home/data/t040413/silicosis/fibroblast_myofibroblast/enrichments/subset.Rdata")
DimPlot(subset_data2,label = T)
#留作pyscenic
#注意矩阵一定要转置,不然会报错
getwd()
write.csv(t(as.matrix(subset_data2@assays$RNA@counts)),
file="/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/fibo_1000.csv")
##############################################################################################################
#https://blog.csdn.net/qq_52813185/article/details/130850037?csdn_share_tail=%7B%22type%22%3A%22blog%22%2C%22rType%22%3A%22article%22%2C%22rId%22%3A%22130850037%22%2C%22source%22%3A%22qq_52813185%22%7D
#####################################3linux中使用命令行
#step1
#step1 pyscenic 的3个步骤之 grn
pyscenic grn \
--num_workers 20 \
--output adj.sample.tsv \
--method grnboost2 \
sample.loom \ #表达矩阵
allTFs_mm.txt #转录因子文件,1839 个基因的名字列表
#step2
#step2
pyscenic ctx \
adj.sample.tsv \
/home/data/t040413/datasets/pyscenic_datasets_v10/cistarget_feather/mm10/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
--annotations_fname /home/data/t040413/datasets/pyscenic_datasets_v10/motif2annotations/mm10/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl \
--expression_mtx_fname sample.loom \
--mode "dask_multiprocessing" \
--output reg.csv \
--num_workers 3 \
--mask_dropouts
#单行形式
pyscenic ctx adj.sample.tsv /home/data/t040413/datasets/pyscenic_datasets_v10/cistarget_feather/mm10/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather --annotations_fname /home/data/t040413/datasets/pyscenic_datasets_v10/motif2annotations/mm10/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl.txt --expression_mtx_fname sample.loom --mode "dask_multiprocessing" --output reg.csv --num_workers 3 --mask_dropouts
#step3
#step3
pyscenic aucell \
sample.loom \
reg.csv \
--output sample_SCENIC.loom \
--num_workers 3
########################################################################## sample_SCENIC.loom导入R里面进行可视化
load("/home/data/t040413/silicosis/fibroblast_myofibroblast/enrichments/subset.Rdata")
.libPaths(c("/home/data/refdir/Rlib/", "/home/data/t040413/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/library"))
library(SCENIC)
packageVersion("SCENIC")
library(SCopeLoomR)
getwd() #"/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis"
scenicLoomPath='/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/sample_SCENIC.loom'
loom <- open_loom(scenicLoomPath)
# Read information from loom file:
regulons_incidMat <- get_regulons(loom, column.attr.name="Regulons")
regulons <- regulonsToGeneLists(regulons_incidMat)
regulonAUC <- get_regulons_AUC(loom, column.attr.name="RegulonsAUC")
#sce=readRDS("./fibo_1000.rds")
sce=subset_data2
####################可视化
library(pheatmap)
n=t(scale(t( getAUC(regulonAUC[,] )))) # 'scale'可以对log-ratio数值进行归一化
n[n>2]=2
n[n< -2]= -2
n[1:4,1:4]
dim(n)
ac=data.frame(group= as.character( Idents(sce)))
head(ac)
dim(ac)
n[1:4,1:4]
n=n[,colnames(n) %in% colnames(sce)]
rownames(ac)=colnames(n)
table(ac)
cg=read.table('/home/data/t040413/datasets/pyscenic_datasets_v10/all_tf_list/allTFs_mm.txt')[,1]
cg #1860
library(stringr)
#cg_n=n[rownames(n) %in% cg,]
cg_n=n[str_split(string = rownames(n),pattern = '\\(',simplify = T)[,1] %in% cg,]
cg_n[1:4,1:4]
p=pheatmap( cg_n,
show_colnames =F,
show_rownames = T,
annotation_col=ac )
.libPaths(c("/home/data/t040413/R/yll/usr/local/lib/R/site-library", "/home/data/t040413/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/library"))
library(ggplot2)
ggsave(filename = 'heatmap.pdf',plot = p,width = 20,height = 50,limitsize = FALSE)
ggsave(filename = 'heatmap2.pdf',plot = p,width = 20,height = 30,limitsize = FALSE)
table(ac$group)
pyscenic_results=ls()
getwd()
save(pyscenic_results,file ="/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/pyscenic_results.Rdata" )
load("/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/pyscenic_results.Rdata")
# 尊重作者,进行二分类!
table(ac$group)
#ac$group=ifelse(ac$group %in% c(2:5,7,9),'mCAF','iCAF')
ac$group=ifelse(ac$group %in% c("specialized fib","Universal fib"),'stem fib','iCAF')
pheatmap(cg_n,show_colnames =T,show_rownames = T,
annotation_col=ac)
p=pheatmap(cg_n,show_colnames =F,show_rownames = T,
annotation_col=ac)
.libPaths(c("/home/data/t040413/R/yll/usr/local/lib/R/site-library", "/home/data/t040413/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/library"))
library(ggplot2)
ggsave(filename = 'heatmap_choose_regulon.pdf',plot = p,width = 20,height = 50,limitsize = FALSE)
pheatmap(cg_n,show_colnames =F,show_rownames = T,
annotation_col=ac,
filename = 'heatmap_choose_regulon.png')
dev.off()
getwd()
setwd("/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/")
getwd()
.libPaths(c("/home/data/t040413/R/yll/usr/local/lib/R/site-library", "/home/data/t040413/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/library"))
#数据太大转到Linux服务器处理
.libPaths(c("/home/data/refdir/Rlib/", "/home/data/t040413/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/library"))
library(Seurat)
#############for pyscenic https://mp.weixin.qq.com/s/ncSW8EXrpzqD-3b7uXy5Mg
load("/home/data/t040413/silicosis/fibroblast_myofibroblast/enrichments/subset.Rdata")
DimPlot(subset_data2,label = T)
#留作pyscenic
#注意矩阵一定要转置,不然会报错
getwd()
write.csv(t(as.matrix(subset_data2@assays$RNA@counts)),
file="/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/fibo_1000.csv")
##############################################################################################################
#https://blog.csdn.net/qq_52813185/article/details/130850037?csdn_share_tail=%7B%22type%22%3A%22blog%22%2C%22rType%22%3A%22article%22%2C%22rId%22%3A%22130850037%22%2C%22source%22%3A%22qq_52813185%22%7D
#step1
#step2
#step3
############# sample_SCENIC.loom导入R里面进行可视化
load("/home/data/t040413/silicosis/fibroblast_myofibroblast/enrichments/subset.Rdata")
.libPaths(c("/home/data/refdir/Rlib/", "/home/data/t040413/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/library"))
library(SCENIC)
packageVersion("SCENIC")
library(SCopeLoomR)
getwd() #"/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis"
scenicLoomPath='/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/sample_SCENIC.loom'
loom <- open_loom(scenicLoomPath)
# Read information from loom file:
regulons_incidMat <- get_regulons(loom, column.attr.name="Regulons")
regulons <- regulonsToGeneLists(regulons_incidMat)
regulonAUC <- get_regulons_AUC(loom, column.attr.name="RegulonsAUC")
#sce=readRDS("./fibo_1000.rds")
sce=subset_data2
####################可视化
library(pheatmap)
n=t(scale(t( getAUC(regulonAUC[,] )))) # 'scale'可以对log-ratio数值进行归一化
n[n>2]=2
n[n< -2]= -2
n[1:4,1:4]
dim(n)
ac=data.frame(group= as.character( Idents(sce)))
head(ac)
n[1:4,1:4]
n=n[,colnames(n) %in% colnames(sce)]
rownames(ac)=colnames(n)
cg=read.table('/home/data/t040413/datasets/pyscenic_datasets_v10/all_tf_list/allTFs_mm.txt')[,1]
cg #1860
library(stringr)
#cg_n=n[rownames(n) %in% cg,]
cg_n=n[str_split(string = rownames(n),pattern = '\\(',simplify = T)[,1] %in% cg,]
p=pheatmap(cg_n,show_colnames =F,show_rownames = T,
annotation_col=ac)
.libPaths(c("/home/data/t040413/R/yll/usr/local/lib/R/site-library", "/home/data/t040413/R/x86_64-pc-linux-gnu-library/4.2", "/usr/local/lib/R/library"))
library(ggplot2)
ggsave(filename = 'heatmap.pdf',plot = p,width = 20,height = 50,limitsize = FALSE)
ggsave(filename = 'heatmap2.pdf',plot = p,width = 20,height = 30,limitsize = FALSE)
table(ac$group)
pyscenic_results=ls()
getwd()
save(pyscenic_results,file ="/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/pyscenic_results.Rdata" )
load("/home/data/t040413/silicosis/fibroblast_myofibroblast/pyscenic_analysis/pyscenic_results.Rdata")
# 尊重作者,进行二分类!
ac$group=ifelse(ac$group %in% c(2:5,7,9),'mCAF','iCAF')
pheatmap(cg_n,show_colnames =F,show_rownames = T,
annotation_col=ac)
pheatmap(cg_n,show_colnames =F,show_rownames = T,
annotation_col=ac,
filename = 'heatmap_choose_regulon.png')
dev.off()
转录因子(transcriptionfactor, TF)是直接作用于转录组上,调控DNA转录的蛋白质。它通过与DNA特定区域结合(TFBS/motif),促进(activator)或阻止(repressor)DNA的转录过程,了解转录因子对于解析细胞的功能及生命活动有重要作用
对亚群细分类分析,也可以对不同的实验组分析
Step 1 构建共表达网络
输入的数据是标准化的count矩阵(行是基因和列是细胞),从中找出TFs调节的基因构建共表达网络。因此需要先验知识TFs及其靶基因集合(可以从数据库下载)。
这个共表达网络只是基于TF和gene表达量相关性推测的,TF和gene之间是否现实存在调控关系还需要进一步确证。确证的方法主要从TF功能结构入手,从图1我们可以看出,TF是通过直接与DNA结合而发挥作用的,因此我们可以通过反向查看gene上是否存在TF结合的motif序列来验证TF与gene的靶向关系。
Step 2 motif富集分析
进行TF-motif富集分析,识别直接靶标。仅保留具有正确的上游调节子且显著富集的motif modules,并对它们进行过滤以除去缺乏motif支持的间接靶标。这些处理后的每个TF及其潜在的直接targets genes被称作一个regulon。
具体过滤过程,首先基于gene-motif数据库,每个motif对模块中所有基因进行累积,模块中的基因排名越靠前,累积曲线越高,曲线下面积 (AUC) 越大,表明motif在该模块中的富集程度越高,然后对每个模块选取显著富集的motif,并预测其靶基因,最终综合TF-genes模块和靶基因预测结果,构成一个包含了TF和靶基因的基因调控网络模块 (regulons)。
Regulon调控子:受同一个TF调控的一群基因的集合,即one Regulon = one TF + target genes
Step 3 AUCell对每个细胞的每个regulon活性进行打分
对于一个regulon来说,比较细胞间的AUCell得分可以鉴定出哪种细胞有显著更高的subnetwork活性。
原理:AUCell基于基因集(Regulons中所有基因)打分,所得到的分数即为AUC(Area Under Curve)表示Regulons在细胞中的“活性”。
打分过程是针对每个细胞,将细胞中所有基因按照表达量从高到低进行排序,根据Regulons中的基因在序列中的位置,计算累计曲线面积 (AUC)。
Step 4 AUCell分数二值化
由于不同regulons包含的基因不同,它们之间的AUC值不具有可比较性,因此基于AUC值在所有细胞中的双峰分布特征,增加了Rgulons“on/off”的概念,认为双峰之间的低谷为判断Regulons活性开放的阈值,如果AUC值小于阈值,则判定为该Regulons在该细胞中未开放,即未发挥调控作用。
Reference
https://mp.weixin.qq.com/s/5Ekozso2TddqOAdG8dc4XA
https://assignmentpoint.com/transcription-factor/
https://mp.weixin.qq.com/s/QehrC8a7kX9KuMfwKxI9Iw
https://mp.weixin.qq.com/s/AJmVF1mRYQcuG73iKmkWQg
https://mp.weixin.qq.com/s/pjWG1VyVvytKo2RNQzDcNg
首先说一下网上的各种解决方案,如下:
第一种: 说让在本地生成新的公钥,然后复制到github上的设置里的 SSH keys里保存即可。
解释: 首先,这个说法没错,但是网上说的都是本地电脑用ssh方法拉不下来代码,用这个办法,并不是我们所说的问题。实际上我们本地不管用ssh方式还是https方式拉取代码,都可以成功拉取的。(前提是本地已经有生成的公钥私钥,并已经将本地的公钥配置到了github上)。我们这里是服务器上拉不下来代码。
第二种: 使用命令 vim /etc/ssh/ssh_config ,找到#StrictHostKeyChecking ask去掉注释,并把ask改为no即可。即
可扩展的SCENIC工作流程,用于单细胞基因调控网络分析
该存储库描述了如何对单细胞数据运行pySCENIC基因调控网络推断分析以及基本的“最佳实践”表达分析。 这包括:
独立的Jupyter笔记本电脑,用于交互式分析
Nextflow DSL1工作流程,它提供了一种半自动化且简化的方法来运行这些步骤
pySCENIC安装,使用和下游分析的详细信息
另请参阅《自然规约》中的相关出版物: : 。
有关此协议中步骤的高级实现,请参阅 ,这是pySCENIC的Nextflow DSL2实现,具有用于表达式分析的全面且可自定义的管道。 这包括其他pySCENIC功能(多次运行,集成的基于主题和基于轨迹的regulon修剪,织机文件生成)。
实例探究
PBMC 10k数据集(10x基因组学)
完整的SCENIC分析,以及过滤,群集,可视化和SCope就绪的织机文件创建: |
SCENIC(单细胞重组网络推断和聚类)是一种从单细胞RNA序列数据推断基因调控网络和细胞类型的计算方法。
该方法的描述和一些使用示例可在《。
当前在R(此存储库)和Python中有SCENIC的实现。 如果您不太喜欢使用R,我们建议您检查一下SCENIC(其中包含Nextflow工作流程)和Python / Jupyter笔记本,以轻松运行SCENIC (强烈建议您批量运行SCENIC或更大的数据集)。 然后,可以在R,Python或SCope(Web界面)中浏览任何实现的输出。
有关在R运行SCENIC的更多详细信息和安装说明,请参阅以下教程:
这些示例的输出位于: :
常见问题:
2021/03/26:
2020/06/26:
该SCENICprotocol包括Nextflow工作流程,并pySCENIC笔记本现在正式发布。 有关详细信息