R ggboxplot: every detail of drawing box plots in one article

Author: 白介素2


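Sys.setlocale('LC_ALL','C')
## Load the hub-gene expression table (the author's local path; point it at your own file)
load(file = "F:/Bioinfor_project/Breast/AS_research/AS/result/hubgene.Rdata")
head(data)
library(cowplot)
library(tidyverse)
library(ggplot2)
library(ggsci)
library(ggpubr)
mydata<-data %>% 
  ## Gather the gene expression columns; adjust the column range to your own data
  gather(key="gene",value="Expression",CCL14:TUBB3) %>% 
  dplyr::select(ID,gene,Expression,everything()) 
head(mydata)  ## Long format: one row per sample-gene pair (the original `data` is wide, one column per gene)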
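The reshaping step can be tried without the author's file. The sketch below uses a small, made-up wide table (the sample IDs and expression values are hypothetical) and `pivot_longer()`, which the tidyr documentation recommends over `gather()` for new code. It is an illustration of the same wide-to-long idea, not the author's exact pipeline.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one row per sample, one column per gene
wide <- data.frame(
  ID    = paste0("S", 1:4),
  group = c("TNBC", "TNBC", "Normal", "Normal"),
  CCL14 = c(2.1, 1.8, 5.3, 4.9),
  HBA1  = c(3.0, 2.7, 6.1, 5.8),
  TUBB3 = c(7.2, 6.9, 3.1, 3.4)
)

# Stack the gene columns into a gene/Expression pair of columns
long <- wide %>%
  pivot_longer(cols = CCL14:TUBB3, names_to = "gene", values_to = "Expression") %>%
  select(ID, gene, Expression, everything())

print(long)
```

Each sample now contributes one row per gene, which is the shape `ggboxplot(x = "gene", y = "Expression")` expects. The row order differs from `gather()` (sample by sample rather than gene by gene), which does not affect the plots.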

Creating box plots with p-values

  • Demonstrates control over plot details
    p <- ggboxplot(mydata, x = "group", y = "Expression",
              color = "group", palette = "jama",
              add = "jitter")
    #  Add p-value
    p + stat_compare_means()
    
    # Default method = "kruskal.test" for multiple groups
    ggboxplot(mydata, x = "gene", y = "Expression",
              color = "gene",add="jitter", palette = "jama")+
      stat_compare_means()
    # Change method to anova
    ggboxplot(mydata, x = "gene", y = "Expression",
              color = "gene", add="jitter", palette = "jama")+
      stat_compare_means(method = "anova")
    # Visualize: specify the comparisons you want
    my_comparisons <- list( c("CCL14", "HBA1"), c("HBA1", "CCL16"), c("CCL16", "TUBB3") )
    ggboxplot(mydata, x = "gene", y = "Expression",
              color = "group",add = "jitter", palette = "jama")+ 
      stat_compare_means(comparisons = my_comparisons)#+ # Add pairwise comparisons p-value
      #stat_compare_means()     # Add global p-value
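To see the numbers behind a set of pairwise comparisons, `compare_means()` returns them as a table. The sketch below uses the built-in ToothGrowth data rather than the author's expression data, so the variable names are stand-ins for `Expression` and `gene`.

```r
library(ggpubr)

# ToothGrowth ships with R; dose has three levels (0.5, 1, 2),
# so three pairwise Wilcoxon tests are run
res <- compare_means(len ~ dose, data = ToothGrowth, method = "wilcox.test")
print(res)
```

The `p.adj` column holds the adjusted p-values (`p.adjust.method` defaults to "holm"), which is worth checking before reporting any single pairwise result.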
    
    compare_means(Expression ~ gene,  data = mydata, ref.group = "CCL14",
                  method = "t.test")
    # Visualize
    mydata %>% 
      filter(group=="TNBC") %>% # keep only the TNBC samples
      ggboxplot(x = "gene", y = "Expression",
                color = "gene",add = "jitter", palette = "nejm")+
      stat_compare_means(method = "anova")+      # Add global p-value
      stat_compare_means(label = "p.signif", method = "t.test",
                         ref.group = "CCL14")      
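The stars produced by `label = "p.signif"` come from cutoffs that can be stated explicitly through `symnum.args`. The following is a minimal sketch on the built-in ToothGrowth data. The cutpoints shown are, as far as I can tell, the ones documented as ggpubr's defaults, so treat them as an assumption and adjust them to your own convention.

```r
library(ggpubr)

# Explicit cutoffs behind the significance symbols (assumed to mirror the defaults)
sig_args <- list(
  cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, Inf),
  symbols   = c("****", "***", "**", "*", "ns")
)

# Table form: each dose compared against the reference level "0.5"
tab <- compare_means(len ~ dose, data = ToothGrowth, ref.group = "0.5",
                     method = "t.test", symnum.args = sig_args)
print(tab)

# Plot form: global ANOVA p-value plus stars against the reference group
p <- ggboxplot(ToothGrowth, x = "dose", y = "len",
               color = "dose", palette = "nejm", add = "jitter") +
  stat_compare_means(method = "anova", label.y = 40) +
  stat_compare_means(label = "p.signif", method = "t.test",
                     ref.group = "0.5", symnum.args = sig_args)
print(p)
```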
    
    ## Compare each gene's expression between TNBC and Normal
    compare_means( Expression ~ group, data = mydata, 
                  group.by = "gene")
    # Box plot facetted by "gene"
    p <- ggboxplot(mydata, x = "group", y = "Expression",
              color = "group", palette = "jco",
              add = "jitter",
              facet.by = "gene", short.panel.labs = FALSE)
    # Use only p.format as label. Remove method name.
    p + stat_compare_means(label = "p.format")
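The per-panel p-values in a faceted plot correspond to one test per facet level. A small sketch with the built-in ToothGrowth data (standing in for the author's `group` and `gene` columns) shows the table that `group.by` produces:

```r
library(ggpubr)

# One supp comparison (OJ vs VC) within each dose level
tab <- compare_means(len ~ supp, data = ToothGrowth, group.by = "dose")
print(tab)
```

Each row carries the level of the `group.by` variable it belongs to, so the table can be matched against the facets directly.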
    

    Drawing all genes in a single plot

    p <- ggboxplot(mydata, x = "gene", y = "Expression",
              color = "group", palette = "nejm",
              add = "jitter")
    p + stat_compare_means(aes(group = group))
    
    # Paired comparisons, illustrated with the built-in ToothGrowth data
    head(ToothGrowth)
    compare_means(len ~ supp, data = ToothGrowth, 
                  group.by = "dose", paired = TRUE)
    # Box plot facetted by "dose"
    p <- ggpaired(ToothGrowth, x = "supp", y = "len",
              color = "supp", palette = "jama", 
              line.color = "gray", line.size = 0.4,
              facet.by = "dose", short.panel.labs = FALSE)
    # Use only p.format as label. Remove method name.
    p + stat_compare_means(label = "p.format", paired = TRUE)
    

    Wrapping the plot in a function named group_box

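  • Purpose: draw box plots for the genes already selected
  • Argument 1: group, the grouping variable; any variable of interest can be used
  • Argument 2: data, the tidied data with genes in long format (the gather version, mydata)
    head(mydata)
    group_box<-function(group="group",data=mydata){
      p <- ggboxplot(data, x = "gene", y = "Expression",
                     color = group, 
                     palette = "nejm",
                     add = "jitter")
      # .data[[group]] resolves the column named by the string stored in `group`
      p + stat_compare_means(aes(group = .data[[group]]))
    }
    group_box(group="PAM50",data = mydata)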
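    ## Wrap single-gene plotting in a function
    ## Assumption: usedata is not defined in the original post; the wide table `data` loaded above is used here
    usedata <- data
    gene_box<-function(gene="CCL14",group="group",data=usedata){
      p <- ggboxplot(data, x = group, y = gene,
                     ylab = sprintf("Expression of %s",gene),
                     xlab = group,
                     color = group, 
                     palette = "nejm",
                     add = "jitter")
      # x already encodes the grouping variable, so the default comparison across x levels applies
      p + stat_compare_means()
    }
    
    gene_box(gene="CCL14")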
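One detail matters when a grouping column is passed into such a function as a string: inside `aes()`, a bare name such as `group` is looked up as a column of the data first, not as the function argument. A minimal sketch of one way around this uses the `.data` pronoun and the built-in ToothGrowth data. The helper name `dose_box` is hypothetical.

```r
library(ggpubr)

# Hypothetical helper: the grouping column arrives as a string
dose_box <- function(group = "supp", data = ToothGrowth) {
  ggboxplot(data, x = "dose", y = "len",
            color = group, palette = "nejm", add = "jitter") +
    # .data[[group]] looks up the column whose name is stored in `group`
    stat_compare_means(aes(group = .data[[group]]), label = "p.format")
}

p <- dose_box(group = "supp")
print(p)
```

With this form the same function can be reused for any categorical column, without depending on a column that happens to be called `group`.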
    
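  • Wrapping the plot in a function and calling it through lapply makes batch plotting straightforward
  • Extra arguments for the function are passed directly to lapply rather than set inside the function itself
  • In do.call, the first argument is the function to call; the second is a list built with c() that holds the plots plus any further arguments (here ncol), which again are not set inside the function
    library(gridExtra)
    head(data)
    ## Names of the genes to plot in batch
    name<-colnames(data)[3:6]
    ## Batch plotting
    p<-lapply(name,gene_box,group = "T_stage")
    ## Arrange the plots in a grid
    do.call(grid.arrange,c(p,ncol=2))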
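The same lapply plus do.call pattern can be run end to end on a built-in dataset. In the sketch below, `col_box` is a hypothetical stand-in for `gene_box`, and the four numeric columns of iris play the role of the genes.

```r
library(ggpubr)
library(gridExtra)

# Hypothetical plotting helper: one box plot per numeric column of iris
col_box <- function(column, group = "Species", data = iris) {
  ggboxplot(data, x = group, y = column,
            color = group, palette = "nejm", add = "jitter",
            ylab = column) +
    stat_compare_means()  # Kruskal-Wallis by default for three groups
}

cols  <- colnames(iris)[1:4]
# Extra arguments after the function name are forwarded to every call
plots <- lapply(cols, col_box, group = "Species")
# c(plots, ncol = 2) appends ncol as a named element of the argument list
do.call(grid.arrange, c(plots, ncol = 2))
```

`ggarrange(plotlist = plots, ncol = 2, nrow = 2)` from ggpubr is an alternative that takes the list directly, which avoids the do.call step.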