Hello, when I try to run this plotting code:
ggplot(subset(tabcourt, !is.na(Score) & !is.na(`PSA level (ng/ml)`))) +
  facet_wrap(. ~ Method, scales = 'free') +
  aes(x = Score, y = `PSA level (ng/ml)`, color = Method) +
  stat_compare_means(show.legend = FALSE, label.x.npc = 0.5, label.y.npc = 0.93, color = "black", size = 4) +
  geom_boxplot() + theme_bw()
It doesn't show the Kruskal-Wallis result on the middle plot. I tried everything I could think of but can't find a solution. Any ideas on how to fix this?
Edit: using free_y instead of free fixes the missing label, but then the x axis is wrong (every facet runs from 1 to 30).
Here are the head and the str of the data:
I think your error could come either from how you passed your data to ggplot or from the data itself.
I don't have a sample of your data, so I used the sample dataset ToothGrowth with your stat_compare_means call, and I get the display you are looking for.
Here is my code:
library(ggpubr)
data("ToothGrowth")
# Box plot faceted by "dose"
p <- ggboxplot(ToothGrowth, x = "supp", y = "len",
color = "supp", palette = "jco",
add = "jitter",
facet.by = "dose", short.panel.labs = FALSE)
# Adding stat_compare_means
p + stat_compare_means(show.legend=FALSE, label.x.npc = 0.5,
label.y.npc = 0.93, color = "black", size = 4) + theme_bw()
Here is the plot:
If you use the default arguments instead, you get a better-looking plot:
p + stat_compare_means() + theme_bw()
UPDATE: TRICK TO GET THE FINAL PLOT DISPLAYED
So, I tried to reproduce your data in order to reproduce the plotting error you get, and I succeeded in plotting the p-values using a trick described in this post: R: ggplot2 - Kruskal-Wallis test per facet
Here is the code that I used to mimic your data:
set.seed(1)
# defining the sample dataset AJCC
PSA_levels <- rnorm(100,mean = 2, sd = 2)
AJCC_data <- data.frame(cbind(PSA_levels))
x <- NULL
for(i in 1:100) {x <- c(x,sample(1:4,1))}
AJCC_data$score <- x
AJCC_data$Method <- 'AJCC'
# defining the sample dataset Gleason
PSA_levels <- rnorm(100,mean = 2.5, sd = 1)
Gleason_data <- data.frame(cbind(PSA_levels))
x <- NULL
for(i in 1:100) {x <- c(x,sample(5:10,1))}
Gleason_data$score <- x
Gleason_data$Method <- 'Gleason'
# defining the sample dataset TNM
PSA_levels <- rnorm(100,mean = 2.5, sd = 1)
TNM_data <- data.frame(cbind(PSA_levels))
x <- NULL
for(i in 1:100) {x <- c(x,sample(1:30,1))}
TNM_data$score <- x
TNM_data$Method <- 'TNM'
df <- rbind(AJCC_data, Gleason_data, TNM_data)
df$score <- as.factor(df$score)
Here is the structure of df, which looks similar to your data tabcourt:
> str(df)
'data.frame': 300 obs. of 3 variables:
$ PSA_levels: num 0.747 2.367 0.329 5.191 2.659 ...
$ score : Factor w/ 30 levels "1","2","3","4",..: 2 1 2 2 2 3 1 2 3 3 ...
$ Method : chr "AJCC" "AJCC" "AJCC" "AJCC" ...
Then I tried to reproduce your faceted boxplot:
library(ggplot2)
library(ggpubr)
g <- ggplot(df, aes(x = score, y = PSA_levels, color = Method))
p <- g + facet_wrap(.~Method, scales = 'free_x')
p <- p + geom_boxplot()
p <- p + theme_bw()
When I tried to add p-values to the graph using the stat_compare_means function, I got the same plotting error as you. So, following the post cited above, I used the dplyr package to compute the p-value of the Kruskal-Wallis test for each group.
library(dplyr)
ptest <- df %>% group_by(Method) %>% summarize(p.value = kruskal.test(PSA_levels ~score)$p.value)
Here is the output of ptest:
> ptest
# A tibble: 3 x 2
Method p.value
<chr> <dbl>
1 AJCC 0.575
2 Gleason 0.216
3 TNM 0.226
Now I can add these p-values to the boxplot with:
p + geom_text(data = ptest, aes(x = c(2,3,10), y = c(6,6,6), label = paste0("Kruskal-Wallis\n p=",round(p.value,3))))
And here is what you get:
So, I think it is because stat_compare_means did not understand which groups to compare and how to represent all the statistical comparisons on the graph. Doing the test outside of ggplot and then adding the result as a geom_text layer solves the problem.
Hope it works with your real data!
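If you would rather not hard-code the label positions (the x = c(2,3,10) above), here is a minimal sketch that computes them per facet together with the p-value. The toy data, its column names, and the helper middle_level are made up for illustration and are not taken from your tabcourt; treat it as one possible approach, not the only one.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical stand-in data: two methods whose scores cover different ranges
toy <- data.frame(
  Method = rep(c("A", "B"), each = 60),
  score  = c(sample(1:4, 60, replace = TRUE), sample(5:10, 60, replace = TRUE)),
  value  = rnorm(120)
)
toy$score <- factor(toy$score)

# Hypothetical helper: the middle factor level actually present in a group
middle_level <- function(f) {
  lv <- levels(droplevels(f))
  lv[ceiling(length(lv) / 2)]
}

# One row per facet: p-value plus a data-driven position for the label
labels <- toy %>%
  group_by(Method) %>%
  summarize(
    p.value = kruskal.test(value, score)$p.value,
    x = middle_level(score),                         # roughly the centre of the facet
    y = max(value) + 0.15 * diff(range(value)),      # a bit above the tallest point
    .groups = "drop"
  )

p <- ggplot(toy, aes(x = score, y = value, color = Method)) +
  geom_boxplot() +
  facet_wrap(~ Method, scales = "free_x") +
  geom_text(
    data = labels,
    aes(x = x, y = y, label = paste0("Kruskal-Wallis\np = ", signif(p.value, 3))),
    color = "black", inherit.aes = FALSE
  ) +
  theme_bw()
```

Because labels carries the Method column, each label lands in its own facet, and the x position is one of the factor levels already shown in that facet.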
Thank you for this workaround! It did work, but I had to add + scale_x_discrete(), otherwise I'd get Error: Discrete value supplied to continuous scale
Here's the code I used, in case this happens to others:
library(dplyr)
library(ggplot2)
library(plotly)
# Backticks (not quotes) are needed to refer to a column whose name contains spaces
ptest = tabcourt %>% group_by(Method) %>% summarize(p.value = kruskal.test(`mtDNA copy number` ~ Score)$p.value)
p2 = ggplotly(ggplot(subset(tabcourt, !is.na(Score) & !is.na(`mtDNA copy number`)), aes(x = Score, y = `mtDNA copy number`, color = Method))
  + scale_x_discrete()
  + geom_text(data = ptest, aes(x = c(2,3,10), y = c(1.5,1.5,1.5), label = paste0("Kruskal-Wallis\n p=", round(p.value, 3))))
  + facet_grid(. ~ Method, scales = 'free')
  + geom_boxplot()
  + theme_bw())
Weird, though, that stat_compare_means has such a hard time doing its job!
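A possible explanation for the "Discrete value supplied to continuous scale" error, assuming it comes from layer order: in the code above geom_text (with numeric x positions) is added before geom_boxplot (with a factor on x), so the x scale seems to be set up as continuous first. The sketch below uses made-up toy data and object names to compare the two orders; it only illustrates the idea and is not a diagnosis of tabcourt.

```r
library(ggplot2)

set.seed(1)
# Hypothetical toy data: a factor on x, a numeric y, and one text label
toy <- data.frame(score = factor(rep(1:3, each = 10)), value = rnorm(30))
lab <- data.frame(x = 2, y = 0, label = "example label")

base_plot  <- ggplot(toy, aes(x = score, y = value))
text_layer <- geom_text(data = lab, aes(x = x, y = y, label = label))

# Same layers, different order
text_first    <- base_plot + text_layer + geom_boxplot()
boxplot_first <- base_plot + geom_boxplot() + text_layer

# Build each plot and capture an error instead of stopping
built_text_first    <- tryCatch(ggplot_build(text_first), error = function(e) e)
built_boxplot_first <- tryCatch(ggplot_build(boxplot_first), error = function(e) e)

# TRUE means building that plot raised an error
text_first_failed    <- inherits(built_text_first, "error")
boxplot_first_failed <- inherits(built_boxplot_first, "error")
```

If the layer order is indeed the cause, adding geom_boxplot() before geom_text() should be an alternative to scale_x_discrete().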