I am trying to understand the connection between scale_fill_brewer and scale_fill_manual in the ggplot2 package.

First, generate a ggplot with filled colors:

library(ggplot2)
p <- ggplot(data = mtcars, aes(x = mpg, y = wt, 
    group = cyl, fill = factor(cyl))) + 
    geom_area(position = 'stack')
# apply ready-made palette with scale_fill_brewer from ggplot2
p + scale_fill_brewer(palette = "Blues")

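The same plot can be produced with scale_fill_manual by passing values = brewer.pal(3, "Blues"), where 3 is the number of fill colors in the data. For convenience, I have used the brewer.pal function from the RColorBrewer package.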

As far as I understand, the convenience of scale_fill_brewer is that it automatically computes the number of unique levels in the data (3 in this example). Here is my attempt at replicating:

library(RColorBrewer)  # provides brewer.pal
p + scale_fill_manual(values = brewer.pal(length(levels(factor(mtcars$cyl))), "Blues"))

My question is: how does scale_fill_brewer compute the number of levels in the data?

I'm interested in understanding what else scale_fill_brewer might be doing under the hood. Might I run into any difficulty if I replace the more user-friendly scale_fill_brewer with a more contorted implementation based on scale_fill_manual, like the one above?

Perusing the source code:

scale_fill_brewer
function (..., type = "seq", palette = 1) {
    discrete_scale("fill", "brewer", brewer_pal(type, palette), ...)
}

I couldn't see through this how scale_fill_brewer computes the number of unique levels in the data. Perhaps hidden in the ... ?

Edit: Where does the function scale_fill_brewer receive instructions to compute the number of levels in the data? Is it in "seq" or in ... or elsewhere?

The discrete_scale function is intricate and I'm lost. Here are its arguments:

discrete_scale <- function(aesthetics, scale_name, palette, name = NULL, 
    breaks = waiver(), labels = waiver(), legend = NULL, limits = NULL, 
    expand = waiver(), na.value = NA, drop = TRUE, guide="legend") {

Does any of this compute the number of levels?

I find ggplot2's source code kind of intricate, but you can start there: github.com/hadley/ggplot2/blob/master/R/scale-brewer.r – Roland Jan 14, 2015 at 9:50

Thanks Roland, that's where one finds scale_fill_brewer (copied in my question), but the mystery to me is how ..., type = "seq" makes the function count the number of levels (if that's what's going on). – PatrickT Jan 14, 2015 at 10:11

The next step is obviously to look up the definition of discrete_scale. Or is your question how the ellipses work in R? That has been explained numerous times on Stack Overflow. – Roland Jan 14, 2015 at 10:15

Thanks Roland. My question is how when you use scale_fill_brewer, as in the example above, the number of levels in the data is automatically computed: where does this come from? who does the work? Is it in the ... somehow or in the seq? How does scale_fill_brewer know where to look for levels and how to count? is that something discrete_scale does? EDITED the question to clarify, hopefully. – PatrickT Jan 14, 2015 at 10:21

The easiest way to trace it is to think in terms of (1) setting up the plot data structure and (2) resolving the aesthetics. The code uses S3, so the branching is implicit.

The setup call sequence

  • [scale-brewer.R] scale_fill_brewer(type="seq", palette="Blues")

  • [scale-.R] discrete_scale(...) - return an object representing the scale

        structure(list(
          ...,
          range = DiscreteRange$new(),    ## this is scales::DiscreteRange
          ...), , class = c(scale_name, "discrete", "scale"))
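
As a minimal sketch of what the setup phase leaves behind, the snippet below builds the two relevant pieces by hand with the scales package: a palette function and an empty range. It assumes a current scales release in which brewer_pal() and DiscreteRange are exported; the names pal and rng are illustrative only, and this is not ggplot2's own code.

```r
library(scales)

# The palette is a function of n; no data has been seen yet.
pal <- brewer_pal(type = "seq", palette = "Blues")

# The range starts out empty and is only filled later, during training.
rng <- DiscreteRange$new()

is.function(pal)    # TRUE
is.null(rng$range)  # TRUE
```

Nothing here knows about the number of levels yet; that information only arrives in the resolve phase.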

The resolve call sequence

  • [plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_train_df

        # Train and map non-position scales
        npscales <- scales$non_position_scales()       ## scales is plot$scales, S4 type Scales
        if (npscales$n() > 0) {
          lapply(data, scales_train_df, scales = npscales)
          data <- lapply(data, scales_map_df, scales = npscales)
        }

  • [scales-.r] scales_train_df(...) - iterate again over scales$scales (list)

  • [scale-.r] scale_train_df(...) - iterate again

  • [scale-.r] scale_train(...) - S3 generic function

  • [scale-.r] scale_train.discrete(...) - almost there...

  • scales:::train_discrete(...) - again, almost there...

  • scales:::discrete_range(...) - still not there..

  • scales:::clevels(...) - there it is!

  • At this point, scale$range has been overwritten by the levels of the factor. Unwinding the call stack back to the first step (ggplot_build), we now call scales_map_df.

  • [plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_map_df (the second lapply in the same block as before)

        data <- lapply(data, scales_map_df, scales = npscales)

  • [scales-.r] scales_map_df(...) - iterate

  • [scale-.r] scale_map_df(...) - iterate

  • [scale-.r] scale_map.discrete - fill up the palette (non-position scale!)

        scale_map.discrete <- function(scale, x, limits = scale_limits(scale)) {
          n <- sum(!is.na(limits))
          pal <- scale$palette(n)
          ...
        }
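
The train-then-map sequence traced above can be imitated outside ggplot2. This is a rough sketch, assuming the exported scales::train_discrete() and scales::brewer_pal() behave as in current scales releases; fill_values, limits, colours, and fill_colours are illustrative names, and the final match() line is a simplification of what scale_map.discrete does, not a copy of it.

```r
library(scales)
library(RColorBrewer)

fill_values <- factor(mtcars$cyl)

# Training: collect the levels of the fill variable.
limits <- train_discrete(fill_values)
limits                                   # "4" "6" "8"

# Mapping: count the limits and ask the palette for that many colours.
n <- sum(!is.na(limits))                 # 3
pal <- brewer_pal(type = "seq", palette = "Blues")
colours <- pal(n)

# Compare with the manual call from the question.
identical(colours, brewer.pal(3, "Blues"))

# Assign one colour to each observation according to its level.
fill_colours <- colours[match(as.character(fill_values), limits)]
```

So the number of levels is never passed to scale_fill_brewer itself: it is derived from the trained limits at the moment the scale is mapped, and the palette function is simply called with that count.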
