Is it possible to draw a matplotlib boxplot given the percentile values instead of the original inputs?

25 followers

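From what I can see, the boxplot() method expects a sequence of raw values (numbers) as input, from which it then computes the percentiles to draw the boxplot(s).

I would like a way to pass in the percentiles directly and get the corresponding boxplot.

For example, assume that I have run several benchmarks and for each benchmark I have measured latencies (floating-point values). In addition, I have precomputed the percentiles for these values.

Hence, for each benchmark, I have the 25th, 50th, and 75th percentiles along with the min and max.

Now, given these data, I would like to draw the box plots for the benchmarks.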

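1 comment
Suggestion: could you pose the question in an abstract way? That is, instead of saying "latencies", use some abstraction: "I measured some real values, i.e. floats, and I would like to compute percentiles...".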
python
python-2.7
matplotlib
boxplot
percentile
Alex Averbuch
Posted 2014-11-30
4 answers
Vicariggio
Posted 2017-12-07
Accepted
0 upvotes

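As of 2020, there is a better method than the one in the accepted answer.

The matplotlib.axes.Axes class provides a bxp method, which can be used to draw the boxes and whiskers directly from the percentile values. Raw data is needed only for the outliers, and that is optional.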

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# bxp expects a list with one dict of statistics per box
boxes = [
    {
        'label' : "Male height",
        'whislo': 162.6,    # Bottom whisker position
        'q1'    : 170.2,    # First quartile (25th percentile)
        'med'   : 175.7,    # Median         (50th percentile)
        'q3'    : 180.4,    # Third quartile (75th percentile)
        'whishi': 187.8,    # Top whisker position
        'fliers': []        # Outliers
    }
]
ax.bxp(boxes, showfliers=False)
ax.set_ylabel("cm")
plt.savefig("boxplot.png")
plt.close()

This produces the following image (example boxplot).
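For the situation in the question (several benchmarks, each with a precomputed min, 25th, 50th, 75th percentile and max), the same bxp call can take one dict per benchmark. The following is only a sketch: the benchmark names, the numbers, and the helper to_bxp_stats are made up for illustration, and the whiskers are simply placed at the min and max, so no fliers are drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical precomputed summaries: name -> (min, p25, p50, p75, max)
summaries = {
    "bench_a": (1.2, 2.0, 2.6, 3.1, 4.8),
    "bench_b": (0.8, 1.5, 1.9, 2.7, 3.9),
}

def to_bxp_stats(summaries):
    """Convert five-number summaries into the dicts that Axes.bxp expects."""
    stats = []
    for name, (lo, q1, med, q3, hi) in summaries.items():
        stats.append({
            "label": name,
            "whislo": lo,   # whisker ends placed at min and max
            "q1": q1,
            "med": med,
            "q3": q3,
            "whishi": hi,
            "fliers": [],   # no raw data, so no outliers to draw
        })
    return stats

fig, ax = plt.subplots()
artists = ax.bxp(to_bxp_stats(summaries), showfliers=False)
ax.set_ylabel("latency")
plt.close(fig)
```

Calling fig.savefig(...) before plt.close(fig) would write the figure to disk; bxp returns the same kind of artist dictionary as boxplot, so the boxes can still be restyled afterwards.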

Raghav RV
Posted 2017-12-07
0 upvotes

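To draw the box plot using only the percentile values and the outliers (if any), I made a customized_box_plot function that basically modifies the attributes of a basic box plot (generated from a tiny sample data set) to make it fit your percentile values.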

The customized_box_plot function

def customized_box_plot(percentiles, axes, redraw = True, *args, **kwargs):
    """
    Generates a customized boxplot based on the given percentile values
    """
    n_box = len(percentiles)
    box_plot = axes.boxplot([[-9, -4, 2, 4, 9],]*n_box, *args, **kwargs)
    # Creates len(percentiles) no of box plots
    min_y, max_y = float('inf'), -float('inf')
    for box_no, (q1_start, 
                 q2_start,
                 q3_start,
                 q4_start,
                 q4_end,
                 fliers_xy) in enumerate(percentiles):
        # Lower cap
        box_plot['caps'][2*box_no].set_ydata([q1_start, q1_start])
        # xdata is determined by the width of the box plot
        # Lower whiskers
        box_plot['whiskers'][2*box_no].set_ydata([q1_start, q2_start])
        # Higher cap
        box_plot['caps'][2*box_no + 1].set_ydata([q4_end, q4_end])
        # Higher whiskers
        box_plot['whiskers'][2*box_no + 1].set_ydata([q4_start, q4_end])
        # Box
        box_plot['boxes'][box_no].set_ydata([q2_start, 
                                             q2_start, 
                                             q4_start,
                                             q4_start,
                                             q2_start])
        # Median
        box_plot['medians'][box_no].set_ydata([q3_start, q3_start])
        # Outliers
        if fliers_xy is not None and len(fliers_xy[0]) != 0:
            # If outliers exist
            box_plot['fliers'][box_no].set(xdata = fliers_xy[0],
                                           ydata = fliers_xy[1])
            min_y = min(q1_start, min_y, fliers_xy[1].min())
            max_y = max(q4_end, max_y, fliers_xy[1].max())
        else:
            min_y = min(q1_start, min_y)
            max_y = max(q4_end, max_y)
        # The y axis is rescaled to fit the new box plot completely with 10% 
        # of the maximum value at both ends
        axes.set_ylim([min_y*1.1, max_y*1.1])
    # If redraw is set to true, the canvas is updated.
    if redraw:
        axes.figure.canvas.draw()
    return box_plot

Using the inverse logic (code at the end), I extracted the percentile values from this example:

>>> percentiles
(-1.0597368367634488, 0.3977683984966961, 1.0298955252405229, 1.6693981537742526, 3.4951447843464449)
(-0.90494930553559483, 0.36916539612108634, 1.0303658700697103, 1.6874542731392828, 3.4951447843464449)
(0.13744105279440233, 1.3300645202649739, 2.6131540656339483, 4.8763411136047647, 9.5751914834437937)
(0.22786243898199182, 1.4120860286080519, 2.637650402506837, 4.9067126578493259, 9.4660357513550899)
(0.0064696168078617741, 0.30586770128093388, 0.70774153557312702, 1.5241965711101928, 3.3092932063051976)
(0.007009744579241136, 0.28627373934008982, 0.66039691869500572, 1.4772725266672091, 3.221716765477217)
(-2.2621660374110544, 5.1901313713883352, 7.7178532139979357, 11.277744848353247, 20.155971739152388)
(-2.2621660374110544, 5.1884411864079532, 7.3357079047721054, 10.792299385806913, 18.842012119715388)
(2.5417888074435702, 5.885996170695587, 7.7271286220368598, 8.9207423361593179, 10.846938621419374)
(2.5971767318505856, 5.753551925927133, 7.6569980004033464, 8.8161056254143233, 10.846938621419374)

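Note that, to keep this short, I have not shown the outlier vectors, which would be the 6th element of each percentile tuple.

Also note that all the usual additional kwargs/args can be used, since they are simply passed on to the boxplot method inside.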

>>> fig, ax = plt.subplots()
>>> b = customized_box_plot(percentiles, ax, redraw=True, notch=0, sym='+', vert=1, whis=1.5)
>>> plt.show()

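The boxplot method returns a dictionary mapping the components of the boxplot to the individual matplotlib.lines.Line2D instances created.

Quoting from the matplotlib.pyplot.boxplot documentation:

That dictionary has the following keys (assuming vertical boxplots):

boxes: the main body of the boxplot showing the quartiles and the median's confidence intervals if enabled.

medians: horizontal lines at the median of each box.

whiskers: the vertical lines extending to the most extreme, non-outlier data points.

caps: the horizontal lines at the ends of the whiskers.

fliers: points representing data that extend beyond the whiskers (outliers).

means: points or lines representing the means.

For example, observe the boxplot of a tiny sample data set, [-9, -4, 2, 4, 9]: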

>>> b = ax.boxplot([[-9, -4, 2, 4, 9],])
{'boxes': [<matplotlib.lines.Line2D at 0x7fe1f5b21350>],
'caps': [<matplotlib.lines.Line2D at 0x7fe1f54d4e50>,
<matplotlib.lines.Line2D at 0x7fe1f54d0e50>],
'fliers': [<matplotlib.lines.Line2D at 0x7fe1f5b317d0>],
'means': [],
'medians': [<matplotlib.lines.Line2D at 0x7fe1f63549d0>],
'whiskers': [<matplotlib.lines.Line2D at 0x7fe1f5b22e10>,
             <matplotlib.lines.Line2D at 0x7fe20c54a510>]} 
>>> plt.show()

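The matplotlib.lines.Line2D objects have two methods that I use extensively in my function: set_xdata (or set_ydata) and get_xdata (or get_ydata).

Using these methods, we can alter the position of the constituent lines of the base box plot to conform to your percentile values (which is what the customized_box_plot function does). After altering the positions of the constituent lines, you can redraw the canvas using figure.canvas.draw().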
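As a small illustration of those two methods, the sketch below moves only the median line of the tiny sample box plot and reads the coordinates back. The target value 3.0 is arbitrary and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bp = ax.boxplot([[-9, -4, 2, 4, 9]])

median_line = bp["medians"][0]                 # a Line2D instance
original_y = list(median_line.get_ydata())     # both ends sit at the sample median
x_before = list(median_line.get_xdata())

median_line.set_ydata([3.0, 3.0])              # move the line to an arbitrary value
moved_y = list(median_line.get_ydata())
x_after = list(median_line.get_xdata())        # x extent is left untouched

fig.canvas.draw()                              # redraw so the change is rendered
plt.close(fig)
```

Only the y coordinates change; the x coordinates, which encode the box width, stay as the library computed them.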

To summarize the mapping from the percentile values to the coordinates of the various Line2D objects:

  • The max ( q4_end - end of 4th quartile ) corresponds to the top most cap Line2D object.
  • The min ( q1_start - start of the 1st quartile ) corresponds to the lowermost cap Line2D object.
  • The median corresponds to the ( q3_start ) median Line2D object.
  • The 2 whiskers lie between the ends of the boxes and extreme caps ( q1_start and q2_start - lower whisker; q4_start and q4_end - upper whisker )
  • The box is actually an interesting n shaped line bounded by a cap at the lower portion. The extremes of the n shaped line correspond to the q2_start and the q4_start.
  • The Central x coordinates ( for multiple box plots are usually 1, 2, 3... )
  • The library automatically calculates the bounding x coordinates based on the width specified.

INVERSE FUNCTION TO RETRIEVE THE PERCENTILES FROM THE boxplot DICT:

    def get_percentiles_from_box_plots(bp):
        percentiles = []
        for i in range(len(bp['boxes'])):
            percentiles.append((bp['caps'][2*i].get_ydata()[0],
                               bp['boxes'][i].get_ydata()[0],
                               bp['medians'][i].get_ydata()[0],
                               bp['boxes'][i].get_ydata()[2],
                               bp['caps'][2*i + 1].get_ydata()[0],
                               (bp['fliers'][i].get_xdata(),
                                bp['fliers'][i].get_ydata())))
        return percentiles
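    The reason I did not write a completely custom boxplot method is that many features offered by the built-in box plot cannot be fully replicated.

    Also, forgive me if I have unnecessarily explained something that may be too obvious.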
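As an alternative to reading the values back from the Line2D objects, matplotlib also exposes the statistics step on its own as matplotlib.cbook.boxplot_stats, which returns dicts in the format that Axes.bxp accepts. The sketch below is not part of the answer above; the sample data and the labels are made up, and it only shows the round trip from raw data to statistics to a plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(0)
raw = [rng.normal(size=200), rng.lognormal(size=200)]  # made-up sample data

# One dict per data set, with keys such as whislo, q1, med, q3, whishi, fliers
stats = cbook.boxplot_stats(raw, whis=1.5)
for s, name in zip(stats, ["normal", "lognormal"]):
    s["label"] = name  # hypothetical labels for the x axis

fig, ax = plt.subplots()
artists = ax.bxp(stats, showfliers=True)  # draw from the statistics only
plt.close(fig)
```

The dicts can be stored (for example as JSON, after converting the fliers arrays to lists) and plotted later without the raw data.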

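    yoni
    Great answer. Thank you very much.
    Three small problems I ran into: (1) n_box is not defined (easy to fix...); (2) the loop fails if you want to pass percentile data without fliers (better to write for box_no, pdata in enumerate(percentiles) and then check the length of pdata); (3) the routine fails if you use patch_artist=True (there is no set_ydata method).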
    maschu
    Posted 2017-12-07
    0 upvotes

    Here is an updated version of this useful routine. Setting the vertices directly appears to work for both filled boxes (patch_artist=True) and unfilled ones.

    def customized_box_plot(percentiles, axes, redraw = True, *args, **kwargs):
        """
        Generates a customized boxplot based on the given percentile values
        """
        n_box = len(percentiles)
        box_plot = axes.boxplot([[-9, -4, 2, 4, 9],]*n_box, *args, **kwargs) 
        # Creates len(percentiles) no of box plots
        min_y, max_y = float('inf'), -float('inf')
        for box_no, pdata in enumerate(percentiles):
            if len(pdata) == 6:
                (q1_start, q2_start, q3_start, q4_start, q4_end, fliers_xy) = pdata
            elif len(pdata) == 5:
                (q1_start, q2_start, q3_start, q4_start, q4_end) = pdata
                fliers_xy = None
            else:
                raise ValueError("Percentile arrays for customized_box_plot must have either 5 or 6 values")
            # Lower cap
            box_plot['caps'][2*box_no].set_ydata([q1_start, q1_start])
            # xdata is determined by the width of the box plot
            # Lower whiskers
            box_plot['whiskers'][2*box_no].set_ydata([q1_start, q2_start])
            # Higher cap
            box_plot['caps'][2*box_no + 1].set_ydata([q4_end, q4_end])
            # Higher whiskers
            box_plot['whiskers'][2*box_no + 1].set_ydata([q4_start, q4_end])
            # Box: edit the path vertices, which exist for both Line2D and patch boxes
            path = box_plot['boxes'][box_no].get_path()
            path.vertices[0][1] = q2_start
            path.vertices[1][1] = q2_start
            path.vertices[2][1] = q4_start
            path.vertices[3][1] = q4_start
            path.vertices[4][1] = q2_start
            # Median
            box_plot['medians'][box_no].set_ydata([q3_start, q3_start])
            # Outliers
            if fliers_xy is not None and len(fliers_xy[0]) != 0:
                # If outliers exist
                box_plot['fliers'][box_no].set(xdata = fliers_xy[0],
                                               ydata = fliers_xy[1])
                min_y = min(q1_start, min_y, fliers_xy[1].min())
                max_y = max(q4_end, max_y, fliers_xy[1].max())
            else:
                min_y = min(q1_start, min_y)
                max_y = max(q4_end, max_y)
            # The y axis is rescaled to fit the new box plot completely with 10% 
            # of the maximum value at both ends
            axes.set_ylim([min_y*1.1, max_y*1.1])
        # If redraw is set to true, the canvas is updated.
        if redraw:
            axes.figure.canvas.draw()
        return box_plot
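The reason set_ydata fails with patch_artist=True, while get_path() works in both cases, can be checked with a short sketch: the box is a Line2D by default and a PathPatch when patch_artist=True. The variable names are made up for illustration and the Agg backend is used only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import PathPatch

data = [[-9, -4, 2, 4, 9]]
fig, (ax_line, ax_patch) = plt.subplots(1, 2)

unfilled = ax_line.boxplot(data)["boxes"][0]                    # default: Line2D
filled = ax_patch.boxplot(data, patch_artist=True)["boxes"][0]  # PathPatch

# Line2D has set_ydata; PathPatch does not
has_set_ydata = (hasattr(unfilled, "set_ydata"), hasattr(filled, "set_ydata"))

# Both artist types expose their outline through get_path()
vertex_counts = (len(unfilled.get_path().vertices),
                 len(filled.get_path().vertices))
plt.close(fig)
```

Because both kinds of artist return a Path whose vertices array holds the corner coordinates, editing that array is the one operation the two have in common.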
        
    Thank you. In case anyone is wondering how to assign labels to the boxplots, this answer shows it well. tl;dr [替换代码0]
    Hagne
    Posted 2017-12-07
    0 upvotes

    Here is a bottom-up approach that builds the box plot using matplotlib's vlines, Rectangle, and normal plot functions.

    def boxplot(df, ax=None, box_width=0.2, whisker_size=20, mean_size=10, median_size = 10 , line_width=1.5, xoffset=0,
                         color=0):
        """Plots a boxplot from existing percentiles.
        Parameters
        ----------
        df: pandas DataFrame
        ax: pandas AxesSubplot
            if given, plot on an existing axes
        box_width: float
        whisker_size: float
            size of the bar at the end of each whisker
        mean_size: float
            size of the mean symbol
        color: int or rgb(list)
            If int particular color of property cycler is taken. Example of rgb: [1,0,0] (red)
        Returns
        -------
        f, a, boxes, vlines, whisker_tips, mean, median
        """
        if type(color) == int:
            color = plt.rcParams['axes.prop_cycle'].by_key()['color'][color]
        if ax:
            a = ax
            f = a.get_figure()
        else:
            f, a = plt.subplots()
        boxes = []
        vlines = []
        xn = []
        for row in df.iterrows():
            x = row[0] + xoffset
            xn.append(x)
            # box
            y = row[1][25]
            height = row[1][75] - row[1][25]
            box = plt.Rectangle((x - box_width / 2, y), box_width, height)
            a.add_patch(box)
            boxes.append(box)
            # whiskers
            y = (row[1][95] + row[1][5]) / 2
            vl = a.vlines(x, row[1][5], row[1][95])
            vlines.append(vl)
        for b in boxes:
            b.set_linewidth(line_width)
            b.set_facecolor([1, 1, 1, 1])
            b.set_edgecolor(color)
            b.set_zorder(2)
        for vl in vlines:
            vl.set_color(color)
            vl.set_linewidth(line_width)
            vl.set_zorder(1)
        whisker_tips = []
        if whisker_size:
            g, = a.plot(xn, df[5], ls='')
            whisker_tips.append(g)
            g, = a.plot(xn, df[95], ls='')
            whisker_tips.append(g)
        for wt in whisker_tips:
            wt.set_markeredgewidth(line_width)
            wt.set_color(color)
            wt.set_markersize(whisker_size)
            wt.set_marker('_')
        mean = None
        if mean_size:
            g, = a.plot(xn, df['mean'], ls='')
            g.set_marker('o')
            g.set_markersize(mean_size)
            g.set_zorder(20)
            g.set_markerfacecolor('None')
            g.set_markeredgewidth(line_width)
            g.set_markeredgecolor(color)
            mean = g
        median = None
        if median_size:
            g, = a.plot(xn, df['median'], ls='')
            g.set_marker('_')
            g.set_markersize(median_size)
            g.set_zorder(20)
            g.set_markeredgewidth(line_width)
            g.set_markeredgecolor(color)
            median = g
        a.set_ylim(np.nanmin(df), np.nanmax(df))
        return f, a, boxes, vlines, whisker_tips, mean, median
    

    Here is what it looks like in action.

    import numpy as np
    import pandas as pd
    import matplotlib.pylab as plt
    nopts = 12
    df = pd.DataFrame()
    df['mean'] = np.random.random(nopts) + 7
    df['median'] = np.random.random(nopts) + 7
    df[5] = np.random.random(nopts) + 4
    df[25] = np.random.random(nopts) + 6
    df[75] = np.random.random(nopts) + 8