如何使用grouping sets进行多维聚合_云原生大数据计算服务 MaxCompute(MaxCompute)-阿里云帮助中心

对于经常需要对数据进行多维度的聚合分析的场景，您既需要对 A 列做聚合，也要对 B 列做聚合，同时要对 A、B 两列做聚合，因此需要多次使用 union all 。您可以使用 grouping sets 快速解决此类问题。本文为您介绍如何使用 grouping sets 进行多维聚合。

功能介绍

grouping sets 是对 select 语句中 group by 子句的扩展，允许您采用多种方式对结果分组，而不必使用多个 select 语句再 union all 来实现。这样能够使 MaxCompute 的引擎给出更有效的执行计划，从而提高执行性能。

与 grouping sets 相关联的语法如下。

类型	说明
`cube`	特殊的 `grouping sets` ，将指定列的所有可能组合作为 `grouping sets` ，也可以与 `grouping sets` 组合使用。 `group by cube (a, b, c) --等效于以下语句。 grouping sets ((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),()) group by cube ( (a, b), (c, d) ) --等效于以下语句。 grouping sets ( ( a, b, c, d ), ( a, b ), ( c, d ), ( ) group by a, cube (b, c), grouping sets ((d), (e)) --等效于以下语句。 group by grouping sets ( (a, b, c, d), (a, b, c, e), (a, b, d), (a, b, e), (a, c, d), (a, c, e), (a, d), (a, e) )`
`rollup`	特殊的 `grouping sets` ，以按层级聚合的方式产生 `grouping sets` ，也可以与 `grouping sets` 组合使用。 `group by rollup (a, b, c) --等效价于以下语句。 grouping sets ((a,b,c),(a,b),(a), ()) group by rollup ( a, (b, c), d ) --等效于以下语句。 grouping sets ( ( a, b, c, d ), ( a, b, c ), ( a ), ( ) group by grouping sets((b), (c), rollup(a,b,c)) --等效于以下语句。 group by grouping sets ( (b), (c), (a,b,c), (a,b), (a), () )`
`grouping`	`grouping sets` 结果中使用 NULL 充当占位符，导致您会无法区分占位符 NULL 与数据中真正的 NULL。因此，MaxCompute 为您提供了 `grouping` 。 `grouping` 接受一个列名作为参数，如果结果对应行使用了参数列做聚合，返回 0，此时意味着 NULL 来自输入数据。否则返回 1，此时意味着 NULL 是 `grouping sets` 的占位符。
`grouping_id`	接受一个或多个列名作为参数。结果是将参数列的 `grouping` 结果按照 Bitmap 的方式组成整数。
`grouping__id`	`grouping__id` 不带参数，用于兼容 Hive 查询。此表达方式在 MaxCompute 中等价于 `grouping_id(group by 参数列表)` ，参数与 `group by` 的顺序一致。

GROUPING SETS 使用示例

grouping sets 使用示例如下：

准备数据。

create table requests lifecycle 20 as 
select * from values 
    (1, 'windows', 'PC', 'Beijing'),
    (2, 'windows', 'PC', 'Shijiazhuang'),
    (3, 'linux', 'Phone', 'Beijing'),
    (4, 'windows', 'PC', 'Beijing'),
    (5, 'ios', 'Phone', 'Shijiazhuang'),
    (6, 'linux', 'PC', 'Beijing'),
    (7, 'windows', 'Phone', 'Shijiazhuang') 
as t(id, os, device, city);

对数据进行分组。您可以通过如下两种方式进行分组：

使用多个 select 语句进行分组。

select NULL, NULL, NULL, count(*)
from requests
union all
select os, device, NULL, count(*)
from requests group by os, device
union all
select null, null, city, count(*)
from requests group by city;

使用 grouping sets 进行分组。

select os,device, city ,count(*)
from requests
group by grouping sets((os, device), (city), ());

返回结果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | NULL       | Beijing    | 4          |
| NULL       | NULL       | Shijiazhuang | 3          |
| ios        | Phone      | NULL       | 1          |
| linux      | PC         | NULL       | 1          |
| linux      | Phone      | NULL       | 1          |
| windows    | PC         | NULL       | 3          |
| windows    | Phone      | NULL       | 1          |
+------------+------------+------------+------------+

select os,device, city, count(*)
from requests 
group by cube (os, device, city);
--等效于如下语句。
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | NULL       | Beijing    | 4          |
| NULL       | NULL       | Shijiazhuang | 3          |
| NULL       | PC         | NULL       | 4          |
| NULL       | PC         | Beijing    | 3          |
| NULL       | PC         | Shijiazhuang | 1          |
| NULL       | Phone      | NULL       | 3          |
| NULL       | Phone      | Beijing    | 1          |
| NULL       | Phone      | Shijiazhuang | 2          |
| ios        | NULL       | NULL       | 1          |
| ios        | NULL       | Shijiazhuang | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | NULL       | Beijing    | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | NULL       | Beijing    | 2          |
| windows    | NULL       | Shijiazhuang | 2          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

select os,device, city, count(*) 
from requests 
group by cube ((os, device), (device, city));
--等效于如下语句。
select os,device, city, count(*) 
from requests 
group by grouping sets ((os, device, city),(os, device),(device,city),());

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | PC         | Beijing    | 3          |
| NULL       | PC         | Shijiazhuang | 1          |
| NULL       | Phone      | Beijing    | 1          |
| NULL       | Phone      | Shijiazhuang | 2          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

select os,device, city, count(*)
from requests 
group by rollup (os, device, city);
--等效于如下语句。
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| ios        | NULL       | NULL       | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

select os,device, city, count(*)
from requests 
group by rollup (os, (os,device), city);
--等效于如下语句。
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| ios        | NULL       | NULL       | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

select os,device, city, count(*)
from requests 
group by os, cube(os,device), grouping sets(city);
--等效于如下语句。
select os,device, city, count(*)
from requests 
group by grouping sets((os,device,city),(os,city),(os,device,city));

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| ios        | NULL       | Shijiazhuang | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | Beijing    | 2          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | Beijing    | 2          |
| windows    | NULL       | Shijiazhuang | 2          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid 
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);

+------------+------------+------------+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | ga         | gb         | gc         | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 1          | 1          | 1          | 1          | 7          |
| NULL       | NULL       | 3          | 1          | 1          | 1          | 0          | 6          |
| NULL       | 2          | NULL       | 1          | 1          | 0          | 1          | 5          |
| NULL       | 2          | 3          | 1          | 1          | 0          | 0          | 4          |
| 1          | NULL       | NULL       | 1          | 0          | 1          | 1          | 3          |
| 1          | NULL       | 3          | 1          | 0          | 1          | 0          | 2          |
| 1          | 2          | NULL       | 1          | 0          | 0          | 1          | 1          |
| 1          | 2          | 3          | 1          | 0          | 0          | 0          | 0          |
+------------+------------+------------+------------+------------+------------+------------+------------+

select
  if(grouping(os) == 0, os, 'ALL') as os,
  if(grouping(device) == 0, device, 'ALL') as device,
  if(grouping(city) == 0, city, 'ALL') as city, 
  count(*) as count 
from requests 
group by os, device, city grouping sets((os, device), (city), ());

+------------+------------+------------+------------+
| os         | device     | city       | count      |
+------------+------------+------------+------------+
| ALL        | ALL        | ALL        | 7          |
| ALL        | ALL        | Beijing    | 4          |
| ALL        | ALL        | Shijiazhuang | 3          |
| ios        | Phone      | ALL        | 1          |
| linux      | PC         | ALL        | 1          |
| linux      | Phone      | ALL        | 1          |
| windows    | PC         | ALL        | 3          |
| windows    | Phone      | ALL        | 1          |
+------------+------------+------------+------------+

set odps.sql.hive.compatible=true;
select      
a, b, c, count(*), grouping__id 
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));
--等效于如下语句。
select      
a, b, c, count(*), grouping_id(a,b,c)  
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));

+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | _c4        |
+------------+------------+------------+------------+------------+
| 1          | NULL       | NULL       | 1          | 3          |
| 1          | 2          | 3          | 1          | 0          |
+------------+------------+------------+------------+------------+

GROUPING SETS

功能介绍

GROUPING SETS 使用示例

CUBE | ROLLUP 使用示例

GROUPING | GROUPING_ID 使用示例

GROUPING__ID 使用示例