对于经常需要对数据进行多维度的聚合分析的场景,您既需要对
A
列做聚合,也要对
B
列做聚合,同时要对
A、B
两列做聚合,因此需要多次使用
union all
。您可以使用
grouping sets
快速解决此类问题。本文为您介绍如何使用
grouping sets
进行多维聚合。
功能介绍
grouping sets
是对
select
语句中
group by
子句的扩展,允许您采用多种方式对结果分组,而不必使用多个
select
语句再
union all
来实现。这样能够使
MaxCompute
的引擎给出更有效的执行计划,从而提高执行性能。
与
grouping sets
相关联的语法如下。
类型 |
说明 |
|
特殊的
|
|
特殊的
|
|
|
|
接受一个或多个列名作为参数。结果是将参数列的
|
|
说明
MaxCompute 和 Hive 2.3.0 及以上版本兼容该函数,在 Hive 2.3.0 以下版本中该函数输出不一致,因此并不推荐您使用此函数。 |
GROUPING SETS 使用示例
grouping sets
使用示例如下:
-
准备数据。
create table requests lifecycle 20 as select * from values (1, 'windows', 'PC', 'Beijing'), (2, 'windows', 'PC', 'Shijiazhuang'), (3, 'linux', 'Phone', 'Beijing'), (4, 'windows', 'PC', 'Beijing'), (5, 'ios', 'Phone', 'Shijiazhuang'), (6, 'linux', 'PC', 'Beijing'), (7, 'windows', 'Phone', 'Shijiazhuang') as t(id, os, device, city);
-
对数据进行分组。您可以通过如下两种方式进行分组:
-
使用多个
select
语句进行分组。select NULL, NULL, NULL, count(*) from requests union all select os, device, NULL, count(*) from requests group by os, device union all select null, null, city, count(*) from requests group by city;
-
使用
grouping sets
进行分组。select os,device, city ,count(*) from requests group by grouping sets((os, device), (city), ());
返回结果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | ios | Phone | NULL | 1 | | linux | PC | NULL | 1 | | linux | Phone | NULL | 1 | | windows | PC | NULL | 3 | | windows | Phone | NULL | 1 | +------------+------------+------------+------------+
说明分组集中不使用的表达式,会使用 NULL 充当占位符,使得这些结果集可以做操作。例如结果第 4~8 行的 city 列。
-
CUBE | ROLLUP 使用示例
基于
grouping sets
示例表,
cube
及
rollup
使用示例如下:
-
示例 1:通过
cube
枚举 os、device、city 的所有可能列为grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by cube (os, device, city); --等效于如下语句。 select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());
返回结果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | NULL | PC | NULL | 4 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | Phone | NULL | 3 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | ios | NULL | NULL | 1 | | ios | NULL | Shijiazhuang | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | NULL | Beijing | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
-
示例 2:通过
cube
枚举(os, device),(device, city)
所有可能列为grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by cube ((os, device), (device, city)); --等效于如下语句。 select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(device,city),());
返回结果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
-
示例 3:通过
rollup
对 os、device、city 以按层级聚合的方式产生grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by rollup (os, device, city); --等效于如下语句。 select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());
返回结果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | ios | NULL | NULL | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
-
示例 4:通过
rollup
对os, (os,device), city
以按层级聚合的方式产生grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by rollup (os, (os,device), city); --等效于如下语句。 select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());
返回结果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | ios | NULL | NULL | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
-
示例 5:通过
group by
、cube
、grouping sets
组合产生grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by os, cube(os,device), grouping sets(city); --等效于如下语句。 select os,device, city, count(*) from requests group by grouping sets((os,device,city),(os,city),(os,device,city));
返回结果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | ios | NULL | Shijiazhuang | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | Beijing | 2 | | linux | PC | Beijing | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
GROUPING | GROUPING_ID 使用示例
grouping
及
grouping_id
使用示例如下:
select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);
返回结果如下:
+------------+------------+------------+------------+------------+------------+------------+------------+
| a | b | c | _c3 | ga | gb | gc | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL | NULL | NULL | 1 | 1 | 1 | 1 | 7 |
| NULL | NULL | 3 | 1 | 1 | 1 | 0 | 6 |
| NULL | 2 | NULL | 1 | 1 | 0 | 1 | 5 |
| NULL | 2 | 3 | 1 | 1 | 0 | 0 | 4 |
| 1 | NULL | NULL | 1 | 0 | 1 | 1 | 3 |
| 1 | NULL | 3 | 1 | 0 | 1 | 0 | 2 |
| 1 | 2 | NULL | 1 | 0 | 0 | 1 | 1 |
| 1 | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
+------------+------------+------------+------------+------------+------------+------------+------------+
默认情况,
group by
列表中不被使用的列,会被填充为
NULL。您可以通过
grouping
输出更有实际意义的值。基于
grouping sets
示例表,命令示例如下:
select
if(grouping(os) == 0, os, 'ALL') as os,
if(grouping(device) == 0, device, 'ALL') as device,
if(grouping(city) == 0, city, 'ALL') as city,
count(*) as count
from requests
group by os, device, city grouping sets((os, device), (city), ());
返回结果如下:
+------------+------------+------------+------------+
| os | device | city | count |
+------------+------------+------------+------------+
| ALL | ALL | ALL | 7 |
| ALL | ALL | Beijing | 4 |
| ALL | ALL | Shijiazhuang | 3 |
| ios | Phone | ALL | 1 |
| linux | PC | ALL | 1 |
| linux | Phone | ALL | 1 |
| windows | PC | ALL | 3 |
| windows | Phone | ALL | 1 |
+------------+------------+------------+------------+
GROUPING__ID 使用示例
grouping__id
不带参数的命令示例如下:
set odps.sql.hive.compatible=true;
select
a, b, c, count(*), grouping__id
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));
--等效于如下语句。
select
a, b, c, count(*), grouping_id(a,b,c)
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));
返回结果如下:
+------------+------------+------------+------------+------------+
| a | b | c | _c3 | _c4 |
+------------+------------+------------+------------+------------+
| 1 | NULL | NULL | 1 | 3 |
| 1 | 2 | 3 | 1 | 0 |
+------------+------------+------------+------------+------------+