Definition and Usage of Subqueries Supported by MaxCompute - Cloud-Native Big Data Computing Service MaxCompute - Alibaba Cloud Help Center

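Overview

A subquery is a query nested inside another complete query statement. The nested queries work together with the outer query to express a complex query. MaxCompute supports the following types of subqueries:

Basic subquery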
IN SUBQUERY
NOT IN SUBQUERY
EXISTS SUBQUERY
NOT EXISTS SUBQUERY
SCALAR SUBQUERY

Sample data

The examples in this topic are based on the following source data. Run the following commands to create the sale_detail table and insert data into it:

--Create a partitioned table named sale_detail.
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);
--Add partitions to the source table.
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
--Append data to the source table.
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);

Run the following commands to query the data in the partitioned table sale_detail:

set odps.sql.allow.fullscan=true;
select * from sale_detail;
--The following result is returned.
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+

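Basic subquery

A regular query reads from a target table, but a query can also read from the result of another select statement. Such a nested select statement is a subquery. In a from clause, a subquery can be used as a table and joined with other tables or subqueries. For more information about join, see JOIN.

Command format

Format 1

select <select_expr> from (<select_statement>) [<sq_alias_name>];

Format 2

select (<select_statement>) from <table_name>;

Parameters
- select_expr: required. The columns to query, in the format col1_name, col2_name, regular expression,.... Each item can be a regular column, a partition key column, or a regular expression.
- select_statement: required. The subquery statement. In Format 2, the subquery must return exactly one row. For the syntax, see SELECT syntax.
- sq_alias_name: optional. The alias of the subquery.
- table_name: required. The name of the target table.

Examples

Example 1: Use the Format 1 subquery syntax. Sample command: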

set odps.sql.allow.fullscan=true;
select * from (select shop_name from sale_detail) a;

The following result is returned:

+------------+
| shop_name  |
+------------+
| s1         |
| s2         |
| s3         |
| null       |
| s6         |
| s7         |
+------------+

Example 2: Use the Format 2 subquery syntax. Sample command:

set odps.sql.allow.fullscan=true;
select (select * from sale_detail where shop_name='s1') from sale_detail;

The following result is returned:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s1         | c1          | 100.1       | 2013       | china      |
| s1         | c1          | 100.1       | 2013       | china      |
| s1         | c1          | 100.1       | 2013       | china      |
| s1         | c1          | 100.1       | 2013       | china      |
| s1         | c1          | 100.1       | 2013       | china      |
+------------+-------------+-------------+------------+------------+
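
Format 2 requires the subquery to return a single row. One way to guarantee this, sketched below, is to use an aggregate function without group by in the subquery. The alias max_price is an illustrative name that does not appear in the original examples, and the statement assumes the sale_detail table from the sample data.

```sql
set odps.sql.allow.fullscan=true;
--Sketch: max() without group by always yields exactly one row,
--so the subquery satisfies the single-row requirement of Format 2.
--max_price is an illustrative alias.
select shop_name,
       total_price,
       (select max(total_price) from sale_detail) as max_price
from sale_detail;
```

Each output row is expected to carry the same max_price value, because the subquery does not depend on the outer row.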

Example 3: Use the Format 1 subquery syntax. In a from clause, a subquery can be used as a table and joined with other tables or subqueries. Sample commands:

--Create a table, and then perform the join operation.
create table shop as select shop_name,customer_id,total_price from sale_detail;
select a.shop_name, a.customer_id, a.total_price from
(select * from shop) a join sale_detail on a.shop_name = sale_detail.shop_name;

The following result is returned:

+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| null       | c5          | NULL        |
| s6         | c6          | 100.4       |
| s7         | c7          | 100.5       |
| s1         | c1          | 100.1       |
| s2         | c2          | 100.2       |
| s3         | c3          | 100.3       |
+------------+-------------+-------------+
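
The from clause can also join two subqueries with each other. The following sketch joins a per-row subquery with an aggregated subquery over the same sale_detail table. The aliases d and s and the column name region_total are illustrative assumptions, not names from the original examples.

```sql
set odps.sql.allow.fullscan=true;
--Sketch: both sides of the join are subqueries.
--d returns one row per sale, s returns one row per region.
select d.shop_name, d.total_price, s.region_total
from (select shop_name, total_price, region from sale_detail) d
join (select region, sum(total_price) as region_total
      from sale_detail
      group by region) s
on d.region = s.region;
```

Giving each subquery an alias keeps the column references in the select list and the on condition unambiguous.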

IN SUBQUERY

in subquery is used in a way similar to left semi join.

Command format
- Format 1
```
select <select_expr1> from <table_name1> where <select_expr2> in (select <select_expr3> from <table_name2>);
--The preceding statement is equivalent to the following left semi join statement.
select <select_expr1> from <table_name1> <alias_name1> left semi join <table_name2> <alias_name2> on <alias_name1>.<select_expr2> = <alias_name2>.<select_expr3>;
```
- Format 2
```
select <select_expr1> from <table_name1> where <select_expr2> in (select <select_expr3> from <table_name2> where
<table_name1>.<col_name> = <table_name2>.<col_name>);
```

Examples

Example 1: Use the Format 1 syntax. Sample command:

set odps.sql.allow.fullscan=true;
select * from sale_detail where total_price in (select total_price from shop);

The following result is returned:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+
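
The equivalence between in subquery and left semi join can be made concrete for the preceding query on sale_detail and shop. The following is a minimal sketch that applies the left semi join template to that query. The aliases a and b are illustrative.

```sql
set odps.sql.allow.fullscan=true;
--Sketch: left semi join form of
--select * from sale_detail where total_price in (select total_price from shop);
--Only columns of the left table (a) can be selected from a semi join.
select a.*
from sale_detail a
left semi join shop b
on a.total_price = b.total_price;
```

This statement is expected to return the same rows as the in form. The row whose total_price is NULL is dropped in both forms, because NULL never compares equal to any value.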

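Example 2: Use the Format 2 syntax, in which the subquery contains a correlated condition. Sample command:

set odps.sql.allow.fullscan=true;
select * from sale_detail where total_price in (select total_price from shop where sale_detail.customer_id = shop.customer_id);

The following result is returned: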

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+

Example 3: Use in with multiple columns. Sample commands:

--The sample data is rebuilt here to make the example easier to follow.
create table if not exists t1(a bigint,b bigint,c bigint,d bigint,e bigint);
create table if not exists t2(a bigint,b bigint,c bigint,d bigint,e bigint);
insert into table t1 values (1,3,2,1,1),(2,2,1,3,1),(3,1,1,1,1),(2,1,1,0,1),(1,1,1,0,1);
insert into table t2 values (1,3,5,0,1),(2,2,3,1,1),(3,1,1,0,1),(2,1,1,0,1),(1,1,1,0,1);
--Scenario 1: The expression after in is a simple multi-column select statement.
select a, b from t1 where (c, d) in (select a, b from t2 where e = t1.e);
--The following result is returned.
+------------+------------+
| a          | b          |
+------------+------------+
| 1          | 3          |
| 2          | 2          |
| 3          | 1          |
+------------+------------+
--Scenario 2: The expression after in uses an aggregate function.
select a, b from t1 where (c, d) in (select max(a), b from t2 where e = t1.e group by b having max(a) > 0);
--The following result is returned.
+------------+------------+
| a          | b          |
+------------+------------+
| 2          | 2          |
+------------+------------+
--Scenario 3: The expression after in is a list of constants.
select a, b from t1 where (c, d) in ((1, 3), (1, 1));
--The following result is returned.
+------------+------------+
| a          | b          |
+------------+------------+
| 2          | 2          |
| 3          | 1          |
+------------+------------+

NOT IN SUBQUERY

not in subquery is used in a way similar to left anti join.

Command format
- Format 1
```
select <select_expr1> from <table_name1> where <select_expr2> not in (select <select_expr3> from <table_name2>);
--The preceding statement is equivalent to the following left anti join statement.
select <select_expr1> from <table_name1> <alias_name1> left anti join <table_name2> <alias_name2> on <alias_name1>.<select_expr2> = <alias_name2>.<select_expr3>;
```
- Format 2
```
select <select_expr1> from <table_name1> where <select_expr2> not in (select <select_expr3> from <table_name2> where <table_name2_colname> = <table_name1>.<colname>);
```

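Examples

Example 1: Use the Format 1 syntax. Sample commands:

--Create a table named shop1 and append data to it.
create table shop1 as select shop_name,customer_id,total_price from sale_detail;
insert into shop1 values ('s8','c1',100.1);
select * from shop1 where shop_name not in (select shop_name from sale_detail);

The following result is returned: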

+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| s8         | c1          | 100.1       |
+------------+-------------+-------------+

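Example 2: Use the Format 2 syntax, in which the subquery contains a correlated condition. Sample command:

set odps.sql.allow.fullscan=true;
select * from shop1 where shop_name not in (select shop_name from sale_detail where customer_id = shop1.customer_id);

The following result is returned: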

+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| s8         | c1          | 100.1       |
+------------+-------------+-------------+

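Example 3: Combine not in subquery with another condition in the where clause. Sample command:

set odps.sql.allow.fullscan=true;
select * from shop1 where shop_name not in (select shop_name from sale_detail) and total_price < 100.3;

The following result is returned: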

+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| s8         | c1          | 100.1       |
+------------+-------------+-------------+

Example 4: Run not in subquery against a newly created partitioned table named sale. Sample commands:

--Create a table named sale and append data to it.
create table if not exists sale
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);
alter table sale add partition (sale_date='2013', region='china');
insert into sale partition (sale_date='2013', region='china') values ('null','null',null),('s2','c2',100.2),('s3','c3',100.3),('s8','c8',100.8);
set odps.sql.allow.fullscan=true;
select * from sale where shop_name not in (select shop_name from sale_detail);

The following result is returned:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
+------------+-------------+-------------+------------+------------+

Example 5: Use not in with multiple columns. Sample commands:

--The sample data is rebuilt here to make the example easier to follow. It is the same as the sample data in IN SUBQUERY.
create table if not exists t1(a bigint,b bigint,c bigint,d bigint,e bigint);
create table if not exists t2(a bigint,b bigint,c bigint,d bigint,e bigint);
insert into table t1 values (1,3,2,1,1),(2,2,1,3,1),(3,1,1,1,1),(2,1,1,0,1),(1,1,1,0,1);
insert into table t2 values (1,3,5,0,1),(2,2,3,1,1),(3,1,1,0,1),(2,1,1,0,1),(1,1,1,0,1);
--Scenario 1: The expression after not in is a simple multi-column select statement.
select a, b from t1 where (c, d) not in (select a, b from t2 where e = t1.e);
--The following result is returned.
+------------+------------+
| a          | b          |
+------------+------------+
| 2          | 1          |
| 1          | 1          |
+------------+------------+
--Scenario 2: The expression after not in uses an aggregate function.
select a, b from t1 where (c, d) not in (select max(a), b from t2 where e = t1.e group by b having max(a) > 0);
--The following result is returned.
+------------+------------+
| a          | b          |
+------------+------------+
| 1          | 3          |
| 3          | 1          |
| 2          | 1          |
| 1          | 1          |
+------------+------------+
--Scenario 3: The expression after not in is a list of constants.
select a, b from t1 where (c, d) not in ((1, 3), (1, 1));
--The following result is returned.
+------------+------------+
| a          | b          |
+------------+------------+
| 1          | 3          |
| 2          | 1          |
| 1          | 1          |
+------------+------------+
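
One point worth keeping in mind with not in: under SQL three-valued logic, if the subquery returns even one real NULL (as opposed to the string 'null' that appears in the sample data), the not in predicate can never evaluate to true, so the outer query returns no rows. This is where not in differs from left anti join. The sketch below uses two hypothetical tables, tmp_left and tmp_right, which are not part of the sample data, to show the effect and two common ways around it. It illustrates the expected semantics and is not output captured from MaxCompute.

```sql
--Hypothetical tables for illustration only.
create table if not exists tmp_left (k string);
create table if not exists tmp_right (k string);
insert into tmp_left values ('s1'), ('s8');
insert into tmp_right values ('s1'), (null);   --a real NULL, not the string 'null'

--'s8' not in ('s1', NULL) evaluates to NULL rather than true,
--so this query is expected to return no rows.
select * from tmp_left where k not in (select k from tmp_right);

--Option 1: exclude NULL values inside the subquery.
select * from tmp_left where k not in (select k from tmp_right where k is not null);

--Option 2: use a correlated not exists, which is not affected by NULL values in tmp_right.
select * from tmp_left where not exists (select * from tmp_right where tmp_right.k = tmp_left.k);
```

Both options are expected to return only the s8 row. Option 2 matches the left anti join behavior, which is why it is often preferred when the subquery column can contain NULL values.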

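EXISTS SUBQUERY

An exists subquery evaluates to true if the subquery returns at least one row, and to false otherwise.

Command format

select <select_expr> from <table_name1> where exists (select <select_expr> from <table_name2> where <table_name2_colname> = <table_name1>.<colname>);

Example

set odps.sql.allow.fullscan=true;
select * from sale_detail where exists (select * from shop where customer_id = sale_detail.customer_id);
--The preceding statement is equivalent to the following statement.
select * from sale_detail a left semi join shop b on a.customer_id = b.customer_id;

The following result is returned: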

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

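NOT EXISTS SUBQUERY

A not exists subquery evaluates to true if the subquery returns no rows, and to false otherwise.

Command format

select <select_expr> from <table_name1> where not exists (select <select_expr> from <table_name2> where <table_name2_colname> = <table_name1>.<colname>);

Example

set odps.sql.allow.fullscan=true;
select * from sale_detail where not exists (select * from shop where shop_name = sale_detail.shop_name);
--The preceding statement is equivalent to the following statement.
select * from sale_detail a left anti join shop b on a.shop_name = b.shop_name;

The following result is returned: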

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
+------------+-------------+-------------+------------+------------+

SCALAR SUBQUERY

Command format
```
select <select_expr> from <table_name1> where (select count(*) from <table_name2> where <table_name2_colname> = <table_name1>.<colname>) <scalar_operator> <scalar_value>;
--The preceding statement is equivalent to the following statement.
select <table_name1>.<select_expr> from <table_name1> left semi join (select <colname>, count(*) from <table_name2> group by <colname> having count(*) <scalar_operator> <scalar_value>) <table_name2> on <table_name1>.<colname> = <table_name2>.<colname>;
```

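--Allowed: the subquery references a column of the table in the immediately enclosing query.
select * from t1 where (select count(*) from t2 where t1.a = t2.a) = 3;
--Not allowed: the innermost subquery references t1.a, a column of a table that is more than one nesting level out.
select * from t1 where (select count(*) from t2 where (select count(*) from t3 where t3.a = t1.a) = 2) = 3;

--Not allowed: a column of the outer query cannot be referenced in the select list of the subquery.
select * from t1 where (select t1.b + count(*) from t2) = 3;
--Not allowed: the columns returned by the subquery select cannot reference columns of the outer query.
select (select count(t1.a) from t2 where t2.a = t1.a) from t1;
select (select t1.a from t2 where t2.a = t1.a) from t1;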

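Examples

Example 1: Compare the result of a scalar subquery with a constant in the where clause. Sample command:

set odps.sql.allow.fullscan=true;
select * from shop where (select count(*) from sale_detail where sale_detail.shop_name = shop.shop_name) >= 1;

The following result is returned: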

+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| s1         | c1          | 100.1       |
| s2         | c2          | 100.2       |
| s3         | c3          | 100.3       |
| null       | c5          | NULL        |
| s6         | c6          | 100.4       |
| s7         | c7          | 100.5       |
+------------+-------------+-------------+
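
The preceding scalar subquery on shop and sale_detail can also be written in the left semi join form given in the command format. The following is a sketch of that rewrite. The aliases t and cnt are illustrative names, not part of the original examples.

```sql
set odps.sql.allow.fullscan=true;
--Sketch: left semi join form of
--select * from shop where (select count(*) from sale_detail
--  where sale_detail.shop_name = shop.shop_name) >= 1;
select shop.*
from shop
left semi join (
    select shop_name, count(*) as cnt
    from sale_detail
    group by shop_name
    having count(*) >= 1
) t
on shop.shop_name = t.shop_name;
```

The having clause carries the comparison that the scalar form applies to the count, so changing the operator or the constant in one form calls for the same change in the other.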

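Example 2: Use scalar subqueries that return multiple columns. Sample commands:

--The sample data is rebuilt here to make the example easier to follow.
create table if not exists ts(a bigint,b bigint,c double);
create table if not exists t(a bigint,b bigint,c double);
insert into table ts values (1,3,4.0),(1,3,3.0);
insert into table t values (1,3,4.0),(1,3,5.0);
--Scenario 1: A select column is a scalar subquery expression that returns multiple columns. Only equality expressions are supported. Incorrect usage: select (select a, b from t where c > ts.c) as (a, b), a from ts;
select (select a, b from t where c = ts.c) as (a, b), a from ts;
--The following result is returned.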
+------------+------------+------------+
| a          | b          | a2         |
+------------+------------+------------+
| 1          | 3          | 1          |
| NULL       | NULL       | 1          |
+------------+------------+------------+
--Scenario 2: A select column is a BOOLEAN expression. Only equality comparison is supported. Incorrect usage: select (a,b) > (select a,b from ts where c = t.c) from t;
select (a,b) = (select a,b from ts where c = t.c) from t;
--The following result is returned.
+-------+
| _c0   |
+-------+
| true  |
| false |
+-------+
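--Scenario 3: The where clause supports multi-column comparison. Only equality comparison is supported. Incorrect usage: select * from t where (a,b) > (select a,b from ts where c = t.c);
select * from t where c > 3.0 and (a,b) = (select a,b from ts where c = t.c);
--The following result is returned.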
+------------+------------+------------+
| a          | b          | c          |
+------------+------------+------------+
| 1          | 3          | 4.0        |
+------------+------------+------------+
select * from t where c > 3.0 or (a,b) = (select a,b from ts where c = t.c);
--The following result is returned.
+------------+------------+------------+
| a          | b          | c          |
+------------+------------+------------+
| 1          | 3          | 4.0        |
| 1          | 3          | 5.0        |
+------------+------------+------------+
