Execute and obtain results - MaxCompute (cloud-native big data computing service) - Alibaba Cloud Help Center

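Prerequisites

Before you run the examples in this topic, complete the following steps:

Prepare the sample table pyodps_iris. For details, see DataFrame data processing.
Create a DataFrame. For details, see Create a DataFrame from a MaxCompute table.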

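Deferred execution

Operations on a DataFrame are not executed immediately. They run only when you explicitly call the execute method, or when you call a method that triggers immediate execution (such a method calls execute internally). The following table lists the methods that trigger immediate execution.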

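Method	Description	Return value
persist	Saves the execution result to a MaxCompute table.	PyODPS DataFrame
execute	Executes the operations and returns all results.	ResultFrame
head	Returns the first N rows. This method executes all operations and then takes the first N rows of the result.	ResultFrame
tail	Returns the last N rows. This method executes all operations and then takes the last N rows of the result.	ResultFrame
to_pandas	Converts the result to a Pandas DataFrame or Series. When the wrap parameter is True, a PyODPS DataFrame object is returned.	PyODPS DataFrame when wrap is True; Pandas DataFrame when wrap is False. The default is False.
plot, hist, boxplot	Plotting methods.	N/A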
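
The idea of deferred execution can be illustrated with a toy model. The sketch below is not how PyODPS is implemented; `LazyRows` and its `runs` counter are hypothetical names used only to show that building an expression records steps, while `execute` (and wrappers such as `head`) is where the work happens.

```python
class LazyRows:
    """Toy stand-in for a deferred collection; not the PyODPS implementation."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)  # recorded operations, not yet applied
        self.runs = 0               # how many times work was actually done

    def filter(self, predicate):
        # Building an expression only records the step and returns a new object.
        return LazyRows(self._rows, self._steps + (predicate,))

    def execute(self):
        # All recorded steps are applied here, in order.
        self.runs += 1
        rows = list(self._rows)
        for predicate in self._steps:
            rows = [row for row in rows if predicate(row)]
        return rows

    def head(self, n):
        # An "immediate" method is a thin wrapper around execute.
        return self.execute()[:n]


lengths = LazyRows([4.9, 5.1, 4.7, 6.0, 4.6])
short = lengths.filter(lambda value: value < 5)
print(short.runs)     # 0: nothing has run yet
print(short.head(2))  # [4.9, 4.7]: head triggers execution
print(short.runs)     # 1
```

In PyODPS the recorded expression is compiled and submitted to MaxCompute rather than evaluated in local Python, so each execution is far more expensive than in this toy.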

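# In a non-interactive environment, call the execute method explicitly.
print(iris[iris.sepallength < 5][:5].execute())
# In an interactive environment, the execute method is called automatically.
print(iris[iris.sepallength < 5][:5])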

   sepallength  sepalwidth  petallength  petalwidth         name
0          4.9         3.0          1.4         0.2  Iris-setosa
1          4.7         3.2          1.3         0.2  Iris-setosa
2          4.6         3.1          1.5         0.2  Iris-setosa
3          4.6         3.4          1.4         0.3  Iris-setosa
4          4.4         2.9          1.4         0.2  Iris-setosa

from odps import options
options.interactive = False
print(iris[iris.sepallength < 5][:5])

Collection: ref_0
  odps.Table
    name: hudi_mc_0612.`iris3`
    schema:
      sepallength           : double      # sepal length (cm)
      sepalwidth            : double      # sepal width (cm)
      petallength           : double      # petal length (cm)
      petalwidth            : double      # petal width (cm)
      name                  : string      # species
Collection: ref_1
  Filter[collection]
    collection: ref_0
    predicate:
      Less[sequence(boolean)]
        sepallength = Column[sequence(float64)] 'sepallength' from collection ref_0
        Scalar[int8]
Slice[collection]
  collection: ref_1
  stop:
    Scalar[int8]
      5
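
Setting `options.interactive = False` as in the example above changes the behaviour for the rest of the session. If the change is only needed for a few statements, one possible approach is a small context manager that restores the previous value afterwards. This is a sketch: `temporary_option` is a hypothetical helper, not part of PyODPS, and a `SimpleNamespace` stands in for the real `options` object so that the code runs without a MaxCompute connection.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_option(options_obj, name, value):
    """Set options_obj.<name> to value inside the with block, then restore it."""
    previous = getattr(options_obj, name)
    setattr(options_obj, name, value)
    try:
        yield options_obj
    finally:
        # Restore the old value even if the body raises.
        setattr(options_obj, name, previous)


# Stand-in object so the sketch runs without PyODPS installed.
fake_options = SimpleNamespace(interactive=True)
with temporary_option(fake_options, "interactive", False):
    print(fake_options.interactive)  # False inside the block
print(fake_options.interactive)      # True again afterwards
```

With PyODPS installed, the same helper would presumably be called as `temporary_option(options, "interactive", False)` after `from odps import options`.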

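Read execution results

# Iterate over the ResultFrame returned by head and print each record as a list.
result = iris.head(3)
for r in result:
    print(list(r))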

[4.9, 3.0, 1.4, 0.2, 'Iris-setosa']
[4.7, 3.2, 1.3, 0.2, 'Iris-setosa']
[4.6, 3.1, 1.5, 0.2, 'Iris-setosa']
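
When individual fields are needed by name rather than by position, the records can be paired with column names. The following is a minimal sketch: `records_as_dicts` is a hypothetical helper, the column names are supplied by the caller, and plain lists stand in for the records so that the code runs without a MaxCompute connection.

```python
def records_as_dicts(records, column_names):
    """Pair the values of each record with the given column names."""
    return [dict(zip(column_names, list(record))) for record in records]


# Plain lists stand in for the records yielded by iris.head(3).
sample = [
    [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
    [4.7, 3.2, 1.3, 0.2, "Iris-setosa"],
]
names = ["sepallength", "sepalwidth", "petallength", "petalwidth", "name"]

for row in records_as_dicts(sample, names):
    print(row["name"], row["sepallength"])
```

Because the helper only relies on iteration and `list(record)`, it should accept the `result` object from the loop above in place of `sample`.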

```
# Returns a Pandas DataFrame.
pd_df = iris.head(3).to_pandas()
# Returns a PyODPS DataFrame that uses the Pandas backend.
wrapped_df = iris.head(3).to_pandas(wrap=True)
```

Save execution results to a MaxCompute table

iris2 = iris[iris.sepalwidth < 2.5].persist('pyodps_iris')
print(iris2.head(5))

   sepallength  sepalwidth  petallength  petalwidth             name
0          4.5         2.3          1.3         0.3      Iris-setosa
1          5.5         2.3          4.0         1.3  Iris-versicolor
2          4.9         2.4          3.3         1.0  Iris-versicolor
3          5.0         2.0          3.5         1.0  Iris-versicolor
4          6.0         2.2          4.0         1.0  Iris-versicolor

iris3 = iris[iris.sepalwidth < 2.5].persist('pyodps_iris_test', partitions=['name'])
print(iris3.data)

odps.Table
  name: odps_test_sqltask_finance.`pyodps_iris`
  schema:
    sepallength           : double
    sepalwidth            : double
    petallength           : double
    petalwidth            : double
  partitions:
    name                  : string

print(iris[iris.sepalwidth < 2.5].persist('pyodps_iris_partition', partition='ds=test', drop_partition=True, create_partition=True).head(5))

   sepallength  sepalwidth  petallength  petalwidth             name    ds
0          4.5         2.3          1.3         0.3      Iris-setosa  test
1          5.5         2.3          4.0         1.3  Iris-versicolor  test
2          4.9         2.4          3.3         1.0  Iris-versicolor  test
3          5.0         2.0          3.5         1.0  Iris-versicolor  test
4          6.0         2.2          4.0         1.0  Iris-versicolor  test

print(iris[iris.sepalwidth < 2.5].persist('pyodps_iris', lifecycle=10).head(5))

   sepallength  sepalwidth  petallength  petalwidth             name
0          4.5         2.3          1.3         0.3      Iris-setosa
1          5.5         2.3          4.0         1.3  Iris-versicolor
2          4.9         2.4          3.3         1.0  Iris-versicolor
3          5.0         2.0          3.5         1.0  Iris-versicolor
4          6.0         2.2          4.0         1.0  Iris-versicolor

```
# Assume that the entry object is o.
# Specify the entry object explicitly.
df.persist('table_name', odps=o)
# Alternatively, mark the entry object as global.
o.to_global()
df.persist('table_name')
```

Save execution results as a Pandas DataFrame

print(type(iris[iris.sepalwidth < 2.5].to_pandas()))

<class 'pandas.core.frame.DataFrame'>

print(type(iris[iris.sepalwidth < 2.5].to_pandas(wrap=True)))

<class 'odps.df.core.DataFrame'>

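Set runtime parameters for immediate execution

# Pass runtime settings for this run through the hints parameter.
print(iris[iris.sepallength < 5].to_pandas(hints={'odps.sql.mapper.split.size': 16}))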

   sepallength  sepalwidth  petallength  petalwidth             name
0          4.5         2.3          1.3         0.3      Iris-setosa
1          4.9         2.4          3.3         1.0  Iris-versicolor

Display detailed information at runtime

from odps import options
options.verbose = True
print(iris[iris.sepallength < 5].exclude('sepallength')[:5].execute())

Sql compiled:
SELECT t1.`sepalwidth`, t1.`petallength`, t1.`petalwidth`, t1.`name`
FROM odps_test_sqltask_finance.`pyodps_iris` t1
WHERE t1.`sepallength` < 5
LIMIT 5
Instance ID:
  Log view:http://logview
   sepalwidth  petallength  petalwidth             name
0         2.3          1.3         0.3      Iris-setosa
1         2.4          3.3         1.0  Iris-versicolor

my_logs = []
def my_logger(x):
    my_logs.append(x)
options.verbose_log = my_logger
print(iris[iris.sepallength < 5].exclude('sepallength')[:5].execute())
print(my_logs)

   sepalwidth  petallength  petalwidth             name
0         2.3          1.3         0.3      Iris-setosa
1         2.4          3.3         1.0  Iris-versicolor
['Sql compiled:', 'CREATE TABLE tmp_pyodps_24332bdb_4fd0_4d0d_aed4_38a443618268 LIFECYCLE 1 AS \nSELECT t1.`sepalwidth`, t1.`petallength`, t1.`petalwidth`, t1.`name` \nFROM odps_test_sqltask_finance.`pyodps_iris` t1 \nWHERE t1.`sepallength` < 5 \nLIMIT 5', 'Instance ID: 20230815034706122gbymevg*****', '  Log view:]
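
Since `options.verbose_log` accepts any callable that takes a single message, as `my_logger` above shows, the messages can also be forwarded to Python's standard `logging` module. The following is a sketch under that assumption; the logger name `pyodps.verbose` and the function `log_to_logging` are hypothetical, and the PyODPS wiring is left in comments so that the block runs without a MaxCompute connection.

```python
import logging

logger = logging.getLogger("pyodps.verbose")  # hypothetical logger name


def log_to_logging(message):
    """Callback with the same one-argument shape as my_logger."""
    logger.info("%s", message)


# With PyODPS installed, register the callback the same way as my_logger:
# from odps import options
# options.verbose = True
# options.verbose_log = log_to_logging

# Local demonstration with a message like the ones PyODPS emits.
logging.basicConfig(level=logging.INFO)
log_to_logging("Sql compiled:")
```

Routing through `logging` lets the usual handlers, levels, and formatters decide where the compiled SQL and Logview links end up.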

Cache intermediate Collection results

cached = iris[iris.sepalwidth < 3.5]['sepallength', 'name'].cache()
df = cached.head(3)
print(df)
# Output
   sepallength             name
0          4.5      Iris-setosa
1          5.5  Iris-versicolor
2          4.9  Iris-versicolor
# cached has already been computed, so the result is available immediately.
print(cached.head(3))
# Output
   sepallength             name
0          4.5      Iris-setosa
1          5.5  Iris-versicolor
2          4.9  Iris-versicolor

Asynchronous and parallel execution

Asynchronous execution

future = iris[iris.sepalwidth < 10].head(10, async_=True)
print(future.result())
# Output
   sepallength  sepalwidth  petallength  petalwidth             name
0          4.5         2.3          1.3         0.3      Iris-setosa
1          5.5         2.3          4.0         1.3  Iris-versicolor
2          4.9         2.4          3.3         1.0  Iris-versicolor
3          5.0         2.0          3.5         1.0  Iris-versicolor
4          6.0         2.2          4.0         1.0  Iris-versicolor
5          6.2         2.2          4.5         1.5  Iris-versicolor
6          5.5         2.4          3.8         1.1  Iris-versicolor
7          5.5         2.4          3.7         1.0  Iris-versicolor
8          6.3         2.3          4.4         1.3  Iris-versicolor
9          5.0         2.3          3.3         1.0  Iris-versicolor
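
Calling `future.result()` as above blocks until the job finishes. The following sketch shows one way to wait with an upper bound instead. It assumes that the object returned with `async_=True` follows the `concurrent.futures.Future` interface, including `result(timeout=...)`; `result_or_none` is a hypothetical helper, and a thread pool task stands in for the MaxCompute job.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


def result_or_none(future, timeout_seconds):
    """Return future.result(), or None if it is not ready within the timeout."""
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeoutError:
        return None


def slow_job():
    # Stand-in for a remote job that takes a while to finish.
    time.sleep(0.2)
    return "rows"


with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(slow_job)
    print(result_or_none(pending, 0.01))  # None: not ready yet
    print(result_or_none(pending, 5))     # rows
```

Returning None on timeout keeps the calling code free to do other work and check again later.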

Parallel execution

from odps.df import Delay
delay = Delay()  # Create a Delay object.
df = iris[iris.sepalwidth < 5].cache()  # The three computations below share this common dependency.
future1 = df.sepalwidth.sum().execute(delay=delay)  # Returns a future object immediately; nothing is executed yet.
future2 = df.sepalwidth.mean().execute(delay=delay)
future3 = df.sepallength.max().execute(delay=delay)
delay.execute(n_parallel=3)  # Execution actually starts here, with a parallelism of 3.
|==========================================|   1 /  1  (100.00%)        21s
print(future1.result())
# Output
print(future2.result())
# Output
2.272727272727273
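
When several delayed computations are submitted together, it can be convenient to gather their results under readable labels. The following is a sketch, assuming the futures follow the `concurrent.futures.Future` interface; `collect_results` and `resolved` are hypothetical helpers, and the values are made-up placeholders rather than results from the iris table.

```python
from concurrent.futures import Future


def collect_results(named_futures):
    """Map each label to the result of its future (blocks until each is done)."""
    return {label: future.result() for label, future in named_futures.items()}


def resolved(value):
    # Build an already finished future as a stand-in for future1, future2, ...
    future = Future()
    future.set_result(value)
    return future


# Placeholder values, not real query results.
print(collect_results({"sum": resolved(10.0), "mean": resolved(2.5)}))
```

With the example above, this would be called after `delay.execute(...)`, for instance as `collect_results({"sum": future1, "mean": future2, "max": future3})`.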
