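Read and write data with Pandas - Microsoft Fabric

Load Lakehouse data into a notebook

After you attach a Lakehouse to a Microsoft Fabric notebook, you can browse the stored data without leaving the page and read it into the notebook with a few clicks. Selecting any Lakehouse file surfaces options to "Load data" into a Spark or a Pandas DataFrame. (You can also copy the file's full ABFS path or a friendly relative path.)

Clicking one of the "Load data" prompts generates a code cell that loads that file into a DataFrame in your notebook.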

Convert a Spark DataFrame into a Pandas DataFrame

For reference, the following command shows how to convert a Spark DataFrame into a Pandas DataFrame.

# Replace "spark_df" with the name of your own Spark DataFrame
pandas_df = spark_df.toPandas() 
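
Read and write various file formats

The following code samples document the Pandas operations for reading and writing various file formats.

You must replace the file paths in these samples. Pandas supports both relative paths (as shown here) and full ABFS paths. You can retrieve and copy either one from the interface, as described in the previous step.

Read data from a CSV file
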
import pandas as pd
# Read a CSV file from your Lakehouse into a Pandas DataFrame
# Replace LAKEHOUSE_PATH and FILENAME with your own values
df = pd.read_csv("/LAKEHOUSE_PATH/Files/FILENAME.csv")
display(df)

Write data as a CSV file

import pandas as pd 
# Write a Pandas DataFrame into a CSV file in your Lakehouse
# Replace LAKEHOUSE_PATH and FILENAME with your own values
df.to_csv("/LAKEHOUSE_PATH/Files/FILENAME.csv") 
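
One detail worth knowing when writing CSV files: by default, to_csv also writes the DataFrame index as an extra, unnamed first column, which then reappears as "Unnamed: 0" when the file is read back. The sketch below illustrates this with a small made-up DataFrame and a temporary local directory standing in for the Lakehouse path (both are assumptions for illustration, not part of the original samples).

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data; replace with your own DataFrame.
sample = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# A temporary directory stands in for "/LAKEHOUSE_PATH/Files/" here.
with tempfile.TemporaryDirectory() as tmp_dir:
    with_index = Path(tmp_dir) / "with_index.csv"
    without_index = Path(tmp_dir) / "without_index.csv"

    sample.to_csv(with_index)                   # default: index is written
    sample.to_csv(without_index, index=False)   # index is omitted

    columns_with_index = list(pd.read_csv(with_index).columns)
    columns_without_index = list(pd.read_csv(without_index).columns)

print(columns_with_index)     # ['Unnamed: 0', 'id', 'name']
print(columns_without_index)  # ['id', 'name']
```

If the index carries no meaning for your data, passing index=False keeps the written file aligned with the original columns.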

Read data from a Parquet file

import pandas as pd
# Read a Parquet file from your Lakehouse into a Pandas DataFrame
# Replace LAKEHOUSE_PATH and FILENAME with your own values
df = pd.read_parquet("/LAKEHOUSE_PATH/Files/FILENAME.parquet")
display(df)

Write data as a Parquet file

import pandas as pd 
# Write a Pandas DataFrame into a Parquet file in your Lakehouse
# Replace LAKEHOUSE_PATH and FILENAME with your own values
df.to_parquet("/LAKEHOUSE_PATH/Files/FILENAME.parquet") 
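
Because Parquet is a columnar format, read_parquet can load just a subset of columns through its columns parameter. The following is a minimal sketch of that idea, assuming a Parquet engine (pyarrow or fastparquet) is installed; the helper name parquet_round_trip, the sample data, and the temporary directory standing in for the Lakehouse path are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def parquet_round_trip(df, columns):
    """Write df to a temporary Parquet file and read back only `columns`.

    Hypothetical helper for illustration; it needs pyarrow or fastparquet.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "sample.parquet"
        df.to_parquet(path, index=False)
        # Only the requested columns are loaded into memory.
        return pd.read_parquet(path, columns=columns)


# Hypothetical sample data; replace with your own DataFrame.
sample = pd.DataFrame(
    {"id": [1, 2, 3], "name": ["a", "b", "c"], "score": [0.5, 0.7, 0.9]}
)

# Example call (requires a Parquet engine to be installed):
# subset = parquet_round_trip(sample, ["id", "score"])
```

In a Fabric notebook the same columns argument can be passed directly to the read_parquet call on a Lakehouse path.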

Read data from an Excel file

import pandas as pd
# Read an Excel file from your Lakehouse into a Pandas DataFrame
# Replace LAKEHOUSE_PATH and FILENAME with your own values
df = pd.read_excel("/LAKEHOUSE_PATH/Files/FILENAME.xlsx")
display(df) 

Write data as an Excel file

import pandas as pd 
# Write a Pandas DataFrame into an Excel file in your Lakehouse
# Replace LAKEHOUSE_PATH and FILENAME with your own values
df.to_excel("/LAKEHOUSE_PATH/Files/FILENAME.xlsx") 

Read data from a JSON file

import pandas as pd
# Read a JSON file from your Lakehouse into a Pandas DataFrame
# Replace LAKEHOUSE_PATH and FILENAME with your own values
df = pd.read_json("/LAKEHOUSE_PATH/Files/FILENAME.json")
display(df) 

Write data as a JSON file

import pandas as pd 
# Write a Pandas DataFrame into a JSON file in your Lakehouse
# Replace LAKEHOUSE_PATH and FILENAME with your own values
df.to_json("/LAKEHOUSE_PATH/Files/FILENAME.json") 
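
By default, to_json writes a DataFrame in the "columns" orientation, a single nested JSON object keyed by column name and then by index label. If a downstream tool expects one record per line (JSON Lines), orient="records" together with lines=True may be a better fit. The sketch below shows such a round trip, using made-up data and a temporary local directory in place of the Lakehouse path (both are assumptions for illustration).

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data; replace with your own DataFrame.
sample = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# A temporary directory stands in for "/LAKEHOUSE_PATH/Files/" here.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "sample.json"

    # Write one JSON object per row, one row per line.
    sample.to_json(path, orient="records", lines=True)
    line_count = len(path.read_text().splitlines())

    # Read the line-delimited file back; lines=True must match the writer.
    restored = pd.read_json(path, lines=True)

print(line_count)  # 2
print(restored.to_dict("records"))
```

The reader has to use the same layout as the writer: a file written with lines=True should be read with lines=True as well.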

Use Data Wrangler to clean and prepare your data
Start training ML models