How do I add an empty map<string,string> column to a DataFrame in PySpark?


I tried the code below, but it's not working:

df=df.withColumn("cars", typedLit(Map.empty[String, String]))

Gives the error: NameError: name 'typedLit' is not defined

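3 comments
Was it imported before use?
@samkart No, I couldn't find anything to import for this.
I just noticed the tags: PySpark has no typedLit, but something similar can be achieved with array and lit, as described here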
python
apache-spark
pyspark
apache-spark-sql
Rahul Diggi
Posted on 2022-06-23
2 answers
Steven
Posted on 2022-06-23
Accepted
0 upvotes

Create an empty (null) column and cast it to the type you need.

from pyspark.sql import functions as F, types as T
df = df.withColumn("cars", F.lit(None).cast(T.MapType(T.StringType(), T.StringType())))
df.select("cars").printSchema()
 |-- cars: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
    
mazaneicha
Posted on 2022-06-23
0 upvotes

Maybe you can use pyspark.sql.functions.expr:

>>> from pyspark.sql.functions import *
>>> df.withColumn("cars",expr("map()")).printSchema()                                                                                                       
 |-- col1: string (nullable = true)
 |-- cars: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = false)

EDIT:

If you'd like your map to have keys and/or values of a non-trivial type (not map<string,string> as your question's title says), some casting becomes unavoidable, I'm afraid. For example:

>>> from pyspark.sql.types import IntegerType, DoubleType
>>> df.withColumn("cars",create_map(lit(None).cast(IntegerType()),lit(None).cast(DoubleType()))).printSchema()
 |-- col1: string (nullable = true)
 |-- cars: map (nullable = false)