
Goal: For a DataFrame with the following schema:

id:string
Cold:string
Medium:string
Hot:string
IsNull:string
annual_sales_c:string
average_check_c:string
credit_rating_c:string
cuisine_c:string
dayparts_c:string
location_name_c:string
market_category_c:string
market_segment_list_c:string
menu_items_c:string
msa_name_c:string
name:string
number_of_employees_c:string
number_of_rooms_c:string
Months In Role:integer
Tenured Status:string
IsCustomer:integer
units_c:string
years_in_business_c:string
medium_interactions_c:string
hot_interactions_c:string
cold_interactions_c:string
is_null_interactions_c:string
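I want to add a new column that holds a JSON string of all the columns' keys and values. I used the approach from the post "PySpark - Convert to JSON row by row" and related questions. My code: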

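# Assuming func refers to the pyspark.sql.functions module
from pyspark.sql import functions as func

df = df.withColumn("JSON", func.to_json(func.struct([df[x] for x in small_df.columns])))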
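Problem: When a column is null in any row (and my data has many nulls), the JSON string does not contain that key. For example, if only 9 of the 27 columns have values, the JSON string has only 9 keys. What I want is to keep all the keys and pass an empty string "" for null values.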

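For example, take this small DataFrame:

# Sample data, reconstructed from the output table shown further down
data = [("one", 1, 10), (None, 2, 20), ("three", None, 30), (None, None, 40)]
sdf = spark.createDataFrame(data, ["A", "B", "C"])
sdf.printSchema()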

root
 |-- A: string (nullable = true)
 |-- B: long (nullable = true)
 |-- C: long (nullable = true)

Use when to implement IF-THEN-ELSE logic: if the column is not null, use the column's value; otherwise return an empty string.

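from pyspark.sql.functions import col, to_json, struct, when, lit

# Replace nulls with "" so every column keeps its key in the JSON string
sdf = sdf.withColumn(
    "JSON",
    to_json(
        struct(
            [
                when(col(x).isNotNull(), col(x)).otherwise(lit("")).alias(x)
                for x in sdf.columns
            ]
        )
    )
)
# truncate=False so the full JSON string is displayed
sdf.show(truncate=False)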

+-----+----+---+-----------------------------+
|A    |B   |C  |JSON                         |
+-----+----+---+-----------------------------+
|one  |1   |10 |{"A":"one","B":"1","C":"10"} |
|null |2   |20 |{"A":"","B":"2","C":"20"}    |
|three|null|30 |{"A":"three","B":"","C":"30"}|
|null |null|40 |{"A":"","B":"","C":"40"}     |
+-----+----+---+-----------------------------+
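
The rule can also be checked without a Spark session. The following is a plain-Python sketch, not PySpark: it applies the same null-to-empty-string rule to a single row held as a dict. The helper name row_to_json and the dict-based rows are made up for illustration. Non-null values are cast to strings because that is what the output above shows for B and C.

```python
import json

def row_to_json(row, columns):
    """Build a JSON string that keeps every key, using "" for None values."""
    cleaned = {}
    for name in columns:
        value = row.get(name)
        # Mirrors when(col.isNotNull(), col).otherwise(""):
        # non-null values become strings, nulls become the empty string.
        cleaned[name] = "" if value is None else str(value)
    # Compact separators to match the style of the JSON column shown above
    return json.dumps(cleaned, separators=(",", ":"))

# Hypothetical rows matching the sample data
rows = [
    {"A": "one", "B": 1, "C": 10},
    {"A": None, "B": 2, "C": 20},
    {"A": "three", "B": None, "C": 30},
    {"A": None, "B": None, "C": 40},
]
for row in rows:
    print(row_to_json(row, ["A", "B", "C"]))
```

Every printed string has all three keys, whichever values are missing.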

Another option is to use pyspark.sql.functions.coalesce instead of when:

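from pyspark.sql.functions import coalesce, col, lit, struct, to_json

# Drop any JSON column added earlier so it is not nested inside the new one
# (drop is a no-op if the column does not exist)
base = sdf.drop("JSON")
base.withColumn(
    "JSON",
    to_json(
        struct(
            [coalesce(col(x), lit("")).alias(x) for x in base.columns]
        )
    )
).show(truncate=False)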

The output is the same as above.
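
coalesce returns its first non-null argument, which is why it can stand in for the when/otherwise pair here. A plain-Python sketch of that behavior follows. The function is an illustration only, not the PySpark implementation, and it treats None as null.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None

# A null value falls back to the empty string
print(repr(coalesce(None, "")))
# A non-null value is kept as-is
print(repr(coalesce("three", "")))
# With more arguments, fallbacks are tried in order
print(repr(coalesce(None, None, "n/a")))
```

Because it accepts any number of arguments, the same call can express a chain of fallbacks, for example another column first and "" last.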
