Hi All,
I'm trying to add a column to a DataFrame based on multiple conditions. One of the operations needs to take a sum over rows, but I'm getting the error below:
Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Dataset [StorageDayCountBeore: double]
at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
at org.apache.spark.sql.catalyst.expressions.Literal$anonfun$create$2.apply(literals.scala:163)
at org.apache.spark.sql.catalyst.expressions.Literal$anonfun$create$2.apply(literals.scala:163)
at scala.util.Try.getOrElse(Try.scala:79)
at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
at org.apache.spark.sql.functions$.typedLit(functions.scala:112)
at org.apache.spark.sql.functions$.lit(functions.scala:95)
at MYDev.ReconTest$.main(ReconTest.scala:35)
at MYDev.ReconTest.main(ReconTest.scala)
The query I'm using is:
var df = inputDf
df = df.persist()
inputDf = inputDf.withColumn("newColumn",
  when(df("MinBusinessDate") < "2018-08-8" && df("MaxBusinessDate") > "2018-08-08",
    lit(df.groupBy(df("tableName"),df("runDate"))
      .agg(sum(when(df("business_date") > "2018-08-08", df("rowCount")))
        .alias("finalSRCcount"))
      .drop("tableName","runDate"))))
Cheers,
MJ
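Hi, the issue is resolved.
I was trying to perform a groupBy operation inside a column literal (lit). That is not supported: lit accepts only literal values, not a Dataset, which is what the "Unsupported literal type" error is reporting. A groupBy with an aggregation already produces the new column by itself, so instead of the query I posted above, the query has to be rewritten as follows: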
inputDf = inputDf.groupBy(col("tableName"),col("runDate"))
  .agg(sum(when(col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate"),
    when(col("business_date") > col("runDate"), col("rowCount")))).alias("NewColumnName"))
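A note for anyone landing here with the same error: the groupBy query above returns one row per (tableName, runDate) group, so the original rows are collapsed. If the sum is needed as an extra column on every input row, which is what the withColumn attempt in the question was aiming for, one possible alternative is a window aggregate. The sketch below is only an illustration under assumptions: the object name WindowSumSketch, the helper addGroupSum, and the sample rows are made up for this example, and the dates are assumed to be yyyy-MM-dd strings, which compare correctly as plain strings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum, when}

// Hypothetical object and helper names, used only for this illustration.
object WindowSumSketch {

  // Adds the conditional sum per (tableName, runDate) to every row,
  // instead of collapsing the rows the way groupBy(...).agg(...) does.
  def addGroupSum(df: DataFrame): DataFrame = {
    val byTableAndRun = Window.partitionBy("tableName", "runDate")

    // Same conditions as in the groupBy query from this thread.
    val runDateInRange =
      col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate")

    df.withColumn(
      "NewColumnName",
      sum(when(runDateInRange, when(col("business_date") > col("runDate"), col("rowCount"))))
        .over(byTableAndRun))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("WindowSumSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows; only the column names come from the thread.
    val inputDf = Seq(
      ("t1", "2018-08-08", "2018-08-01", "2018-08-15", "2018-08-09", 10L),
      ("t1", "2018-08-08", "2018-08-01", "2018-08-15", "2018-08-07", 5L),
      ("t1", "2018-08-08", "2018-08-01", "2018-08-15", "2018-08-10", 7L)
    ).toDF("tableName", "runDate", "MinBusinessDate", "MaxBusinessDate",
      "business_date", "rowCount")

    // All three input rows are kept; each one carries the group-level sum.
    addGroupSum(inputDf).show(false)

    spark.stop()
  }
}
```

Rows that do not satisfy the conditions contribute null to the sum and are ignored by it, the same as in the groupBy version. If a single row per group is enough, the groupBy query above is the simpler choice.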