I have a streaming Dataset with columns: bag_id, ball_color. I want to find the most popular color for each bag. So, I tried:

dataset.groupBy("bag_id", "ball_color") // 1st aggregation
       .agg(count("ball_color").as("color_count"))
       .groupBy("bag_id") // 2nd aggregation
       .agg(max("color_count"))

But I got an error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Multiple streaming aggregations are not supported with streaming DataFrames/Datasets;;

Can I write a correct query that uses only one aggregation?

@ggeop Thank you for helping. I resolved my issue with only one aggregation with function sum – Maksym Ivanov Feb 2, 2020 at 13:48

perfect, nice news :-) I had the same issue, but I didn't have this flexibility to use one aggregation, so I used foreachBatch() method. – ggeop Feb 2, 2020 at 16:32

There is an open Jira issue for this, SPARK-26655. As of now, we can't run multiple aggregations on streaming data.

One workaround is to perform one aggregation, write the result back to Kafka (or a similar sink), and then read it from Kafka again to perform the second aggregation.

Alternatively, run a single aggregation on the streaming data, save the result to HDFS/Hive/HBase, and read it back to perform the additional aggregations (this would be a separate job).
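Both workarounds above split the query into two jobs with one aggregation each. As a rough illustration only, here is a plain-Python sketch with no Spark or Kafka involved. The names `job_1` and `job_2` and the in-memory list standing in for the intermediate topic or table are hypothetical, and the sketch assumes the first job emits every updated running count. Under that assumption the second job needs just one max per bag:

```python
from collections import defaultdict

def job_1(events):
    """First job: one aggregation, a running count per (bag_id, ball_color).
    Yields every updated count, as an update-style sink would receive it."""
    counts = defaultdict(int)
    for bag_id, color in events:
        counts[(bag_id, color)] += 1
        yield bag_id, color, counts[(bag_id, color)]

def job_2(updates):
    """Second job: one aggregation, the maximum color_count per bag_id."""
    best = {}
    for bag_id, color, color_count in updates:
        if bag_id not in best or color_count > best[bag_id][1]:
            best[bag_id] = (color, color_count)
    return best

events = [(1, "red"), (1, "blue"), (1, "blue"), (2, "green"), (1, "blue")]
intermediate = list(job_1(events))  # stand-in for the Kafka topic or table
result = job_2(intermediate)
print(result)  # {1: ('blue', 3), 2: ('green', 1)}
```

Because running counts only grow, the largest count seen so far for a bag is also its current leader, which is why a single max in the second job is enough here.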

Yes, as of Spark 2.4.4 (the latest version at the time of writing), multiple streaming aggregations are NOT supported yet. As a workaround, you can use the .foreachBatch() method:

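from pyspark.sql import functions as F

def foreach_batch_function(df, epoch_id):
    # df is a plain DataFrame holding one micro-batch, so chaining aggregations works
    (df.groupBy("bag_id", "ball_color")
       .agg(F.count("ball_color").alias("color_count"))
       .groupBy("bag_id")
       .agg(F.max("color_count"))
       .show())  # .show() is a dummy action

streamingDF.writeStream.foreachBatch(foreach_batch_function).start()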

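Inside .foreachBatch(), df is a regular (non-streaming) DataFrame holding a single micro-batch, so any batch operation is allowed, including chained aggregations. Note that the aggregations then run on each micro-batch separately, not over the whole stream.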
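To show what the batch function computes, here is a minimal plain-Python sketch (not Spark code) of the two chained aggregations applied to one micro-batch at a time. The function name `most_popular_color` and the sample rows are hypothetical. Unlike `max("color_count")` alone, the sketch also keeps the winning color next to its count; on a tie, the color seen first wins.

```python
from collections import Counter

def most_popular_color(rows):
    """rows: iterable of (bag_id, ball_color) pairs from ONE micro-batch."""
    # 1st aggregation: count rows per (bag_id, ball_color)
    counts = Counter(rows)
    # 2nd aggregation: keep the highest count (and its color) per bag_id
    best = {}
    for (bag_id, color), n in counts.items():
        if bag_id not in best or n > best[bag_id][1]:
            best[bag_id] = (color, n)
    return best

batch_1 = [(1, "red"), (1, "red"), (1, "blue"), (2, "green")]
batch_2 = [(1, "blue"), (1, "blue"), (1, "blue")]

print(most_popular_color(batch_1))  # {1: ('red', 2), 2: ('green', 1)}
print(most_popular_color(batch_2))  # {1: ('blue', 3)}
```

Each batch is aggregated on its own: across both batches blue leads bag 1 with 4 balls, but neither per-batch result says so. If the answer must cover the whole stream, the per-batch results would still need to be merged somewhere, for example in an external table.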
