and I want to find the average rating for each ID in Spark. This is the code I have so far, but it keeps giving me an error:

val Avg_data=spark.sql("select ID, AVG(Rating) from table")
  

ERROR: org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and 'table'.'ID' is not an aggregate function. Wrap '(avg(CAST(table.'Rating' AS BIGINT)) as 'avg(Rating)')' in windowing function(s).........

AVG() is an aggregate function, so you need a GROUP BY clause as well. Every non-aggregated column in the SELECT list (here, ID) must appear in the GROUP BY:

val Avg_data=spark.sql("select ID, AVG(Rating) as average from table group by ID")

Avg_data should then look like this:

+---+-------+
|ID |average|
+---+-------+
|1  |3.5    |
|2  |4.0    |
+---+-------+
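To see why the GROUP BY is required, here is a minimal plain-Scala sketch (no Spark involved) of what the grouped query computes. The rows are hypothetical, picked only so the averages match the output above; they are not the asker's data, and the names `GroupByAvgSketch`, `averageById` and `overallAverage` are made up for illustration.

```scala
object GroupByAvgSketch {
  // Hypothetical (ID, Rating) rows; NOT the asker's actual data.
  val rows: Seq[(Int, Int)] = Seq((1, 3), (1, 4), (2, 4))

  // Rough equivalent of: SELECT ID, AVG(Rating) AS average FROM table GROUP BY ID
  // All rows sharing an ID collapse into one (ID, average) pair.
  def averageById(data: Seq[(Int, Int)]): Map[Int, Double] =
    data
      .groupBy { case (id, _) => id }
      .map { case (id, group) =>
        val ratings = group.map(_._2)
        id -> ratings.sum.toDouble / ratings.size
      }

  // Rough equivalent of the query WITHOUT GROUP BY: AVG collapses every row
  // into a single value, so there is no single ID to report beside it.
  def overallAverage(data: Seq[(Int, Int)]): Double =
    data.map(_._2).sum.toDouble / data.size

  def main(args: Array[String]): Unit =
    averageById(rows).toSeq.sortBy(_._1).foreach { case (id, avg) =>
      println(s"$id\t$avg")
    }
}
```

`overallAverage` mirrors the original failing query: it yields one number for the whole table, which is why Spark refuses to place a per-row ID next to it and raises the AnalysisException.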
If the answer helped you, consider accepting it and upvoting it too :) Thanks @Skyhopper9
– user9548623
Mar 25, 2018 at 17:55

2. Registering df as a temp table and writing a query with GROUP BY and AVG()

// registerTempTable is deprecated since Spark 2.0; createOrReplaceTempView replaces it
df.createOrReplaceTempView("table")
val avg_data=spark.sql("select ID,avg(Rating) from table group by ID")
avg_data.show
+---+-----------+
| ID|avg(Rating)|
+---+-----------+
|  1|        3.5|
|  2|        4.0|
+---+-----------+
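The same result can also be obtained without registering a view, by using the DataFrame API's `groupBy` and `agg` together with `functions.avg`. This is a sketch assuming `spark-sql` is on the classpath and a local master; the object name, app name and sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.avg

object AvgByIdDataFrame {
  // Assumption: local mode for illustration; use your own session on a cluster.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("avg-by-id") // hypothetical app name
    .master("local[*]")
    .getOrCreate()

  // Same result as "select ID, avg(Rating) from table group by ID",
  // expressed with the DataFrame API instead of a temp view + SQL string.
  def averageById(df: DataFrame): DataFrame =
    df.groupBy("ID").agg(avg("Rating").as("average"))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows, not the asker's data.
    val df = spark.createDataFrame(Seq((1, 3), (1, 4), (2, 4))).toDF("ID", "Rating")
    averageById(df).orderBy("ID").show()
    spark.stop()
  }
}
```

Both forms go through the same Catalyst optimizer, so the choice is mostly about style: the DataFrame version catches column-name typos in one place and avoids building SQL strings.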