exitTotalDF
  .filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
  .groupBy("exiturl")
  .agg(first("accid"), first("segment"), $"exiturl", sum("session"), sum("sessionfirst"), first("date"))
  .orderBy(desc("session"))
  .take(500)
org.apache.spark.sql.AnalysisException: cannot resolve '`session`' given input columns: [first(accid, false), first(date, false),  sum(session), exiturl, sum(sessionfirst), first(segment, false)]

It's as if the sum function cannot find the column names properly.

Using Spark 2.1

Typically, in scenarios like this I use the as method on the column, for example .agg(first("accid"), first("segment"), $"exiturl", sum("session").as("session"), sum("sessionfirst"), first("date")). With the alias in place, .orderBy(desc("session")) can resolve the column. This also gives you more control over the output names, and if the generated name for an aggregate ever changes in a future version of Spark, you won't have to update every reference to it in your code.
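A minimal self-contained sketch of that fix, assuming a local SparkSession and a small hypothetical DataFrame that only reuses the question's column names (the rows, the `acc-1` id, and the `AliasAggregates`/`topExits` names are made up for illustration). Every aggregate is aliased, so `orderBy(desc("session"))` resolves. I left `$"exiturl"` out of `agg` because `groupBy("exiturl")` already keeps the grouping column in the output.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{desc, first, sum}

object AliasAggregates {
  // Runs the question's query with every aggregate aliased (hypothetical data).
  def topExits(spark: SparkSession): Array[Row] = {
    import spark.implicits._

    // Hypothetical rows; only the column names come from the question.
    val exitTotalDF = Seq(
      ("acc-1", "seg-a", "/home", 3L, 1L, "2017-01-01"),
      ("acc-1", "seg-a", "/home", 2L, 0L, "2017-01-02"),
      ("acc-1", "seg-a", "/cart", 4L, 2L, "2017-01-01"),
      ("acc-2", "seg-b", "/home", 9L, 9L, "2017-01-01")
    ).toDF("accid", "segment", "exiturl", "session", "sessionfirst", "date")

    exitTotalDF
      .filter($"accid" === "acc-1")
      .groupBy("exiturl") // the grouping column stays in the output
      .agg(
        first("accid").as("accid"),
        first("segment").as("segment"),
        sum("session").as("session"), // the name orderBy refers to below
        sum("sessionfirst").as("sessionfirst"),
        first("date").as("date"))
      .orderBy(desc("session"))
      .take(500)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: running locally is fine for this demo.
    val spark = SparkSession.builder().master("local[*]").appName("alias-aggregates").getOrCreate()
    topExits(spark).foreach(println)
    spark.stop()
  }
}
```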

Also, I just ran a simple test: when you don't specify a name, Spark 2.1 names the column "sum(session)" (the input column list in your error message shows the same thing). One way to check this yourself is to call printSchema on the dataset.
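A small sketch of that check, assuming a local SparkSession and hypothetical data (the `DefaultAggNames`/`generatedNames` names are illustrative only). It aggregates without aliases, prints the schema, and returns the generated column names. The exact generated names for some functions may differ between Spark versions, which is another reason to alias explicitly.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{first, sum}

object DefaultAggNames {
  // Aggregates without aliases and returns the column names Spark generated.
  def generatedNames(spark: SparkSession): Array[String] = {
    import spark.implicits._

    // Hypothetical data reusing three of the question's column names.
    val df = Seq(("/home", 3L, "acc-1"), ("/home", 2L, "acc-1"))
      .toDF("exiturl", "session", "accid")

    val aggregated = df.groupBy("exiturl").agg(sum("session"), first("accid"))
    aggregated.printSchema() // prints each generated column name with its type
    aggregated.columns
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("default-agg-names").getOrCreate()
    println(generatedNames(spark).mkString(", "))
    spark.stop()
  }
}
```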

I prefer using withColumnRenamed() instead of as() because:

With as(), you have to list every column you need in a select, like this:

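    import org.apache.spark.sql.functions.col
    // df is the aggregated DataFrame, so each column is referenced by its generated name
    df.select(col("first(accid, false)"),
          col("first(segment, false)"),
          $"exiturl",
          col("sum(session)").as("session"),
          col("sum(sessionfirst)"),
          col("first(date, false)"))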

whereas withColumnRenamed is a one-liner:

    val df1 = df.withColumnRenamed("sum(session)", "session")

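The output df1 has all the columns that df has, except that the sum(session) column is now named session. Note that withColumnRenamed is a no-op if the schema does not contain the existing name, so a mistyped name such as sum("session") silently leaves the column unchanged.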
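A minimal sketch of the withColumnRenamed approach end to end, assuming a local SparkSession and hypothetical data (the `RenameAggregate`/`sortedBySession` names are illustrative only). After the rename, `orderBy(desc("session"))` resolves, and the other columns pass through untouched.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{desc, sum}

object RenameAggregate {
  // Aggregates, renames the generated column, then sorts by the new name.
  def sortedBySession(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical data; only the column names come from the question.
    val df = Seq(("/home", 3L), ("/home", 2L), ("/cart", 4L))
      .toDF("exiturl", "session")
      .groupBy("exiturl")
      .agg(sum("session")) // generated name: sum(session)

    // Only the matching column is renamed; every other column is kept as-is.
    val df1 = df.withColumnRenamed("sum(session)", "session")
    df1.orderBy(desc("session"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rename-aggregate").getOrCreate()
    sortedBySession(spark).show()
    spark.stop()
  }
}
```

If several aggregates need new names, withColumnRenamed can be chained once per column.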
