相关文章推荐
含蓄的人字拖  ·  Python ...·  1 年前    · 
兴奋的沙滩裤  ·  C# ...·  1 年前    · 
Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I'm trying to add an "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame, using Scala APIs. Starting dataframe:

color
Green

Desired dataframe (SQL syntax: CASE WHEN color == Green THEN 1 ELSE 0 END AS bool):

color bool
Red   0
Green 1
Blue  0

How should I implement this logic?

In the upcoming SPARK 1.4.0 release (should be released in the next couple of days). You can use the when/otherwise syntax:

// Create the dataframe
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")
// Use when/otherwise syntax
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))

If you are using SPARK 1.3.0 you can chose to use a UDF:

// Define the UDF
val isGreen = udf((color: String) => {
  if (color == "Green") 1
  else 0
val df2 = df.withColumn("Green_Ind", isGreen($"color"))

In Spark 1.5.0: you can also use the SQL syntax expr function

val df3 = df.withColumn("Green_Ind", expr("case when color = 'green' then 1 else 0 end"))

or plain spark-sql

df.registerTempTable("data")
val df4 = sql(""" select *, case when color = 'green' then 1 else 0 end as Green_ind from data """)
import sqlContext._
val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
rdd.registerTempTable("records")
println("Result of SELECT *:")
sql("SELECT case key when '93' then 'ravi' else key end FROM records").collect()

I was looking for that long time so here is example of SPARK 2.1 JAVA with group by- for other java users.

import static org.apache.spark.sql.functions.*;
 //...
    Column uniqTrue = col("uniq").equalTo(true);
    Column uniqFalse = col("uniq").equalTo(false);
    Column testModeFalse = col("testMode").equalTo(false);
    Column testModeTrue = col("testMode").equalTo(true);
    Dataset<Row> x = basicEventDataset
            .groupBy(col(group_field))
            .agg(
                    sum(when((testModeTrue).and(uniqTrue), 1).otherwise(0)).as("tt"),
                    sum(when((testModeFalse).and(uniqTrue), 1).otherwise(0)).as("ft"),
                    sum(when((testModeTrue).and(uniqFalse), 1).otherwise(0)).as("tf"),
                    sum(when((testModeFalse).and(uniqFalse), 1).otherwise(0)).as("ff")
        

Thanks for contributing an answer to Stack Overflow!

  • Please be sure to answer the question. Provide details and share your research!

But avoid

  • Asking for help, clarification, or responding to other answers.
  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.