I'm using Scala in my Azure Databricks notebook. My DataFrame gives me the correct result, but when I try to store it as a CSV, the cells shift wherever a comma (,) appears in the data:
spark.sql("""
SELECT * FROM invalidData
""").coalesce(1)
.write
.option("header", "true")
.format("com.databricks.spark.csv")
.mode("overwrite")
.save(s"$dbfsMountPoint/invalid/${fileName.replace(".xlsx", ".csv")}")
One column contains data like 256GB SSD, Keyb.:, so when I write it with the code above, the text after the comma (,) ends up in another cell.
Any built-in Spark solution would be appreciated.
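The shift is what you get when a value containing the delimiter is joined without quoting and then split on that delimiter. A minimal plain-Scala sketch (no Spark involved; the row values other than `256GB SSD, Keyb.:` are made up for illustration):

```scala
object CommaShiftDemo {
  // Hypothetical three-column row; only the middle value comes from the question.
  val row: Seq[String] = Seq("1001", "256GB SSD, Keyb.:", "Laptop")

  // Join the values with a delimiter and no quoting, then split the line the way
  // a naive reader would. The -1 limit keeps trailing empty cells.
  def roundTrip(values: Seq[String], sep: String): Array[String] =
    values.mkString(sep).split(java.util.regex.Pattern.quote(sep), -1)

  def main(args: Array[String]): Unit = {
    val cells = roundTrip(row, ",")
    // 3 values come back as 4 cells: "256GB SSD" and " Keyb.:" are separated.
    println(cells.length)
    cells.foreach(println)
  }
}
```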
As @Jasper-M pointed out, you can write the output CSV with a custom separator. In this example we use | as the separator:
spark.sql("""
SELECT * FROM invalidData
""").coalesce(1)
.write
.option("header", "true")
.format("com.databricks.spark.csv")
.option("sep", "|")
.mode("overwrite")
.save(s"$dbfsMountPoint/invalid/${fileName.replace(".xlsx", ".csv")}")
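A custom separator only helps if no value contains it. A plain-Scala sketch of a pre-flight check; `usableSeparator` is a hypothetical helper, not part of Spark, and the sample row is made up apart from the value from the question:

```scala
object SeparatorCheck {
  // Hypothetical helper: true if no value contains the candidate separator,
  // so splitting on it gives back exactly the original cells.
  def usableSeparator(values: Seq[String], sep: String): Boolean =
    values.forall(v => !v.contains(sep))

  def main(args: Array[String]): Unit = {
    val row = Seq("1001", "256GB SSD, Keyb.:", "Laptop")
    println(usableSeparator(row, ","))  // false: the middle value contains a comma
    println(usableSeparator(row, "|"))  // true
    // With "|" the joined line splits back into the three original values.
    val cells = row.mkString("|").split(java.util.regex.Pattern.quote("|"), -1)
    println(cells.toList == row.toList) // true
  }
}
```

Inside Spark itself, a comparable check would be counting rows where a column contains the separator, e.g. `df.filter(col("spec").contains("|")).count()`, where `spec` is a placeholder column name.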
It is worth noting that the save method takes a path to save to, not the file name itself. Spark treats this path as a directory and writes the .csv part file inside it (a single file, since you set .coalesce(1)).
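Because the path is treated as a directory, the data itself lands in a generated `part-...` file next to marker files such as `_SUCCESS`. A sketch for locating that file with `java.io.File`, assuming the output directory is reachable as a local path (on Databricks, DBFS is typically exposed under `/dbfs`; adjust for your mount). `findPartCsv` is a hypothetical helper:

```scala
import java.io.File
import java.nio.file.Files

object PartFileFinder {
  // Returns the first part-*.csv file Spark wrote into outputDir, if any.
  // Marker files such as _SUCCESS are skipped because of the name filter.
  def findPartCsv(outputDir: File): Option[File] = {
    // listFiles() returns null when the directory does not exist.
    val files = Option(outputDir.listFiles()).getOrElse(Array.empty[File])
    files
      .filter(f => f.isFile && f.getName.startsWith("part-") && f.getName.endsWith(".csv"))
      .sortBy(_.getName)
      .headOption
  }

  def main(args: Array[String]): Unit = {
    // Simulate the layout a coalesce(1) CSV write produces.
    val dir = Files.createTempDirectory("invalid-csv").toFile
    new File(dir, "_SUCCESS").createNewFile()
    new File(dir, "part-00000-abc123.csv").createNewFile()
    println(findPartCsv(dir).map(_.getName)) // Some(part-00000-abc123.csv)
  }
}
```

If you need one file with a fixed name, you could then copy the found file to that name (for example with `dbutils.fs.cp` in a Databricks notebook).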
To read the .csv back in with Spark:
spark.read.format("com.databricks.spark.csv")
.option("inferSchema", "true")
.option("sep","|")
.option("header", "true")
.load(s"$dbfsMountPoint/invalid/${path}")
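With `header` set to `true`, the first line supplies the column names and each later line is split on `|`. A plain-Scala sketch of that mapping, illustrative only (Spark's reader also handles quoting, escaping, and type inference via `inferSchema`); the column names `id`, `spec`, and `category` are hypothetical:

```scala
object PipeCsvReader {
  private val sep = java.util.regex.Pattern.quote("|")

  // Pair each data line's cells with the names from the header line.
  def parse(lines: Seq[String]): Seq[Map[String, String]] =
    if (lines.isEmpty) Seq.empty
    else {
      val names = lines.head.split(sep, -1).toSeq
      lines.tail.map(r => names.zip(r.split(sep, -1).toSeq).toMap)
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq("id|spec|category", "1001|256GB SSD, Keyb.:|Laptop")
    println(parse(lines).head("spec")) // 256GB SSD, Keyb.:
  }
}
```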