相关文章推荐
豪气的感冒药  ·  js ...·  1 年前    · 
性感的斑马  ·  报表开发工具Stimulsoft ...·  1 年前    · 
Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I am trying to learn spark and scala, on my trying to write the dataframe object of my result to parquet file by calling the parquet method, i am getting error as such

Code Base that fails:-

df2.write.mode(SaveMode.Overwrite).parquet(outputPath)

This fails too

df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).parquet(outputPath)

Error Log:-

Exception in thread "main" org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please specify the fully qualified class name.;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:707)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:967)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:304)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:848)

How ever if I called another method for the save, the code works properly,

This works fine:-

df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).save(outputPath)

Although I have a solution for the issue, i'd like to understand why the first approach is not working and how I can solve it.

The details of the specification i am using are:- Scala 2.12.9 Java 1.8 Spark 2.4.4

P.S. This issue is only seen on spark-submit

Hi As per log there are two version of parquet in you environment and run time is not able to decide which one is correct . – sandeep rawat Sep 21, 2020 at 5:54

Thanks for contributing an answer to Stack Overflow!

  • Please be sure to answer the question. Provide details and share your research!

But avoid

  • Asking for help, clarification, or responding to other answers.
  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.