I am using the code below to create a table from a DataFrame in Databricks, and I run into an error.

df.write.saveAsTable("newtable")

This works fine the very first time, but for re-usability I rewrote it as shown below. Both variants throw the same error, even though this all worked as expected previously.

df.write.mode(SaveMode.Overwrite).saveAsTable("newtable")
df.write.mode("overwrite").saveAsTable("newtable")

I get the following error.

Error Message:

org.apache.spark.sql.AnalysisException: Can not create the managed table newtable. The associated location dbfs:/user/hive/warehouse/newtable already exists

Thank you in advance.

Hi @Raj D ,

Sorry you are experiencing this and thanks for reaching out in Microsoft Q&A forum.

This problem could be due to a change in the default behavior of Spark version 2.4 (in Databricks Runtime 5.0 and above).

This problem can occur if:

  • The cluster is terminated while a write operation is in progress.
  • A temporary network issue occurs.
  • The job is interrupted.

Once the metastore data for a particular table is corrupted, it is hard to recover except by dropping the files in that location manually. Basically, the problem is that a metadata directory called _STARTED isn't deleted automatically when Azure Databricks tries to overwrite it.
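
If you want to confirm that leftover files are the cause, a quick check (a sketch for a Python notebook cell; the path is taken from the error message in the question) is to list the table's location:

# List the managed table's location to look for leftover files such as _STARTED
# (path taken from the error message above; adjust if your warehouse differs)
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/newtable"))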

Recommended Solution:

Please try setting the flag "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation" to "true". This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in the notebook as shown below:

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
    

Or you can set it at the cluster level in the Spark configuration:

spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation true
    

Another option is to manually clean up the data directory specified in the error message. You can do this with dbutils.fs.rm:

dbutils.fs.rm("<path-to-directory>", True)
    

Please refer to this documentation, which addresses this issue: Create table in overwrite mode fails when interrupted

Hope this info helps. Let us know how it goes.

Thank you

----------

Please consider clicking on "Accept Answer" and "Upvote" on the post that helps you, as it can be beneficial to other community members.

Hi @Raj D ,

Following up to see if the above information was helpful in resolving your issue. If you still need assistance, please do let us know.

Thank you

Hi @Raj D ,

We still have not heard back from you. Just wanted to check: are you still facing the issue? If you have already found a solution, would you please share it here with the community? Otherwise, let us know and we will continue to engage with you on the issue. Please consider clicking on "Accept Answer" and "Upvote" on the post that helps you, as it can be beneficial to other community members.

Hi KranthiPakala-MSFT, thanks very much for your response. I was not able to reply earlier. I found a workaround: I am using the command below, since for some reason the configuration setting works sometimes and doesn't at other times.

drop table if exists newtable
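
For reference, the same workaround from a Python notebook cell might look like this (a sketch, assuming the DataFrame is still named df; note that dropping a managed table also removes its underlying files):

# Drop the stale table first so the managed location is released
spark.sql("DROP TABLE IF EXISTS newtable")

# Recreate the table cleanly
df.write.saveAsTable("newtable")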
    						

It happened to me when trying to overwrite a table whose columns had different data types. The table already existed, but I was overwriting it using a different table-creation method. It seemed that the first method created the table with certain column data types, while the second method defined different data types for the same columns when overwriting. Short answer: validate column data types when overwriting.
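
One way to check for this up front (a PySpark sketch, assuming the table newtable already exists and df is the incoming DataFrame) is to compare the schemas before overwriting:

# Compare the existing table's schema with the incoming DataFrame's schema
existing_schema = spark.table("newtable").schema
if existing_schema != df.schema:
    print("Schema mismatch detected:")
    print("  table:    ", existing_schema.simpleString())
    print("  dataframe:", df.schema.simpleString())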