I am trying to copy the data of a partitioned Hive table from one cluster to another. I am using distcp, but the underlying data belongs to a partitioned Hive table. I used the following command:

hadoop distcp -i {src} {tgt}

Because the table is partitioned, its directory structure follows the partitions, so the job fails with a duplicates error and aborts:

org.apache.hadoop.tools.CopyListing$DuplicateFileException: File would cause duplicates. Aborting

I also tried -skipcrccheck, -update, and -overwrite, but none of them worked.

How can I copy the data of a partitioned table from its source path to the destination?

Check whether the following settings are false, and if so, set them to true (for example, set hive.mapred.supports.subdirectories=true;):

hive> set hive.mapred.supports.subdirectories;
hive.mapred.supports.subdirectories=false
hive> set mapreduce.input.fileinputformat.input.dir.recursive;
mapreduce.input.fileinputformat.input.dir.recursive=false

Then run distcp, for example:

hadoop distcp -Dmapreduce.map.memory.mb=20480 -Dmapreduce.map.java.opts=-Xmx15360m \
  -Dipc.client.fallback-to-simple-auth-allowed=true -Ddfs.checksum.type=CRC32C \
  -m 500 -pb -update -delete {src} {target}
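To avoid shell-quoting and line-continuation mistakes, the same invocation can be built as an argument list and run without a shell. This is a minimal Python sketch, assuming Python 3.9+ and a hadoop binary on PATH. The helper names distcp_argv and run_distcp and the example paths /src/table and /dst/table are hypothetical. The option values are copied from the distcp command in this answer and should be tuned for your own cluster.

```python
import shlex
import subprocess

# Options copied from the distcp command in the answer; tune them for your cluster.
CONF = {
    "mapreduce.map.memory.mb": "20480",
    "mapreduce.map.java.opts": "-Xmx15360m",
    "ipc.client.fallback-to-simple-auth-allowed": "true",
    "dfs.checksum.type": "CRC32C",
}


def distcp_argv(src: str, target: str, conf: dict[str, str], maps: int = 500) -> list[str]:
    """Build the distcp argument list (hypothetical helper) without going through a shell."""
    argv = ["hadoop", "distcp"]
    # Generic -D options are parsed by Hadoop's generic option handling,
    # so they are placed before distcp's own options.
    argv += [f"-D{key}={value}" for key, value in conf.items()]
    # -pb preserves block size, -update copies only missing or changed files,
    # -delete removes target files that no longer exist in the source.
    argv += ["-m", str(maps), "-pb", "-update", "-delete", src, target]
    return argv


def run_distcp(src: str, target: str) -> None:
    """Run the copy; raises CalledProcessError if distcp exits non-zero."""
    subprocess.run(distcp_argv(src, target, CONF), check=True)


if __name__ == "__main__":
    # Print the equivalent shell command for the hypothetical example paths.
    print(shlex.join(distcp_argv("/src/table", "/dst/table", CONF)))
```

Building the list in code keeps the -D options grouped ahead of -m, -update, and the paths, which is the order Hadoop expects for generic options.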

Two files cannot have the same name in the same target directory. In your case, you are copying a partitioned table from one cluster to another, and two differently named partitions contain files with the same name.

The solution is to correct the source path {src} in your command so that it points to the table directory above the partition subdirectories, not to the individual files.

For example, consider this layout:

/a/partcol=1/file1.txt
/a/partcol=2/file1.txt

If you use "/a/*/*" as {src}, you will get the error "File would cause duplicates."

But if you use "/a" as {src}, the copy completes without that error.
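Why the glob fails can be simulated locally. The following Python sketch (3.9+) is a simplified model, not DistCp's actual code. It assumes that each glob match becomes its own copy root, that a file given as a root lands under its own name, that a directory root keeps its name unless -update is used (in which case only its contents are copied), and that the target directory already exists. The names FILES, glob_match, all_paths, and find_duplicates are hypothetical; FILES mirrors the /a/partcol=... example layout.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical listing mirroring the example layout.
FILES = ["/a/partcol=1/file1.txt", "/a/partcol=2/file1.txt"]


def glob_match(path: str, pattern: str) -> bool:
    """Segment-wise glob match: '*' never crosses a '/' boundary."""
    p_parts, g_parts = PurePosixPath(path).parts, PurePosixPath(pattern).parts
    return len(p_parts) == len(g_parts) and all(
        fnmatchcase(p, g) for p, g in zip(p_parts, g_parts)
    )


def all_paths(files: list[str]) -> set[str]:
    """Every file in the listing plus every ancestor directory of those files."""
    paths = set(files)
    for f in files:
        paths.update(str(parent) for parent in PurePosixPath(f).parents)
    return paths


def find_duplicates(sources: list[str], files: list[str], update: bool = False) -> dict[str, list[str]]:
    """Return target-relative paths that more than one source file would map to."""
    roots: list[str] = []
    for src in sources:
        if any(ch in src for ch in "*?["):
            # Simplified model: every glob match becomes a separate copy root.
            roots += sorted(p for p in all_paths(files) if glob_match(p, src))
        else:
            roots.append(src)
    targets: dict[str, list[str]] = defaultdict(list)
    for root in roots:
        root_path = PurePosixPath(root)
        for f in files:
            fp = PurePosixPath(f)
            if fp == root_path:
                # A file root is copied under its bare file name.
                rel = fp.name
            elif root_path in fp.parents:
                # A directory root keeps its own name, unless -update copies only its contents.
                rel = str(fp.relative_to(root_path if update else root_path.parent))
            else:
                continue
            targets[rel].append(f)
    return {rel: srcs for rel, srcs in targets.items() if len(srcs) > 1}


if __name__ == "__main__":
    print(find_duplicates(["/a/*/*"], FILES))            # both files map to "file1.txt"
    print(find_duplicates(["/a"], FILES))                # no collisions
    print(find_duplicates(["/a/*"], FILES, update=True))  # -update strips the partition dirs
```

The last case shows a related pitfall under these assumptions: passing the partition directories themselves as sources together with -update also collides, because -update copies each directory's contents rather than the directory itself.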
