Command executed:

hadoop jar $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-*streaming*.jar   \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py -reducer ./reducer.py \
-input /data/poem/data_test \
-output /data/poem/result

Error output:

packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob4957099323859594325.jar tmpDir=null
17/04/13 15:10:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:56 INFO mapred.FileInputFormat: Total input paths to process : 2
17/04/13 15:10:56 INFO mapreduce.JobSubmitter: number of splits:2
17/04/13 15:10:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492067422224_0001
17/04/13 15:10:57 INFO impl.YarnClientImpl: Submitted application application_1492067422224_0001
17/04/13 15:10:57 INFO mapreduce.Job: The url to track the job: http://chinahaoop0:8088/proxy/application_1492067422224_0001/
17/04/13 15:10:57 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
17/04/13 15:10:57 INFO streaming.StreamJob: Running job: job_1492067422224_0001
17/04/13 15:10:57 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:10:59 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:11:56 INFO streaming.StreamJob:  map 50%  reduce 0%
17/04/13 15:11:57 INFO streaming.StreamJob:  map 100%  reduce 0%
17/04/13 15:11:58 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:12:27 INFO streaming.StreamJob:  map 50%  reduce 0%
17/04/13 15:12:31 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:13:08 INFO streaming.StreamJob:  map 100%  reduce 0%
17/04/13 15:13:09 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:13:30 INFO streaming.StreamJob:  map 50%  reduce 0%
17/04/13 15:13:32 INFO streaming.StreamJob:  map 100%  reduce 0%
17/04/13 15:13:33 INFO streaming.StreamJob:  map 100%  reduce 100%
17/04/13 15:13:36 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:13:36 ERROR streaming.StreamJob: Job not Successful!
17/04/13 15:13:36 INFO streaming.StreamJob: killJob...
17/04/13 15:13:36 INFO impl.YarnClientImpl: Killed application application_1492067422224_0001
Streaming Job Failed!

After locating the log files, I found that the detailed error was:

Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1937)
        at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1125)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
        ... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
        ... 9 more

The key line in the error is:

java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
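
In a nested Java trace like the one above, the most useful line is usually the last "Caused by:" entry. As a rough sketch (the helper name root_cause is made up for illustration and is not part of Hadoop), that line can be picked out automatically:

```python
def root_cause(trace: str) -> str:
    """Return the innermost exception message from a Java stack trace.

    Hypothetical helper for illustration: it relies only on the
    conventional "Caused by: " prefix used in Java stack traces.
    """
    marker = "Caused by: "
    lines = [line.strip() for line in trace.splitlines()]
    causes = [line[len(marker):] for line in lines if line.startswith(marker)]
    if causes:
        # The last "Caused by:" entry is the deepest (original) exception.
        return causes[-1]
    # No nested cause: fall back to the first line that is not a stack frame.
    for line in lines:
        if line and not line.startswith("at ") and not line.startswith("..."):
            return line
    return ""
```

Feeding it the trace from the log file returns the ClassNotFoundException line, without the RuntimeException wrappers around it.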

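Tracking down the error

1. A bug in the MR scripts:

The scripts ran correctly when tested locally, so this was ruled out.

2. A misconfigured environment:

Testing with Hadoop's example jar worked fine, so this was ruled out.

3. A problem with the jar:

Because the exception is a ClassNotFoundException, the jar should have been the first suspect. The jar may not match the Hadoop version.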
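
One quick way to examine a suspect jar is to check whether it contains the missing class at all. The sketch below uses only Python's standard zipfile module (a jar is a zip archive). The function name jar_has_class is made up for illustration. Note that a jar can contain the class and still fail for other version-mismatch reasons, so a positive result does not prove the jar is the right one.

```python
import zipfile


def jar_has_class(jar_path: str, class_name: str) -> bool:
    """Check whether a jar contains the given fully qualified class.

    Hypothetical helper: maps a name such as
    org.apache.hadoop.streaming.PipeMapRunner to the archive entry
    org/apache/hadoop/streaming/PipeMapRunner.class and looks it up.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, jar_has_class(path_to_jar, "org.apache.hadoop.streaming.PipeMapRunner") tells you whether the class named in the exception is present in the jar passed to hadoop jar.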

Final fix:

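I had downloaded my jar separately from the internet. Most online tutorials give the path $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-streaming.jar.
When I could not find that path at first, I assumed I had to download the jar myself.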

In the end I found that in Hadoop 2.5.2 the corresponding jar is located in:

$HADOOP_INSTALL_HOME/share/hadoop/tools/lib
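
Since the location moves between versions, it can be easier to search the installation than to guess the path. A minimal sketch, assuming the jar file name still matches the hadoop-*streaming*.jar pattern used in the commands here (the function name find_streaming_jars is made up for illustration):

```python
import glob
import os
from typing import List


def find_streaming_jars(hadoop_home: str) -> List[str]:
    """Recursively list streaming jars under a Hadoop installation.

    Hypothetical helper: hadoop_home would typically be the value of
    $HADOOP_INSTALL_HOME; the file name pattern is an assumption taken
    from the commands in this post.
    """
    pattern = os.path.join(hadoop_home, "**", "hadoop-*streaming*.jar")
    # recursive=True lets "**" match any number of nested directories.
    return sorted(glob.glob(pattern, recursive=True))
```

Calling find_streaming_jars(os.environ["HADOOP_INSTALL_HOME"]) would list every matching jar in the installation, whichever directory it lives in.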

That is buried a bit too deep (′д` )…彡…彡! It took me ages to find!

The rewritten command:

hadoop jar $HADOOP_INSTALL_HOME/share/hadoop/tools/lib/hadoop-*streaming*.jar \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py  -reducer ./reducer.py \
-input /data/poem/data_test -output /data/poem/result

Lessons learned:

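  1. A ClassNotFoundException very likely means the jar does not match the Hadoop environment. My jar was too old. Officially released base jars such as hadoop-streaming*.jar generally ship with the software when it is installed.
  2. Paths are likely to change between versions of a piece of software, so you need to adapt. (Even CentOS 7 changed many commands compared with earlier releases.)
  3. The exception information printed to the console is often neither detailed nor precise. Besides reading the console output, check the run logs for the full error report.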
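
For the third point, on a YARN cluster the application ID printed on the console is the handle for the detailed logs. The sketch below pulls the ID out of the console output and builds a yarn logs command. It assumes log aggregation is enabled on the cluster, and the helper names are made up for illustration.

```python
import re
from typing import List, Optional

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def find_application_id(console_output: str) -> Optional[str]:
    """Return the first YARN application ID found in the console output."""
    match = APP_ID_PATTERN.search(console_output)
    return match.group(0) if match else None


def yarn_logs_command(app_id: str) -> List[str]:
    """Build the argument list for fetching the aggregated logs.

    Assumption: log aggregation is enabled; otherwise the logs stay in
    the NodeManager's local log directories.
    """
    return ["yarn", "logs", "-applicationId", app_id]
```

The resulting list can be passed to subprocess.run, or joined with spaces and typed into a shell.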