Command executed:
hadoop jar $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-*streaming*.jar \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py -reducer ./reducer.py \
-input /data/poem/data_test \
-output /data/poem/result
Error output:
packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob4957099323859594325.jar tmpDir=null
17/04/13 15:10:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:56 INFO mapred.FileInputFormat: Total input paths to process : 2
17/04/13 15:10:56 INFO mapreduce.JobSubmitter: number of splits:2
17/04/13 15:10:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492067422224_0001
17/04/13 15:10:57 INFO impl.YarnClientImpl: Submitted application application_1492067422224_0001
17/04/13 15:10:57 INFO mapreduce.Job: The url to track the job: http://chinahaoop0:8088/proxy/application_1492067422224_0001/
17/04/13 15:10:57 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
17/04/13 15:10:57 INFO streaming.StreamJob: Running job: job_1492067422224_0001
17/04/13 15:10:57 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:10:59 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:11:56 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:11:57 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:11:58 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:12:27 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:12:31 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:13:08 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:13:09 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:13:30 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:13:32 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:13:33 INFO streaming.StreamJob: map 100% reduce 100%
17/04/13 15:13:36 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:13:36 ERROR streaming.StreamJob: Job not Successful!
17/04/13 15:13:36 INFO streaming.StreamJob: killJob...
17/04/13 15:13:36 INFO impl.YarnClientImpl: Killed application application_1492067422224_0001
Streaming Job Failed!
I tracked down the log file and found the detailed error:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1937)
at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1125)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
... 9 more
The key part of the error is:
java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
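When a Java stack trace nests several exceptions, the last "Caused by:" line is normally the one worth reading first. Below is a minimal Python sketch of that idea. The function name root_cause and the shortened sample trace are my own illustration and are not part of any Hadoop tooling.

```python
def root_cause(stack_trace: str) -> str:
    """Return the innermost exception message of a Java stack trace.

    Assumption: nested causes are introduced by lines starting with
    "Caused by: ", as in the trace shown above.
    """
    prefix = "Caused by: "
    causes = [
        line.strip()[len(prefix):]
        for line in stack_trace.splitlines()
        if line.strip().startswith(prefix)
    ]
    if causes:
        # The last "Caused by" is the deepest (original) exception.
        return causes[-1]
    # No cause chain: fall back to the first non-empty line.
    for line in stack_trace.splitlines():
        if line.strip():
            return line.strip()
    return ""


# Shortened copy of the trace above, used only as sample input.
sample = """Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1937)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
"""

print(root_cause(sample))
```

For the sample input this prints the ClassNotFoundException line, which is the same line singled out above.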
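Tracking down the error
1. Bug in the MR scripts:
The scripts ran fine when tested locally, so this was ruled out.
2. Misconfigured environment:
A test run with Hadoop's example jar worked fine, so this was ruled out.
3. Problem with the jar:
Since the exception is a ClassNotFoundException, the jar should have been the first suspect. The jar may not match the Hadoop version.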
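One quick way to test the "jar does not match the Hadoop version" suspicion is to compare the version in the jar's file name with the cluster version. The sketch below assumes the jar follows the usual hadoop-streaming-<version>.jar naming. The helper names and the 1.2.1 example path are hypothetical, since I did not record which version my downloaded jar actually was.

```python
import os
import re
from typing import Optional


def jar_version(jar_path: str) -> Optional[str]:
    """Extract the version from a name like hadoop-streaming-2.5.2.jar.

    Assumption: the jar keeps the conventional
    hadoop-streaming-<version>.jar file name; returns None otherwise.
    """
    name = os.path.basename(jar_path)
    match = re.fullmatch(r"hadoop-streaming-(\d+(?:\.\d+)*)\.jar", name)
    return match.group(1) if match else None


def matches_cluster(jar_path: str, hadoop_version: str) -> bool:
    """True only if the version in the jar name equals the cluster version."""
    return jar_version(jar_path) == hadoop_version


# Hypothetical old jar downloaded separately vs. a Hadoop 2.5.2 cluster.
print(matches_cluster("/tmp/hadoop-streaming-1.2.1.jar", "2.5.2"))  # False
# The jar that ships inside the 2.5.2 installation.
print(matches_cluster("share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar", "2.5.2"))  # True
```

The cluster version can be read from the output of `hadoop version`. A matching name is only a hint, not proof that the jar is the right one.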
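Final fix:
I had downloaded my jar separately from the internet, because most online tutorials give the path $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-streaming.jar
At first I could not find that path and assumed I had to download the jar myself.
It turned out that in Hadoop 2.5.2 the jar lives in:
$HADOOP_INSTALL_HOME/share/hadoop/tools/lib
That is buried rather deep (′д` )…彡…彡! It took me ages to find!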
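Rather than guessing the directory, the installation can be searched for the jar. Here is a small Python sketch of that search, assuming the HADOOP_INSTALL_HOME environment variable points at the Hadoop installation, as in the commands in this post. The function name find_streaming_jars is my own.

```python
import glob
import os
from typing import List


def find_streaming_jars(install_home: str) -> List[str]:
    """Recursively list jars matching hadoop-*streaming*.jar.

    The "**" component matches any depth of subdirectories, so the
    jar is found whether it sits in contrib/streaming or in
    share/hadoop/tools/lib.
    """
    pattern = os.path.join(install_home, "**", "hadoop-*streaming*.jar")
    return sorted(glob.glob(pattern, recursive=True))


# Only search when the variable is set (assumed to be the install root).
home = os.environ.get("HADOOP_INSTALL_HOME")
if home:
    for jar in find_streaming_jars(home):
        print(jar)
```

If the search returns several jars (for example a sources jar next to the main one), pick the one whose version matches the cluster.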
The corrected command:
hadoop jar $HADOOP_INSTALL_HOME/share/hadoop/tools/lib/hadoop-*streaming*.jar \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py -reducer ./reducer.py \
-input /data/poem/data_test -output /data/poem/result
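Lessons learned:
- A ClassNotFound exception may well mean that the jar does not match the Hadoop environment. My jar was too old. Official base jars such as hadoop-streaming*.jar usually ship with the installation.
- Paths often change between versions of the software, so be ready to adapt. (Even CentOS 7 changed many commands compared with earlier releases.)
- The exception printed on screen is often neither detailed nor precise. Beyond the on-screen message, check the run logs for the full error report.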