Hadoop Pseudo-Distributed Deployment: YARN and MapReduce
MapReduce is Hadoop's distributed computing framework, and it relies on Hadoop's distributed file system, HDFS. As a computation engine, MapReduce also depends on Hadoop's distributed resource-management framework, YARN. This post walks through a pseudo-distributed deployment of YARN and MapReduce in the following environment:
Operating system: CentOS 6.4
Java version: Oracle JDK 1.7
Hadoop version: Hadoop 2.5.0
Hostname: hadoop01.datacenter.com
Hadoop directory: /opt/modules/hadoop-2.5.0
1. Java environment setup
Configure the Java environment for YARN:
[hadoop@hadoop01 ~]$ cd /opt/modules/hadoop-2.5.0/
[hadoop@hadoop01 hadoop-2.5.0]$ vim etc/hadoop/yarn-env.sh
...
# some Java parameters
export JAVA_HOME=/opt/modules/jdk1.7.0_67
...
[hadoop@hadoop01 hadoop-2.5.0]$
Configure the Java environment for MapReduce:
[hadoop@hadoop01 hadoop-2.5.0]$ vim etc/hadoop/mapred-env.sh
...
export JAVA_HOME=/opt/modules/jdk1.7.0_67
...
[hadoop@hadoop01 hadoop-2.5.0]$
2. YARN and MapReduce configuration
Configure the ResourceManager hostname and the NodeManager auxiliary service:
[hadoop@hadoop01 hadoop-2.5.0]$ vim etc/hadoop/yarn-site.xml
...
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01.datacenter.com</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
...
[hadoop@hadoop01 hadoop-2.5.0]$
Set YARN as the MapReduce execution framework:
[hadoop@hadoop01 hadoop-2.5.0]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[hadoop@hadoop01 hadoop-2.5.0]$ vim etc/hadoop/mapred-site.xml
...
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
...
3. Starting the HDFS and YARN services
Start the HDFS and YARN daemons:
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-hadoop-namenode-hadoop01.datacenter.com.out
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-hadoop-datanode-hadoop01.datacenter.com.out
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/modules/hadoop-2.5.0/logs/yarn-hadoop-resourcemanager-hadoop01.datacenter.com.out
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/modules/hadoop-2.5.0/logs/yarn-hadoop-nodemanager-hadoop01.datacenter.com.out
[hadoop@hadoop01 hadoop-2.5.0]$
Use the jps command to check whether the daemons started successfully:
[hadoop@hadoop01 hadoop-2.5.0]$ jps
3557 NodeManager
3133 NameNode
3192 DataNode
3313 ResourceManager
3670 Jps
[hadoop@hadoop01 hadoop-2.5.0]$
The YARN web UI is served on port 8088; on this machine the address is http://hadoop01.datacenter.com:8088.
4. WordCount test
Next, we test the environment with the official wordcount example.
First, create a local text file wordcount.txt and enter some words:
[hadoop@hadoop01 hadoop-2.5.0]$ vim /opt/datas/wordcount.txt
hadoop hdfs yarn mapreduce hbase spark spark hello
[hadoop@hadoop01 hadoop-2.5.0]$
Upload wordcount.txt to the HDFS relative path mapreduce/wordcount/input (HDFS resolves relative paths against the user's home directory, /user/hadoop here):
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p mapreduce/wordcount/input
18/04/07 22:10:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -put /opt/datas/wordcount.txt mapreduce/wordcount/input/
18/04/07 22:10:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.5.0]$
Run the bundled wordcount example with bin/yarn, reading from the HDFS directory mapreduce/wordcount/input and writing to mapreduce/wordcount/output:
[hadoop@hadoop01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount mapreduce/wordcount/input mapreduce/wordcount/output
...
[hadoop@hadoop01 hadoop-2.5.0]$
Once the job finishes, inspect the contents of the HDFS output directory:
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -ls mapreduce/wordcount/output
18/04/07 22:19:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-04-07 22:17 mapreduce/wordcount/output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         59 2018-04-07 22:17 mapreduce/wordcount/output/part-r-00000
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -cat mapreduce/wordcount/output/part*
18/04/07 22:20:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop	1
hbase	1
hdfs	1
hello	1
mapreduce	1
spark	2
yarn	1
[hadoop@hadoop01 hadoop-2.5.0]$
As the output shows, the job counted each word in the input file, and the results come back sorted alphabetically by word, because MapReduce sorts keys during the shuffle before they reach the reducer.
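The map-shuffle-reduce pipeline behind this result can be roughly emulated with standard Unix tools. This is only an analogy, not how Hadoop actually executes the job, and the /tmp path is chosen purely for illustration:

```shell
# Recreate the sample input locally (same words as wordcount.txt above)
printf 'hadoop hdfs yarn mapreduce hbase spark spark hello\n' > /tmp/wordcount.txt

# map:     split the line into one word per line
# shuffle: sort groups identical words together; this is also why the final
#          output comes back in alphabetical key order
# reduce:  count each group of identical words
tr -s ' ' '\n' < /tmp/wordcount.txt | sort | uniq -c
```

Note that uniq -c prints the count before the word, while the Hadoop output prints the word first, but the grouping and the sorted key order are the same.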
5. Stopping the HDFS and YARN services
Finally, stop the HDFS and YARN daemons:
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh stop namenode
stopping namenode
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh stop datanode
stopping datanode
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
[hadoop@hadoop01 hadoop-2.5.0]$
Use jps to confirm the daemons have stopped:
[hadoop@hadoop01 hadoop-2.5.0]$ jps
4738 Jps
[hadoop@hadoop01 hadoop-2.5.0]$
Summary
1. MapReduce runs on top of YARN, so the two are deployed together.
2. First, configure the Java environment for MapReduce and YARN.
3. Next, configure the remaining YARN parameters: here, the ResourceManager hostname and the NodeManager auxiliary service.
4. Finally, configure MapReduce to run on YARN.
Note: the MapReduce output directory must not exist before the job starts, or the job fails immediately with an output-directory-already-exists error; delete a stale directory first, e.g. with bin/hdfs dfs -rm -r mapreduce/wordcount/output.