
Getting Started with MapReduce

程序员文章站 2024-03-08 09:10:10

MapReduce architecture diagram

  • [figure: MapReduce architecture diagram]

Example: word count, step by step

  • [figure: word count example, illustrated]

Word count IDEA project (a Maven project)

  • pom.xml

     

    <repositories>
    
        <repository>
            <id>repo</id>
            <url>https://repo1.maven.org/maven2/</url>
        </repository>
    
        <!-- Repository needed to download hadoop-client 2.6.0-cdh5.7.0. Remember to let the cloudera repo bypass your mirror in settings.xml via <mirrorOf>*,!cloudera</mirrorOf> -->
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/content/repositories/releases/</url>
        </repository>
    
    </repositories>
    
    
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.0-cdh5.7.0</version>
        </dependency>
    
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
            <scope>test</scope>
        </dependency>
    
        <!-- jetty-util 6.1.26.cloudera.d cannot be found in the cloudera repository; manually exclude jetty-util from hadoop-client and pull this version instead (this note is for anyone who hits the same problem) -->
        <dependency>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>jetty</artifactId>
            <version>6.1.26</version>
        </dependency>
    
    </dependencies>
    
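    The settings.xml tweak mentioned in the comment above could look like the following sketch; the mirror `id`, `name`, and URL are placeholders you would replace with your own:

    ```xml
    <!-- settings.xml: route everything through your mirror EXCEPT the cloudera repo -->
    <mirrors>
        <mirror>
            <id>my-mirror</id>                       <!-- placeholder id -->
            <name>internal mirror</name>             <!-- placeholder name -->
            <url>https://your.mirror/maven2/</url>   <!-- placeholder URL -->
            <mirrorOf>*,!cloudera</mirrorOf>         <!-- "!cloudera" lets that repo bypass the mirror -->
        </mirror>
    </mirrors>
    ```

    Without the `!cloudera` exclusion, a catch-all mirror would intercept requests for the CDH artifacts and the build would fail to resolve them.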
  • The WordCountTest class

     

    package com.peng.mapreducetest;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    import java.io.IOException;
    
    public class WordCountTest {
        // main entry point
        public static void main(String[] args) throws Exception {
            // create the configuration
            Configuration configuration = new Configuration();
            // create the job object
            Job job = Job.getInstance(configuration, "wordcount");
            // set the class that carries the job
            job.setJarByClass(WordCountTest.class);
            // set the job's input path
            FileInputFormat.setInputPaths(job, new Path(args[0]));
    
            // map-side parameters
            job.setMapperClass(MyMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
    
            // reduce-side parameters
            job.setReducerClass(MyReduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
    
            // set the job's output path
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
            // exit with the job's status code
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    
        /**
         * Reads the input file line by line.
         */
        public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                // the current line of input
                String line = value.toString();
                // split the line on the separator (a single space)
                String[] words = line.split(" ");
                // each occurrence of a word counts as one
                LongWritable one = new LongWritable(1);
                // loop over the words and emit each one
                for (String word : words) {
                    // write the map result through the context
                    context.write(new Text(word), one);
                }
            }
        }
    
        /**
         * Merges the counts for each word.
         */
        public static class MyReduce extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
                // total number of occurrences of this word
                long sum = 0;
                for (LongWritable value : values) {
                    // add up the occurrences of the key
                    sum += value.get();
                }
                // emit the final count
                context.write(key, new LongWritable(sum));
            }
        }
    }
    
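    What the mapper and reducer above compute can be sketched without a cluster: split each line on spaces (the map step), then sum the 1s per word (the reduce step). The class and method names below are made up for this demo, not part of the Hadoop API:

    ```java
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // A cluster-free sketch of the word count logic above.
    class LocalWordCount {

        static Map<String, Long> countWords(List<String> lines) {
            Map<String, Long> counts = new HashMap<>();
            for (String line : lines) {                     // one map() call per line
                for (String word : line.split(" ")) {       // the map step: split on spaces
                    counts.merge(word, 1L, Long::sum);      // the reduce step: sum per key
                }
            }
            return counts;
        }

        public static void main(String[] args) {
            // prints the per-word counts for two sample lines
            System.out.println(countWords(List.of("hello world", "hello hadoop")));
        }
    }
    ```

    The framework adds what this sketch omits: splitting the input across many mappers, shuffling map output by key, and running reducers in parallel.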
  • Package the project into a jar
    • [figure: building the jar in IDEA]
  • Copy the jar onto the virtual machine running the Hadoop service (any location works)
    • [figure: the jar on the virtual machine]
  • Create a words.txt file and upload it to / on HDFS (words in the file are separated by single spaces)
    • [figure: words.txt uploaded to HDFS]
  • Run the word count program
    • hadoop jar hdfstest-1.0-SNAPSHOT.jar com.peng.mapreducetest.WordCountTest hdfs://hadoop01:8020/words.txt hdfs://hadoop01/result
  • Execution results
    • Check with hdfs commands that the result files were generated
      • [figure: hdfs listing of the result files]
    • View them in the browser
      • [figure: the result files in the HDFS web UI]
  • View the result file (the word counts)
    • [figure: contents of the result file]
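The steps above can be recapped as one short shell session. This is a sketch, not a transcript: it assumes a live cluster with the NameNode reachable at hadoop01:8020 and the jar name used above, so it cannot run outside that environment:

```shell
# create the input file; words are separated by single spaces
printf 'hello world\nhello hadoop\n' > words.txt

# upload it to the root of HDFS
hdfs dfs -put words.txt /

# run the job: main class, then input path (args[0]) and output path (args[1])
hadoop jar hdfstest-1.0-SNAPSHOT.jar com.peng.mapreducetest.WordCountTest \
    hdfs://hadoop01:8020/words.txt hdfs://hadoop01/result

# inspect the output; part-r-00000 holds the reducer's results
hdfs dfs -ls /result
hdfs dfs -cat /result/part-r-00000
```

Note that the output directory must not already exist, or the job will fail at startup; delete a stale one with `hdfs dfs -rm -r /result` before rerunning.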