A First Look at Hadoop's Old and New APIs, plus Chained MapReduce
2013-11-16 17:04:34 | Category: Big Data
1. Background
While working on my graduation project I hit a problem that called for a chained MapReduce job, so I looked into how ChainMapper and ChainReducer are used, and took the opportunity to sort out the differences between the old and new APIs.
First, a point worth stating up front: starting with release 0.20.0, Hadoop's API changed, but the old API is still retained. Its package is:
org.apache.hadoop.mapred
The new API's package is:
org.apache.hadoop.mapreduce
Detailed documentation for every released Hadoop version can be found here.
2. Differences Between the Old and New APIs
Taking the map/reduce definitions as an example.
2.1 map
Class declaration
// New API
public static class MyMapper extends Mapper<Object, Text, Text, Text>
// Old API
public static class MyMapper extends MapReduceBase implements Mapper<Object, Text, Text, Text>
Imports involved
// New API
import org.apache.hadoop.mapreduce.Mapper;
// Old API
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
Method signature
// New API
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {}
// Old API
public void map(Object key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {}
Imports involved
// New API
import java.io.IOException;
// Old API
import java.io.IOException;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
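Putting these pieces together, here is a minimal sketch of a complete new-API mapper. The Object/Text/Text/Text types match the fragments above, but the pass-through body and the placeholder key are purely illustrative, not from the original post:

// Minimal new-API mapper sketch; the emitted key is an illustrative placeholder.
public static class MyMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // In the new API, output goes through the Context object, which
        // replaces both OutputCollector and Reporter from the old API.
        context.write(new Text("line"), value);
    }
}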
2.2 reduce
Class declaration
// New API
public static class MyReducer extends Reducer<Text, Text, Text, Text>
// Old API
public static class MyReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text>
Imports involved
// New API
import org.apache.hadoop.mapreduce.Reducer;
// Old API
import org.apache.hadoop.mapred.Reducer;
Method signature
// New API
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {}
// Old API
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {}
Imports involved
// New API
Same as for map.
// Old API
One import more than for map:
import java.util.Iterator;
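Likewise, a minimal sketch of a complete new-API reducer; the value-concatenation body is illustrative only, not from the original post:

public static class MyReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // New API: values arrive as an Iterable (the old API handed you an
        // Iterator), so the enhanced for-loop works directly.
        StringBuilder joined = new StringBuilder();
        for (Text val : values) {
            joined.append(val.toString()).append(' ');
        }
        context.write(key, new Text(joined.toString().trim()));
    }
}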
2.3 The main method
New API
Configuration conf = new Configuration();
Job job = new Job(conf, "mapreduce");
job.setJarByClass(MyMapReduce.class);
job.setMapperClass(MyMapper.class);
job.setCombinerClass(MyReducer.class);
job.setReducerClass(MyReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(path1));
FileOutputFormat.setOutputPath(job, new Path(path2));
job.waitForCompletion(true);
Old API
JobConf conf = new JobConf(MyMapReduce.class);
conf.setJobName("mapreduce");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(MyMapper.class);
conf.setCombinerClass(MyReducer.class);
conf.setReducerClass(MyReducer.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(path1));
FileOutputFormat.setOutputPath(conf, new Path(path2));
JobClient.runJob(conf);
In other words, the new API replaces the old JobConf with Job, and a number of methods were redefined along the way; the package comparison below shows the specifics.
Imports involved
New API
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
Old API
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
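One caveat on the new-API driver above: on Hadoop 2.x and later, the Job(Configuration, String) constructor is deprecated in favor of a static factory method, so a version-safe driver would start like the short sketch below (reusing the class names from the example above):

Configuration conf = new Configuration();
// Preferred over "new Job(conf, ...)" on Hadoop 2.x and later.
Job job = Job.getInstance(conf, "mapreduce");
job.setJarByClass(MyMapReduce.class);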
3. Chained MapReduce
ChainMapper and ChainReducer live in the old-API package. A chained MapReduce job allows only one reducer, but it may have multiple mappers, both before and after the reduce. The chain I implemented looks like this:
map -> reduce -> map
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class GraphPartition {

    // NOTE: the original post's formatting stripped the generic type
    // arguments, loop bounds, and the spaces inside string literals from
    // this listing; they are reconstructed below (space-delimited fields
    // are assumed) so that the code compiles and stays internally consistent.

    // First mapper of the chain: parses each numeric input line into a
    // head ("vertex label") and a tail list, and emits <tail, src_label>
    // pairs plus the vertex's own <vertex, label> pair.
    public static class LabelCompareMapper
            extends MapReduceBase implements Mapper<Object, Text, Text, Text> {

        public void map(Object key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            if (line.substring(0, 1).matches("[0-9]{1}")) {
                String[] values = line.split("\t");
                String[] heads = values[0].split(" ");
                if (values[1].contains("_")) {
                    values[1] = values[1].replace("_", " ");
                    String[] tails = values[1].split(" ");
                    String src_ver = heads[0];
                    String label = heads[1];
                    for (int i = 0; i < tails.length; i++) {
                        output.collect(new Text(tails[i]),
                                new Text(src_ver + "_" + label)); // <2 1_1>
                    }
                }
                output.collect(new Text(heads[0]), new Text(heads[1])); // <2 2>
            }
        }
    }

    // The single reducer of the chain: for each vertex, separates its own
    // label (no "_") from the incoming "source_label" tokens and joins them.
    public static class LabelCompareReducer
            extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

        private Text result = new Text();

        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String head = "";
            String tail = "";
            while (values.hasNext()) {
                String val = values.next().toString();
                if (val.contains("_")) {
                    tail = tail + val + " ";
                } else {
                    head = val;
                }
            }
            if (tail.contains("_")) {
                result.set(head + " " + tail);
                output.collect(key, result); // <2 2 1_1>
            }
        }
    }

    // Second mapper of the chain, run after the reducer: compares each
    // neighbour's label with the vertex's own label and emits an edge
    // record keyed by "neighbour vertex label".
    public static class GraphPartitionMapper
            extends MapReduceBase implements Mapper<Text, Text, Text, Text> {

        public void map(Text key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String ver = key.toString();
            String tail = value.toString();
            String[] sps = tail.split(" ");
            String ver_lab = sps[0];
            for (int i = 0; i < sps.length; i++) {
                if (sps[i].length() >= 1) {
                    if (sps[i].contains("_")) {
                        String[] blocks = sps[i].split("_");
                        // The original compared strings with "=="; equals() is the fix.
                        if (blocks[1].equals(ver_lab)) {
                            output.collect(new Text(blocks[0] + " " + ver + " " + blocks[1]),
                                    new Text());
                        } else {
                            output.collect(new Text(blocks[0] + " " + ver + " " + ver_lab),
                                    new Text());
                        }
                    } else {
                        ver_lab = sps[i];
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String path1 = "lbp/input";
        String path2 = "lbp/out1";

        JobConf job = new JobConf(GraphPartition.class);
        job.setJobName("ChainJob");
        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);

        // Mapper 1 of the chain.
        JobConf labelCompareMapperConf = new JobConf(false);
        ChainMapper.addMapper(job, LabelCompareMapper.class,
                Object.class, Text.class, Text.class, Text.class,
                true, labelCompareMapperConf);

        // The single reducer allowed in a chain.
        JobConf labelCompareReducerConf = new JobConf(false);
        ChainReducer.setReducer(job, LabelCompareReducer.class,
                Text.class, Text.class, Text.class, Text.class,
                true, labelCompareReducerConf);

        // Mapper 2, appended after the reducer via ChainReducer.addMapper.
        JobConf graphPartitionMapperConf = new JobConf(false);
        ChainReducer.addMapper(job, GraphPartitionMapper.class,
                Text.class, Text.class, Text.class, Text.class,
                true, graphPartitionMapperConf);

        job.setJarByClass(GraphPartition.class);
        job.setNumReduceTasks(1);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path(path1));
        FileOutputFormat.setOutputPath(job, new Path(path2));
        JobClient.runJob(job);
    }
}

The code above has been tested; I have always believed that practice is the sole criterion for testing truth. I say this because several blog posts about chained MapReduce that I found online never state which Hadoop version they target, and their code is full of holes, so I felt it was worth writing this post in the hope that it gives later readers something to build on.
One small gripe: NetEase Blog's code editor really is terrible... I hope it doesn't hurt readability.
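A postscript for readers on Hadoop 2.x or later: new-API versions of ChainMapper and ChainReducer exist under org.apache.hadoop.mapreduce.lib.chain. The driver wiring there looks roughly like the sketch below; it is an assumption-laden sketch that reuses the class names above and presumes they have been ported to the new-API Mapper/Reducer base classes. Note that the byValue flag is gone and plain Configuration objects replace the per-stage JobConf:

// New-API chain wiring sketch (Hadoop 2.x+); FileInputFormat and
// FileOutputFormat here are the new-API ones (mapreduce.lib.*).
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "ChainJob");
job.setJarByClass(GraphPartition.class);
ChainMapper.addMapper(job, LabelCompareMapper.class,
        Object.class, Text.class, Text.class, Text.class, new Configuration(false));
ChainReducer.setReducer(job, LabelCompareReducer.class,
        Text.class, Text.class, Text.class, Text.class, new Configuration(false));
ChainReducer.addMapper(job, GraphPartitionMapper.class,
        Text.class, Text.class, Text.class, Text.class, new Configuration(false));
FileInputFormat.addInputPath(job, new Path("lbp/input"));
FileOutputFormat.setOutputPath(job, new Path("lbp/out1"));
job.waitForCompletion(true);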