
Foshan University (佛山科学技术学院) Cloud Computing Lab Report: Understanding MapReduce Programming

Document ID: 22815811
Uploaded: 2023-02-05
Format: DOCX
Pages: 18
Size: 160.19 KB


II. Experiment Content and Completion Status

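Learn the MapReduce programming model, understand the ideas behind MapReduce programming, and be able to write simple parallel programs with the MapReduce framework;

Become proficient at writing, debugging, and running MapReduce parallel programs in Eclipse.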

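1) Implement merging and de-duplication of files

Given two input files, A and B, write a MapReduce program that merges the two files and removes the duplicate lines, producing a new output file C.

Sample input and output files are given below for reference.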

Sample of input file A:

20150101 x
20150102 y
20150103 x
20150104 y
20150105 z
20150106 x

Sample of input file B:

20150101 y
20150102 y
20150103 x
20150104 z
20150105 y

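Sample of output file C, obtained by merging input files A and B:

20150101 x
20150101 y
20150102 y
20150103 x
20150104 y
20150104 z
20150105 y
20150105 z
20150106 x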

Answer:

The code is as follows:

package com.Merge;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Merge {

    /**
     * Merges files A and B, removes duplicate lines, and writes the result to a new output file C.
     */

    // Override map(): copy each input value (one line of text) directly to the output key.
    // Identical lines become identical keys, so the shuffle groups them together.
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static final Text EMPTY = new Text("");

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, EMPTY);
        }
    }

    // Override reduce(): copy the input key directly to the output key, once per distinct key.
    // Every duplicate line arrives under the same key, so writing the key once removes duplicates.
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        private static final Text EMPTY = new Text("");

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, EMPTY);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "fs.defaultFS" is the current name of the deprecated "fs.default.name" key
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[]{"input", "output"}; /* set the input/output paths directly */
        if (otherArgs.length != 2) {
            System.err.println("Usage: Merge <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Merge and duplicate removal");
        job.setJarByClass(Merge.class);
        job.setMapperClass(Map.class);
        // Reduce can double as the combiner: its input and output types match,
        // and emitting each key once gives the same result if it runs again later
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
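To see why writing each line as a key is enough to remove duplicates, the job's data flow can be simulated locally without Hadoop. The sketch below is a hypothetical model, not part of the report: the class MergeDedupSketch and its methods are illustrative names, each input file is assumed to be one input split whose map output passes through the combiner, and a TreeMap stands in for the shuffle's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical local model of the Merge job: map -> combine -> shuffle -> reduce.
public class MergeDedupSketch {

    // Map + combine for one input split. Map emits (line, "") as Merge.Map does;
    // the combiner (Merge.Reduce) then keeps one record per distinct line in the split.
    static List<String> mapAndCombine(List<String> split) {
        return new ArrayList<>(new TreeSet<>(split));
    }

    // Shuffle: group equal keys from all splits in sorted order (String order matches
    // Hadoop's byte order for Text here because the sample lines are ASCII).
    // Reduce: write each key once, however many values it collected.
    static List<String> run(List<List<String>> splits) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (List<String> split : splits) {
            for (String key : mapAndCombine(split)) {
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add("");
            }
        }
        return new ArrayList<>(grouped.keySet());
    }

    public static void main(String[] args) {
        List<String> fileA = List.of("20150101 x", "20150102 y", "20150103 x",
                "20150104 y", "20150105 z", "20150106 x");
        List<String> fileB = List.of("20150101 y", "20150102 y", "20150103 x",
                "20150104 z", "20150105 y");
        // each file is treated as one input split (an assumption of this sketch)
        for (String line : run(List.of(fileA, fileB))) {
            System.out.println(line);
        }
    }
}
```

Running main() prints nine distinct lines in sorted order, from 20150101 x to 20150106 x. The combiner only shrinks the data sent over the network; the final key set is the same with or without it.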

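2) Write a program that sorts the input files

There are several input files, and each line of every file contains one integer.

Read the integers from all the files, sort them in ascending order, and write them to a new file. Each output line holds two integers: the first is the rank of the second, and the second is the original integer being sorted.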

Sample of input file 1:

33
37
12
40

Sample of input file 2:

4
16
39
5

Sample of input file 3:

1
45
25

Output file produced from input files 1, 2 and 3:

1 1
2 4
3 5
4 12
5 16
6 25
7 33
8 37
9 39
10 40
11 45
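The ranking rule can be checked without a cluster. The following sketch is a hypothetical local model (RankSketch and rank() are illustrative names, not from the report): a TreeMap plays the role of the shuffle by sorting the integer keys and counting how many times each one occurs, and the inner loop mirrors the reducer, which writes a key once per occurrence while a single counter supplies the rank. A repeated integer therefore receives consecutive ranks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local model of the sort job's shuffle + reduce.
public class RankSketch {

    // Returns "rank value" lines for all integers across all input files.
    static List<String> rank(List<List<Integer>> files) {
        // shuffle: keys sorted ascending, each with its number of occurrences
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> file : files) {
            for (int n : file) {
                counts.merge(n, 1, Integer::sum);
            }
        }
        // reduce: emit the key once per occurrence, with one counter shared by all keys
        List<String> out = new ArrayList<>();
        int lineNum = 1;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(lineNum + " " + e.getKey());
                lineNum++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // the sample integers; 1 is included because the sample output ranks it first
        List<List<Integer>> files = List.of(
                List.of(33, 37, 12, 40),
                List.of(4, 16, 39, 5),
                List.of(1, 45, 25));
        rank(files).forEach(System.out::println);
    }
}
```

With these inputs, main() prints the same eleven lines as the sample output, from "1 1" to "11 45".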

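package com.MergeSort;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergeSort {

    /**
     * Input: several files, each line of which holds one integer.
     * Output: a new file with two integers per line: the rank of the second integer
     * in ascending order, followed by the original integer.
     */

    // map(): read the input value, convert it to an IntWritable, and use it as the output key.
    // The shuffle then sorts the keys in ascending numeric order.
    public static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final IntWritable data = new IntWritable();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String text = value.toString().trim();
            if (text.isEmpty()) {
                return; // skip blank lines instead of failing in parseInt
            }
            data.set(Integer.parseInt(text));
            context.write(data, ONE);
        }
    }

    // reduce(): copy the input key to the output value and write it once per element of the
    // value list, so repeated integers keep every occurrence. lineNum holds the rank of the next
    // output line; it lives in the reduce task, so ranks are global only with a single reducer.
    public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        private final IntWritable rank = new IntWritable();
        private int lineNum = 1;

        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable val : values) {
                rank.set(lineNum++);
                context.write(rank, key);
            }
        }
    }

    // Custom partitioner: split the assumed value range [0, maxNumber] into numPartitions
    // equal-width blocks and return the ID of the block containing the key, so every key in
    // partition i is smaller than every key in partition i + 1.
    public static class Partition extends Partitioner<IntWritable, IntWritable> {
        @Override
        public int getPartition(IntWritable key, IntWritable value, int numPartitions) {
            int maxNumber = 65223; // assumed upper bound of the input values (not the maximum int)
            int bound = maxNumber / numPartitions + 1;
            int keyNumber = key.get();
            for (int i = 0; i < numPartitions; i++) {
                if (keyNumber >= bound * i && keyNumber < bound * (i + 1)) {
                    return i;
                }
            }
            // returning -1 would make the map task fail, so clamp out-of-range keys instead
            return keyNumber < 0 ? 0 : numPartitions - 1;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[]{"input", "output"}; /* set the input/output paths directly */
        if (otherArgs.length != 2) {
            System.err.println("Usage: MergeSort <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Merge and sort");
        job.setJarByClass(MergeSort.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setPartitionerClass(Partition.class);
        job.setNumReduceTasks(1); // the default, stated explicitly: one reducer keeps the ranks global
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}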
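The range partitioner matters once more than one reducer runs: because every key in partition i is smaller than every key in partition i + 1, concatenating the reducer outputs in partition order gives a sorted list. The sketch below is a hypothetical local check of that property (RangePartitionSketch and its methods are illustrative names). It uses the same block arithmetic as the partitioner, with the assumed bound 65223 and out-of-range keys clamped to the first or last partition rather than returning -1. It also exposes a limitation of a rank counter kept inside each reducer: with several reducers, each one starts counting at 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local model of range partitioning followed by per-reducer ranking.
public class RangePartitionSketch {

    static final int MAX_NUMBER = 65223; // assumed upper bound of the input values

    // Equal-width blocks over [0, MAX_NUMBER]; out-of-range keys are clamped.
    static int getPartition(int key, int numPartitions) {
        int bound = MAX_NUMBER / numPartitions + 1;
        for (int i = 0; i < numPartitions; i++) {
            if (key >= bound * i && key < bound * (i + 1)) {
                return i;
            }
        }
        return key < 0 ? 0 : numPartitions - 1;
    }

    // Simulates numPartitions reducers, each sorting its own keys with its own rank counter.
    static List<List<String>> run(List<Integer> keys, int numPartitions) {
        List<TreeMap<Integer, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new TreeMap<>());
        }
        for (int k : keys) {
            partitions.get(getPartition(k, numPartitions)).merge(k, 1, Integer::sum);
        }
        List<List<String>> outputs = new ArrayList<>();
        for (TreeMap<Integer, Integer> p : partitions) {
            List<String> out = new ArrayList<>();
            int lineNum = 1; // per-reducer counter, restarts in every partition
            for (Map.Entry<Integer, Integer> e : p.entrySet()) {
                for (int c = 0; c < e.getValue(); c++) {
                    out.add(lineNum + " " + e.getKey());
                    lineNum++;
                }
            }
            outputs.add(out);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(33, 37, 12, 40, 4, 16, 39, 5, 45, 25, 30000, 70000);
        List<List<String>> outputs = run(keys, 2);
        for (int i = 0; i < outputs.size(); i++) {
            System.out.println("reducer " + i + ": " + outputs.get(i));
        }
    }
}
```

In this run, 70000 lies above the assumed bound and is clamped into the last partition, and both reducers number their lines from 1. Producing global ranks with several reducers would need each partition's starting offset, which is why a single reducer is the simpler choice here.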

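3) Mine information from a given table

The child-parent table below records parent-child relationships. Mine it to produce a table of grandchild-grandparent relationships.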

The input file contains:

child parent
Steven Lucy
Steven Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Frank
Jack Alice
Jack Jesse
David Alice
David Jesse
Philip David
Philip Alma
Mark David
Mark Alma

The output file contains:

grandchild grandparent
Mark Jesse
Mark Alice
Philip Jesse
Philip Alice
Jone Jesse
Jone Alice
Steven Jesse
Steven Alice
Steven Frank
Steven Mary
Jone Frank
Jone Mary
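The program below performs a single-table self-join: each child-parent pair reaches the reducers twice, once under the parent's name (the "left table") and once under the child's name (the "right table"), and the reducer for a person pairs that person's children with that person's parents. The following sketch is a hypothetical local model of that join (SelfJoinSketch and join() are illustrative names); two TreeMaps stand in for the tagged records the shuffle groups under each person's name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local model of the child-parent self-join.
public class SelfJoinSketch {

    // Returns "grandchild grandparent" pairs for the given [child, parent] rows.
    static List<String> join(List<String[]> rows) {
        TreeMap<String, List<String>> children = new TreeMap<>(); // left table: key -> its children
        TreeMap<String, List<String>> parents = new TreeMap<>();  // right table: key -> its parents
        for (String[] row : rows) {
            String child = row[0];
            String parent = row[1];
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
            parents.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
        }
        List<String> out = new ArrayList<>();
        // "reduce" per person: every child x every parent is a grandchild-grandparent pair
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            List<String> grandParents = parents.getOrDefault(e.getKey(), List.of());
            for (String gc : e.getValue()) {
                for (String gp : grandParents) {
                    out.add(gc + " " + gp);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String table = "Steven Lucy\nSteven Jack\nJone Lucy\nJone Jack\nLucy Mary\n"
                + "Lucy Frank\nJack Alice\nJack Jesse\nDavid Alice\nDavid Jesse\n"
                + "Philip David\nPhilip Alma\nMark David\nMark Alma";
        List<String[]> rows = new ArrayList<>();
        for (String line : table.split("\n")) {
            rows.add(line.trim().split("\\s+"));
        }
        join(rows).forEach(System.out::println);
    }
}
```

main() prints the same twelve pairs as the sample output, though not necessarily in the same order: Hadoop does not guarantee the order of values within a key, and this sketch groups them differently. A person with children but no recorded parents (for example Alma) contributes no pairs, which matches the empty Cartesian product in the reducer.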

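package com.simple_data_mining;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class simple_data_mining {

    /**
     * Input: a child-parent table.
     * Output: a table of grandchild-grandparent relationships.
     */

    // map(): split each line at whitespace into child and parent, then emit the pair twice:
    // in reversed order (keyed by the parent) as the left table, and in the original order
    // (keyed by the child) as the right table. Each value must carry a tag ("1" or "2")
    // that tells the reducer which table it came from.
    public static class Map extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] values = value.toString().trim().split("\\s+");
            // skip the header line "child parent" and blank or malformed lines
            if (values.length != 2 || values[0].equals("child")) {
                return;
            }
            String childName = values[0];
            String parentName = values[1];
            // left table: key = parent, so childName is a child of the key person
            context.write(new Text(parentName), new Text("1+" + childName + "+" + parentName));
            // right table: key = child, so parentName is a parent of the key person
            context.write(new Text(childName), new Text("2+" + childName + "+" + parentName));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        // write the table header once per reduce task (replaces the static "time" counter)
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            context.write(new Text("grand_child"), new Text("grand_parent"));
        }

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // lists instead of fixed String[10] arrays, which overflow past 10 entries
            List<String> grandChild = new ArrayList<>();
            List<String> grandParent = new ArrayList<>();
            for (Text val : values) {
                String record = val.toString();
                if (record.isEmpty()) {
                    continue;
                }
                // record format: relationType + "+" + childName + "+" + parentName
                String[] parts = record.split("\\+", 3);
                String childName = parts[1];
                String parentName = parts[2];
                if (parts[0].equals("1")) {
                    // left table: take the child and add it to grandChild
                    grandChild.add(childName);
                } else {
                    // right table: take the parent and add it to grandParent
                    grandParent.add(parentName);
                }
            }
            // every grandchild pairs with every grandparent (empty if either list is empty)
            for (String gc : grandChild) {
                for (String gp : grandParent) {
                    context.write(new Text(gc), new Text(gp));
                }
            }
        }
    }

    // NOTE: the rest of this listing is cut off in the source; this driver is an assumption
    // that follows the same pattern as the two programs above.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String[] otherArgs = new String[]{"input", "output"}; /* set the input/output paths directly */
        if (otherArgs.length != 2) {
            System.err.println("Usage: simple_data_mining <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Single table join"); // hypothetical job name
        job.setJarByClass(simple_data_mining.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}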
