Big Data Fundamentals Course Design Report
- Document No.: 17600616
- Uploaded: 2022-12-07
- Format: DOCX
- Pages: 31
- Size: 836.13 KB
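● Proportion of queries entered directly as a URL
● Find the uids that searched for "仙剑奇侠传" (Chinese Paladin) more than 3 times
5. Save the results of each operation in step 4 to HDFS.
6. Import the files produced in step 5 into HBase (a single table) through the Java API.
7. Query the results loaded in step 6 with HBase shell commands.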
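III. Experimental Procedure
1. Load the raw data into HDFS
2. Split the time field of the raw data and append year, month, day and hour fields
(1) Write a script, sogou-log-extend.sh, with the following contents: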
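#!/bin/bash
# Usage: bash sogou-log-extend.sh <infile> <outfile>
#infile=/root/sogou.500w.utf8
infile=$1
#outfile=/root/filesogou.500w.utf8.ext
outfile=$2
# The first field is a yyyyMMddHHmmss timestamp. awk's substr() is 1-based,
# so year = chars 1-4, month = 5-6, day = 7-8, hour = 9-10.
awk -F '\t' '{print $0"\t"substr($1,1,4)"年\t"substr($1,5,2)"月\t"substr($1,7,2)"日\t"substr($1,9,2)"hour"}' "$infile" > "$outfile"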
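The same split can be written in Java to check the character offsets. This is a minimal sketch that assumes the first field is a 14-digit yyyyMMddHHmmss timestamp such as 20111230000005. The class name TimeFields and the method extend are illustrative and do not come from the report. Java's substring is 0-based with an exclusive end index, so awk's substr($1,9,2) (the hour) corresponds to substring(8, 10).

```java
/** Appends year, month, day and hour columns derived from a yyyyMMddHHmmss time field. */
final class TimeFields {
    private TimeFields() {}

    /**
     * Mirrors the awk script: returns the line followed by four tab-separated
     * columns "<year>年", "<month>月", "<day>日" and "<hour>hour".
     * Assumes the first tab-separated field is a 14-digit timestamp.
     */
    static String extend(String line) {
        String time = line.split("\t", 2)[0];
        if (time.length() < 10) {
            throw new IllegalArgumentException("unexpected time field: " + time);
        }
        String year = time.substring(0, 4);   // awk substr($1,1,4)
        String month = time.substring(4, 6);  // awk substr($1,5,2)
        String day = time.substring(6, 8);    // awk substr($1,7,2)
        String hour = time.substring(8, 10);  // awk substr($1,9,2)
        return line + "\t" + year + "年\t" + month + "月\t" + day + "日\t" + hour + "hour";
    }
}
```

For example, TimeFields.extend("20111230000005\tu1\tkw") returns the input line followed by the columns 2011年, 12月, 30日 and 00hour.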
Run the script:
bash sogou-log-extend.sh sogou.500w.utf8 sogou.500w.utf8.ext
The result is:
3. Load the processed data into HDFS
hadoop fs -put sogou.500w.utf8.ext /
4. Implement each of the following operations with both MapReduce (MR) and Hive
Ⅰ. Hive implementation
1. List the databases:
show databases;
2. Create the database:
create database sogou;
3. Switch to the database:
use sogou;
4. List all tables:
show tables;
5. Create the sogou table:
create table sogou(time string, uuid string, name string, num1 int, num2 int, url string) row format delimited fields terminated by '\t';
6. Load the local data into the Hive table:
load data local inpath '/root/sogou.500w.utf8' into table sogou;
7. Show the table schema:
desc sogou;
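The create table statement maps each tab-separated line onto six typed columns. Below is a minimal Java sketch of that mapping. It assumes the same field order (time, uuid, name, num1, num2, url), and SogouRecord and parse are hypothetical names. Columns missing from the end of a line, and integer columns whose text does not parse, come back as null. Hive's default text format reads such values as NULL in the same way.

```java
/** One row of the sogou table, in column order: time, uuid, name, num1, num2, url. */
final class SogouRecord {
    final String time;
    final String uuid;
    final String name;
    final Integer num1;  // null if the field is missing or not an integer
    final Integer num2;
    final String url;

    private SogouRecord(String[] f) {
        time = field(f, 0);
        uuid = field(f, 1);
        name = field(f, 2);
        num1 = toInt(field(f, 3));
        num2 = toInt(field(f, 4));
        url = field(f, 5);
    }

    /** Parses one tab-separated line; columns beyond the end of the line are null. */
    static SogouRecord parse(String line) {
        // limit -1 keeps trailing empty fields, so an empty url stays "" instead of null
        return new SogouRecord(line.split("\t", -1));
    }

    private static String field(String[] f, int i) {
        return i < f.length ? f[i] : null;
    }

    private static Integer toInt(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            return null;  // non-numeric text in an int column
        }
    }
}
```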
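(1) Total number of records
select count(*) from sogou;
(2) Number of records with a non-empty query
select count(*) from sogou where name is not null and name != '';
(3) Total number of records without duplicates
select count(*) from (select * from sogou group by time,num1,num2,uuid,name,url having count(*)=1) a;
(4) Total number of distinct UIDs
select count(distinct uuid) from sogou;
(5) Query frequency ranking (the 50 most frequent words)
select name, count(*) as pd from sogou group by name order by pd desc limit 50;
(6) Number of users with more than 2 queries
select count(a.uuid) from (select uuid, count(*) as t from sogou group by uuid having t > 2) a;
(7) Proportion of users with more than 2 queries
select count(*) from (select uuid, count(*) as t from sogou group by uuid having t >
(8) Proportion of clicks whose rank is within 10
select count(*) from sogou where num1 < 11;
(9) Proportion of queries entered directly as a URL
select count(*) from sogou where url like '%%';
(10) Find the uids that searched for "仙剑奇侠传" (Chinese Paladin) more than 3 times
select uuid, count(*) as uu from sogou where name='仙剑奇侠传' group by uuid having uu > 3;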
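Several of these queries can be reproduced in plain Java. This makes it possible to cross-check results on a small sample before running them on the full log. The sketch below is in-memory and is not the report's method. QueryStats, Row and the method names are hypothetical, and only the columns the queries use are kept. For (7), the proportion is usersWithMoreThan(rows, 2) divided by distinctUids(rows), assuming the distinct-user count is the intended denominator. For (8), the share is taken over all rows.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

/** In-memory counterparts of several Hive queries, for cross-checking on a small sample. */
final class QueryStats {
    /** Only the columns the queries below use. */
    static final class Row {
        final String uuid;
        final String name;
        final int num1;

        Row(String uuid, String name, int num1) {
            this.uuid = uuid;
            this.name = name;
            this.num1 = num1;
        }
    }

    /** (2) name is not null and name != ''. */
    static long nonEmpty(List<Row> rows) {
        return rows.stream().filter(r -> r.name != null && !r.name.isEmpty()).count();
    }

    /** (4) count(distinct uuid). */
    static long distinctUids(List<Row> rows) {
        return rows.stream().map(r -> r.uuid).distinct().count();
    }

    /** (5) group by name order by count desc limit k; ties by name. Null names are skipped here. */
    static List<String> topWords(List<Row> rows, int k) {
        Map<String, Long> freq = rows.stream()
                .filter(r -> r.name != null)
                .collect(Collectors.groupingBy(r -> r.name, Collectors.counting()));
        return freq.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** (6) number of uids with more than n queries. */
    static long usersWithMoreThan(List<Row> rows, long n) {
        return rows.stream()
                .collect(Collectors.groupingBy(r -> r.uuid, Collectors.counting()))
                .values().stream()
                .filter(c -> c > n)
                .count();
    }

    /** (8) share of rows whose rank num1 is at most 10 (num1 < 11). */
    static double rankWithin10Ratio(List<Row> rows) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        long hits = rows.stream().filter(r -> r.num1 < 11).count();
        return (double) hits / rows.size();
    }

    /** (10) uids that searched for the given word more than n times, sorted. */
    static List<String> heavySearchers(List<Row> rows, String word, long n) {
        return rows.stream()
                .filter(r -> Objects.equals(r.name, word))
                .collect(Collectors.groupingBy(r -> r.uuid, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > n)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Query (10) corresponds to heavySearchers(rows, "仙剑奇侠传", 3).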
Ⅱ. MapReduce implementation (import statements are omitted)
public class MRCountAll {
    // Hadoop counter for the number of input records. A counter is used instead of a
    // static field: on a cluster, map tasks run in separate JVMs, so a static variable
    // incremented in the mapper is never seen by the driver.
    public enum Records { TOTAL }

    public static class CountAllMap extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            context.getCounter(Records.TOTAL).increment(1);  // one per input line
        }
    }

    public static long runcount(String inputPath, String outPath) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://10.49.47.20:9000");
        Job job = Job.getInstance(conf, "count");
        job.setJarByClass(MRCountAll.class);
        job.setMapperClass(CountAllMap.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outPath));
        if (!job.waitForCompletion(true)) {
            throw new IllegalStateException("MapReduce job failed");
        }
        return job.getCounters().findCounter(Records.TOTAL).getValue();
    }

    public static void main(String[] args) throws Exception {
        long total = runcount("/sogou/data/sogou.500w.utf8", "/sogou/data/CountAll");
        System.out.println("Total records: " + total);
    }
}
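public class CountNotNull {
    // Counted with a Hadoop counter: a static int updated in map() is not visible to
    // the driver when map tasks run in other JVMs.
    public enum Records { NOT_NULL }

    public static class wyMap extends Mapper<Object, Text, Text, IntWritable> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] values = value.toString().split("\t");
            // values[2] is the query keyword. Compare contents: equals(null) is always false
            // for a non-null string, and != "" compares references, not text.
            if (values.length > 2 && !values[2].isEmpty()) {
                context.write(new Text(values[1]), new IntWritable(1));
                context.getCounter(Records.NOT_NULL).increment(1);
            }
        }
    }

    public static long run(String inputPath, String outputPath) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://10.49.47.20:9000");
        Job job = Job.getInstance(conf, "countnotnull");
        job.setJarByClass(CountNotNull.class);
        job.setMapperClass(wyMap.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        job.waitForCompletion(true);
        return job.getCounters().findCounter(Records.NOT_NULL).getValue();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: HDFS path of the raw Sogou log
        System.out.println("Non-empty records: " + run(args[0], "/sogou/data/CountNotNull"));
    }
}

public class CountNotRepeat {
    public enum Records { DISTINCT }  // read back in the driver

    public static class NotRepeatMap extends Mapper<Object, Text, Text, IntWritable> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] values = value.toString().split("\t");
            String time = values[0];
            String uid = values[1];
            String name = values[2];
            String url = values[5];
            // Tab-separated key, so different field boundaries cannot produce the same key.
            context.write(new Text(time + "\t" + uid + "\t" + name + "\t" + url), new IntWritable(1));
        }
    }

    // The reducer's input value type must match the mapper's output value type (IntWritable).
    public static class NotRepeatReduc extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int n = 0;
            for (IntWritable v : values) n += v.get();
            context.getCounter(Records.DISTINCT).increment(1);  // once per distinct record
            context.write(key, new IntWritable(n));             // record and how often it occurs
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://10.49.47.20:9000");
        Job job = Job.getInstance(conf);
        job.setJarByClass(CountNotRepeat.class);
        job.setMapperClass(NotRepeatMap.class);
        job.setReducerClass(NotRepeatReduc.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));  // raw Sogou log on HDFS
        FileOutputFormat.setOutputPath(job, new Path("/sogou/data/CountNotRepeat"));
        job.waitForCompletion(true);
        System.out.println("Records without duplicates: "
                + job.getCounters().findCounter(Records.DISTINCT).getValue());
    }
}

public class CountNotMoreUid {
    public enum Uids { DISTINCT }

    public static class UidMap extends Mapper<Object, Text, Text, IntWritable> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String uid = value.toString().split("\t")[1];
            context.write(new Text(uid), new IntWritable(1));  // (uid, 1) per record
        }
    }

    public static class UidReduc extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int n = 0;
            for (IntWritable v : values) n += v.get();
            context.getCounter(Uids.DISTINCT).increment(1);  // once per distinct uid
            context.write(key, new IntWritable(n));          // uid and its number of queries
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://10.49.47.20:9000");
        Job job = Job.getInstance(conf);
        job.setJarByClass(CountNotMoreUid.class);
        job.setMapperClass(UidMap.class);
        job.setReducerClass(UidReduc.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));  // raw Sogou log on HDFS
        FileOutputFormat.setOutputPath(job, new Path("/sogou/data/CountNotMoreUid"));
        job.waitForCompletion(true);
        System.out.println("Distinct UIDs: "
                + job.getCounters().findCounter(Uids.DISTINCT).getValue());
    }
}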
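The single-process Java sketch below illustrates the data flow of the distinct-UID job. It maps each line to (uid, 1), groups the pairs by key as the shuffle does, and runs the reduce logic once per group. It shows the flow only and is not Hadoop code. LocalUidCount and Result are hypothetical names. Skipping lines with fewer than two fields is a choice made here; the report does not specify it. The number of reduce calls equals the number of distinct uids, which is the figure the job reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Single-process imitation of map -> shuffle -> reduce for counting distinct uids. */
final class LocalUidCount {
    /** Per-uid query counts plus the number of reduce calls (= distinct uids). */
    static final class Result {
        final Map<String, Integer> countsByUid;
        final long reduceCalls;

        Result(Map<String, Integer> countsByUid, long reduceCalls) {
            this.countsByUid = countsByUid;
            this.reduceCalls = reduceCalls;
        }
    }

    static Result run(List<String> lines) {
        // Map + shuffle: gather every value emitted for the same key. A TreeMap keeps
        // keys sorted, as a single reducer would receive them.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split("\t");
            if (f.length < 2) {
                continue;  // malformed line without a uid field
            }
            groups.computeIfAbsent(f[1], k -> new ArrayList<>()).add(1);
        }
        // Reduce: one call per key, summing that key's values.
        Map<String, Integer> counts = new TreeMap<>();
        long reduceCalls = 0;
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
            reduceCalls++;
        }
        return new Result(counts, reduceCalls);
    }
}
```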
public class CountTop50 {
    public static class TopMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        Text text = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException,