Hadoop Introductory Study Notes

Uploaded: 2023-01-04

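1. Master node:

There is only one JobTracker. It is responsible for:

receiving the computing jobs submitted by clients;

assigning those jobs to TaskTrackers for execution, that is, task scheduling.

2. Slave nodes:

There are many TaskTrackers.

They execute the computing tasks assigned by the JobTracker.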

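Differences between the NameNode and DataNode in HDFS and the JobTracker and TaskTracker in MapReduce

1. HDFS is responsible for storing data; MapReduce is responsible for distributing and processing tasks.

2. The NameNode and the JobTracker are usually not configured on the same node. The NameNode handles requests coming from outside, while the JobTracker distributes tasks internally, so the two should be kept apart to protect the system.

3. A DataNode and a TaskTracker are usually configured on the same node, so that the TaskTracker can work on data that is stored locally.

4. When a DataNode and a TaskTracker are not on the same node, the data the TaskTracker processes comes from another node or file system rather than from the local disk.

5. When a user stores data, the client talks to the DataNodes directly; the NameNode only hands out block IDs to the user and records them.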

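6. When data is split into blocks for storage, the split is made by byte offset, so a record can straddle two blocks; the reading side (the MapReduce input format) has to reassemble such a record, otherwise the record is unusable.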
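The locality idea in points 3 and 4 can be sketched in plain Java. This is a simplified illustration, not how the real JobTracker is implemented: the class name LocalityScheduler, the host names, and the slot counts are all made up for the example, and the real scheduler is driven by TaskTracker heartbeats. The sketch only shows the preference order: first a tracker on a host that stores the block, then any tracker with a free slot.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: picks a tracker host for each block.
class LocalityScheduler {
    /**
     * @param blockHosts block id -> hosts that store a copy of the block
     * @param freeSlots  tracker host -> free task slots (updated in place)
     * @return block id -> chosen tracker host; blocks that found no slot are omitted
     */
    static Map<String, String> assign(Map<String, List<String>> blockHosts,
                                      Map<String, Integer> freeSlots) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> block : blockHosts.entrySet()) {
            String chosen = null;
            // 1) Prefer a tracker on a host that already holds the block.
            for (String host : block.getValue()) {
                if (freeSlots.getOrDefault(host, 0) > 0) {
                    chosen = host;
                    break;
                }
            }
            // 2) Otherwise take any tracker with a free slot (data must travel).
            if (chosen == null) {
                for (Map.Entry<String, Integer> slot : freeSlots.entrySet()) {
                    if (slot.getValue() > 0) {
                        chosen = slot.getKey();
                        break;
                    }
                }
            }
            if (chosen != null) {
                freeSlots.put(chosen, freeSlots.get(chosen) - 1);
                plan.put(block.getKey(), chosen);
            }
        }
        return plan;
    }
}
```

With two blocks stored only on one host and one free slot per host, the first block runs locally and the second falls back to the other host, which is the case point 4 describes.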

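1.3 Setting up a Hadoop cluster environment

Hadoop deployment modes

1. Local mode

2. Pseudo-distributed mode: all Hadoop processes run on the same node.

3. Cluster mode: the Hadoop processes run on many nodes of the cluster.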

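II. Getting to know Hadoop

2.1 HDFS shell operations

1. Distributed file systems and HDFS

2. HDFS architecture and basic concepts

3. HDFS shell operations

4. Common Java interfaces and APIs

Characteristics of HDFS

1. A file system whose distribution is transparent to the user; the data is stored on many nodes.

2. Suited to write-once, read-many workloads; it does not support concurrent writes and is a poor fit for small files.

3. HDFS has a layer of indirection that decouples the logical file system structure seen by external clients from the physical structure in which the data is actually stored.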

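HDFS shell operations

1. hadoop fs -ls hdfs://nn1:22/

Lists the directories and files on node nn1 (only those directly in this directory).

2. hadoop fs -lsr hdfs://nn1:22/

Recursively lists all directories and files on node nn1 (including the contents of subdirectories).

3. hadoop fs -mkdir hdfs://nn1:22/dir1

Creates the directory dir1 on node nn1.

4. hadoop fs -touchz hdfs://nn1:22/dir1/file

Creates an empty file named file in the dir1 directory on node nn1.

5. hadoop fs -put hello hdfs://nn1:22/dir1

Uploads the local file hello to the dir1 directory on node nn1 (uploading the same file again reports an error).

6. hadoop fs -text hdfs://nn1:22/dir1/file

Shows the contents of file.

If the node path is omitted when uploading, the system uses the client's default file system, which is determined by the value of fs.default.name in core-site.xml.

7. hadoop fs -get hdfs://nn1:22/dir1/file

Downloads file to the local machine.

8. hadoop fs -rmr hdfs://nn1:22/dir1

Deletes a file or directory; when a directory is deleted, the files under it are deleted as well.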
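The remark under item 6 about omitting the node path can be illustrated with the JDK's java.net.URI alone. This is only a sketch of the idea, not the Hadoop client's code: the class DefaultFsResolver is invented for the example, the default value hdfs://chaoren1:9000 is borrowed from the cluster configuration later in these notes, and only absolute paths are handled.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop: qualifies a path against a default file system.
class DefaultFsResolver {
    /**
     * Returns the path unchanged if it already names a scheme (for example hdfs://...),
     * otherwise resolves it against the default file system URI.
     */
    static URI qualify(String defaultFs, String path) {
        URI given = URI.create(path);
        if (given.getScheme() != null) {
            return given; // fully qualified already
        }
        return URI.create(defaultFs).resolve(given);
    }

    public static void main(String[] args) {
        // Stand-in for the fs.default.name value read from core-site.xml.
        String defaultFs = "hdfs://chaoren1:9000";
        System.out.println(qualify(defaultFs, "/dir1/hello"));
        System.out.println(qualify(defaultFs, "hdfs://nn1:22/dir1/hello"));
    }
}
```

The first call fills in the scheme, host, and port from the default; the second is left alone because it already names a node.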

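NameNode

1. The management node of the whole file system. It maintains the file system's directory tree, the metadata of files and directories, and the list of data blocks belonging to each file, and it receives users' operation requests.

2. Its files include:

fsimage: the metadata image file, which stores the NameNode's in-memory metadata as of a certain point in time.

edits: the operation log file.

fstime: stores the time of the most recent checkpoint.

3. These files are stored in the Linux file system.

4. During a file upload, the NameNode only manages and registers the blocks in which the file is stored. The data itself travels directly between the client and the DataNodes and does not pass through the NameNode.

5. During a file upload, a large file is cut into many blocks for storage (typically 64 MB each). This improves the use of storage space and makes the transfer more robust: if a process dies during the transfer, only the 64 MB block has to be resent, not the whole file. Where each block is stored and how the blocks are reassembled is encapsulated by HDFS.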
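The bookkeeping described in items 1, 4, and 5 can be pictured as two lookup tables. The sketch below is a toy model in plain Java, not the NameNode's real data structures; the class MiniNameNode and its method names are invented for illustration. It shows that the NameNode only records which blocks make up a file and where they live, while the bytes themselves never reach it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Hadoop code: the two mappings a NameNode has to maintain.
class MiniNameNode {
    private final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    private final Map<Long, List<String>> blockToDataNodes = new HashMap<>();
    private long nextBlockId = 1;

    /** Registers a new block for a file and returns its id. No file data is passed in. */
    long allocateBlock(String file, List<String> dataNodes) {
        long id = nextBlockId++;
        fileToBlocks.computeIfAbsent(file, k -> new ArrayList<>()).add(id);
        blockToDataNodes.put(id, new ArrayList<>(dataNodes));
        return id;
    }

    /** Returns the block ids of a file in order (empty if the file is unknown). */
    List<Long> blocksOf(String file) {
        return fileToBlocks.getOrDefault(file, new ArrayList<>());
    }

    /** Returns the DataNodes a client should contact directly for a block. */
    List<String> locationsOf(long blockId) {
        return blockToDataNodes.getOrDefault(blockId, new ArrayList<>());
    }
}
```

A client would call allocateBlock once per block while uploading, then send each block's bytes straight to the returned DataNodes; a reader asks blocksOf and locationsOf and fetches the blocks from the DataNodes itself.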

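Note:

fsimage is the mapping between files and the data blocks that store them, and it is critical to the cluster. In hdfs-site.xml, set the value of dfs.name.dir to a comma-separated list of directories (ASCII commas, no spaces before or after them, and every directory must already exist); fsimage is then written once under each of those paths.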
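The formatting rules in the note (ASCII commas, no surrounding spaces, existing directories) are easy to get wrong, so a small pre-check can help. The following is a hypothetical validator written for these notes, not something Hadoop ships; the class name NameDirList is an assumption, and the check is stricter than Hadoop itself needs to be.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check for a dfs.name.dir style value; not part of Hadoop.
class NameDirList {
    /** Splits the value on commas and rejects empty entries or entries with stray whitespace. */
    static List<String> parse(String value) {
        List<String> dirs = new ArrayList<>();
        for (String dir : value.split(",", -1)) {
            if (dir.isEmpty() || !dir.equals(dir.trim())) {
                throw new IllegalArgumentException("bad entry: '" + dir + "'");
            }
            dirs.add(dir);
        }
        return dirs;
    }

    /** Returns the entries that are not existing directories on this machine. */
    static List<String> missing(List<String> dirs) {
        List<String> absent = new ArrayList<>();
        for (String dir : dirs) {
            if (!new File(dir).isDirectory()) {
                absent.add(dir);
            }
        }
        return absent;
    }
}
```

Running parse on the intended value and then missing on the result, before editing hdfs-site.xml, catches both a stray space after a comma and a directory that was never created.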

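DataNode

1. Provides the storage service for the actual data.

2. Block: the most basic unit of storage. For a file whose content has length size, the file is divided in order, starting from offset 0, into pieces of a fixed size, and the pieces are numbered; each piece is called a block. The default HDFS block size is 64 MB, so a 256 MB file has 256/64 = 4 blocks.

3. Unlike an ordinary file system, in HDFS a file smaller than one block does not occupy the storage space of a whole block.

4. Replication: each block can be stored in several copies. The stock default for dfs.replication is 3; single-node setups often set it to 1.

Notes:

1. To change the default block size, find dfs.block.size in hdfs-default.xml, copy the property element that contains it into hdfs-site.xml, and change the value there. Do not edit the original file.

2. To change the default number of replicas, follow the same procedure with dfs.replication.

3. When a file is larger than 64 MB, it is stored in 64 MB blocks; the remainder that does not fill a block is stored in a block that takes up only its actual size.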
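The block arithmetic in item 2 and note 3 can be checked with a few lines of Java. This is a plain calculation for illustration, independent of the Hadoop API; the class name BlockSplitter is made up, and the 64 MB block size is the default quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: computes how a file of a given size is cut into blocks.
class BlockSplitter {
    static final long MB = 1024L * 1024L;

    /** Returns the length in bytes of each block, in order, for the given file size. */
    static List<Long> blockLengths(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        List<Long> lengths = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            // The last block only takes what is left of the file.
            lengths.add(Math.min(blockSize, fileSize - offset));
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(blockLengths(256 * MB, 64 * MB).size()); // the 256 MB example
        System.out.println(blockLengths(100 * MB, 64 * MB));        // one full block plus a remainder
    }
}
```

A 256 MB file yields four full blocks, while a 100 MB file yields one 64 MB block and one 36 MB block, matching note 3.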

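SecondaryNameNode

1. One approach to HA, but it does not provide hot standby. It only needs to be configured.

2. How it works: it downloads the metadata (fsimage, edits) from the NameNode, merges the two into a new fsimage, saves that locally, pushes it to the NameNode, and resets edits on the NameNode.

3. By default it is installed on the NameNode's node, but that is not safe.

Details on item 2:

The NameNode triggers a merge every hour or whenever edits reaches 64 MB. While the data is being sent to the SecondaryNameNode, the log cannot be read and written at the same time, so the NameNode creates a new log file, edits.new, to hold the operations logged during this period. After the SecondaryNameNode has merged the data into a new fsimage, it sends it back to the NameNode, where it replaces the old fsimage, and edits.new is renamed to edits.

Additional note:

When the NameNode goes down and is restored from the SecondaryNameNode, the operations recorded in edits.new are lost.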
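The checkpoint cycle above (roll the log, merge, install) can be modelled in memory. The sketch below is a deliberately simplified model, not Hadoop's implementation: the class CheckpointSketch is invented, fsimage is reduced to a set of paths, and log entries are assumed to be strings such as "+/a" for a create and "-/a" for a delete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified in-memory model of the fsimage / edits / edits.new cycle; not Hadoop code.
class CheckpointSketch {
    Set<String> fsimage = new TreeSet<>();   // paths known as of the last checkpoint
    List<String> edits = new ArrayList<>();  // operations logged since that checkpoint
    List<String> editsNew = null;            // exists only while a checkpoint is running

    /** Logs "+path" (create) or "-path" (delete) to whichever log is currently active. */
    void log(String op) {
        (editsNew != null ? editsNew : edits).add(op);
    }

    /** Step 1: the NameNode starts writing to edits.new. */
    void rollEdits() {
        editsNew = new ArrayList<>();
    }

    /** Step 2: the SecondaryNameNode replays edits on top of a copy of fsimage. */
    Set<String> merge() {
        Set<String> merged = new TreeSet<>(fsimage);
        for (String op : edits) {
            String path = op.substring(1);
            if (op.charAt(0) == '+') {
                merged.add(path);
            } else {
                merged.remove(path);
            }
        }
        return merged;
    }

    /** Step 3: the NameNode installs the new image and renames edits.new to edits. */
    void install(Set<String> newImage) {
        if (editsNew == null) {
            throw new IllegalStateException("rollEdits() must be called first");
        }
        fsimage = newImage;
        edits = editsNew;
        editsNew = null;
    }
}
```

Anything logged between rollEdits and install ends up only in the new edits file, not in the merged image. That is why a restore from the SecondaryNameNode's copy alone loses those operations, as the additional note says.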

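Lessons from this chapter:

1. When setting up a cluster in which the NameNode and the JobTracker are on different nodes, configure the NameNode on chaoren1 and the JobTracker on chaoren2.

(1) The node of the NameNode is set through fs.default.name in core-site.xml, with the value hdfs://chaoren1:9000; the node of the JobTracker is set through mapred.job.tracker in mapred-site.xml, with the value hdfs://chaoren2:9001.

(2) Copy the changes to the other nodes in the cluster.

(3) In addition, do not use start-all.sh; use hadoop-daemon.sh start xxxx instead.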

On chaoren2, run hadoop-daemon.sh start jobtracker

On chaoren2, run hadoop-daemon.sh start secondarynamenode

On chaoren2, run hadoop-daemon.sh start datanode

On chaoren1, run hadoop-daemon.sh start tasktracker

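2. Dealing with the NameNode single point of failure

1. Store the data in multiple directories through dfs.name.dir

2. Use the SecondaryNameNode

3. Use the third-party AvatarNode

3. When HDFS is formatted, the NameNode creates its own directory structure. The DataNodes hold no actual data yet, so formatting has no effect on them.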

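4. HDFS shell

When uploading a file with the put command:

1. If the destination is an existing directory, the new file is uploaded into that directory and keeps its original name.

2. If the destination is an existing directory and a file with the same name already exists there, uploading again reports an error.

3. If the destination is a path that does not exist, the upload succeeds and the file is named after the target path.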
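The three put rules can be written down as one small function. This is a model of the rules as stated in these notes, not the shell's actual code: the class PutTarget is invented, and the existing directories and files are passed in as plain sets instead of being looked up in HDFS.

```java
import java.util.Set;

// Models the three put rules from the notes; not the real FsShell implementation.
class PutTarget {
    /**
     * Works out the path that "put srcName dst" would write to.
     *
     * @param srcName name of the local file being uploaded
     * @param dst     destination given on the command line
     * @param dirs    directories that already exist
     * @param files   files that already exist
     * @throws IllegalStateException if the resulting file already exists (rule 2)
     */
    static String resolve(String srcName, String dst, Set<String> dirs, Set<String> files) {
        String target;
        if (dirs.contains(dst)) {
            // Rule 1: existing directory, keep the original file name.
            target = dst.endsWith("/") ? dst + srcName : dst + "/" + srcName;
        } else {
            // Rule 3: the destination path itself becomes the file name.
            target = dst;
        }
        if (files.contains(target)) {
            // Rule 2: refuse to overwrite an existing file.
            throw new IllegalStateException("target already exists: " + target);
        }
        return target;
    }
}
```

For example, with /dir1 as an existing directory that already contains hello, uploading a.txt to /dir1 resolves to /dir1/a.txt, uploading hello to /dir1 fails, and uploading hello to the nonexistent /dir2 creates a file called /dir2.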

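2.2 HDFS Java operations

package hdfs;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class example1 {

    static Configuration conf = new Configuration();
    static FileSystem hdfs;
    static String firstHdfsDir;
    static String secondHdfsDir;
    static String localDir;

    // Connect to the cluster once, when the class is loaded.
    static {
        String HDFSUrl = "hdfs://cluster/";
        try {
            hdfs = FileSystem.get(URI.create(HDFSUrl), conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        final String localDir = "/home/hadoop/README.txt";
        final String formerDir = "/xiong/";
        final String newDir = "/xiongwei/";
        final String fileDir = "/xiong/a";
        final String content = "xiongwei";
        final String newFileDir = "/xiongwei/b";

        System.out.println(formerDir + "'s status:");
        ListAll(formerDir);
        System.out.println("mkdir:" + Mkdir(newDir));
        System.out.println("create:" + CreateHDFSFile(fileDir, content));
        System.out.println("put:" + PutHDFSFile(localDir, formerDir));
        System.out.println("get:" + GetHDFSFile(localDir, fileDir));
        System.out.println(fileDir + "'s content:");
        ReadHDFSFile(fileDir);
        System.out.println("rename:" + renameHDFSFile(newFileDir, fileDir));
    }

    /**
     * Set the local path.
     *
     * @param str the local path
     */
    public static void getLocalDir(String str) {
        localDir = str;
    }

    /**
     * Set an HDFS path. str is stored in firstHdfsDir when which is true,
     * otherwise in secondHdfsDir.
     *
     * @param str   the HDFS path
     * @param which selects the field to set
     */
    public static void getHDFSDir(String str, boolean which) {
        if (which)
            firstHdfsDir = str;
        else
            secondHdfsDir = str;
    }

    /**
     * Rename a file. The former path no longer exists afterwards.
     *
     * @param newFileDir the path of the new file name
     * @param formerDir  the path of the former file name
     * @return true if the rename succeeded
     */
    private static boolean renameHDFSFile(String newFileDir, String formerDir) {
        Path from = new Path(formerDir);
        Path to = new Path(newFileDir);
        try {
            // rename moves the file, so no separate delete of the former path is needed.
            return hdfs.rename(from, to);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Delete a directory or a file. Files in the directory are deleted too
     * if it is a directory.
     *
     * @param formerDir the path to delete
     * @return true if the delete succeeded
     */
    private static boolean deleteHDFSFile(String formerDir) {
        Path path = new Path(formerDir);
        try {
            return hdfs.delete(path, true); // true = recursive
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Read the content of a file and print it.
     *
     * @param fileDir the HDFS path of the file
     */
    private static void ReadHDFSFile(String fileDir) {
        Path path = new Path(fileDir);
        try {
            if (hdfs.exists(path)) {
                FSDataInputStream is = hdfs.open(path);
                try {
                    FileStatus fs = hdfs.getFileStatus(path);
                    // Reads the whole file into memory; only suitable for small files.
                    byte[] buffer = new byte[(int) fs.getLen()];
                    is.readFully(0, buffer);
                    System.out.println(new String(buffer));
                } finally {
                    is.close();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Download a file from HDFS to the local file system.
     *
     * @param localDir the local path
     * @param fileDir  the HDFS path of the file
     * @return true if the copy succeeded
     */
    private static boolean GetHDFSFile(String localDir, String fileDir) {
        Path path = new Path(fileDir);
        Path topath = new Path(localDir);
        try {
            hdfs.copyToLocalFile(path, topath);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Put a local file into HDFS.
     *
     * @param localDir  the local path
     * @param formerDir the HDFS destination
     * @return true if the copy succeeded
     */
    private static boolean PutHDFSFile(String localDir, String formerDir) {
        Path path = new Path(localDir);
        Path topath = new Path(formerDir);
        try {
            hdfs.copyFromLocalFile(path, topath);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Create a file and write the content to it.
     *
     * @param fileDir the HDFS path of the file
     * @param content the text to write
     * @return true if the file was written
     */
    private static boolean CreateHDFSFile(String fileDir, String content) {
        Path path = new Path(fileDir);
        try {
            FSDataOutputStream os = hdfs.create(path);
            try {
                os.writeBytes(content);
            } finally {
                os.close();
            }
            return true;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Create a directory.
     *
     * @param newDir the HDFS path of the directory
     * @return true if the directory was created
     */
    private static boolean Mkdir(String newDir) {
        Path path = new Path(newDir);
        try {
            return hdfs.mkdirs(path);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * List everything in the directory.
     *
     * @param formerDir the HDFS path of the directory
     */
    private static void ListAll(String formerDir) {
        Path path = new Path(formerDir);
        try {
            FileStatus[] list = hdfs.listStatus(path);
            if (list == null) {
                return; // some versions return null when the path does not exist
            }
            for (FileStatus fs : list) {
                System.out.println(fs.getPath().toString());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Check whether a path exists.
     *
     * @param fileDir the HDFS path
     * @return true if the path exists
     */
    public static boolean isExist(String fileDir) {
        Path path = new Path(fileDir);
        try {
            return hdfs.exists(path);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }
}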

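Exercises:

1. Use the FileSystem.listStatus() method to produce output like that of hadoop fs -ls.

2. Use the FileSystem.listStatus() method to produce output like that of hadoop fs -lsr.

3. Compare archive, SequenceFile, and MapFile for handling small files: what do they have in common, how do they differ, and which scenarios is each suited to?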

2.3.1 Using Ant to deploy a project to the server

1. On the http://ant.apache.org/ website, find Downlo
