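Design and Implementation of a Full-Text Search Engine: Foreign-Language Translation (Word download).doc
- Document ID: 13164700
- Uploaded: 2022-10-07
- Format: DOC
- Pages: 25
- Size: 128.50 KB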
Chinese translation: The Hadoop Distributed File System:
Architecture and Design
Name: XXXX
Student ID: 200708202137
April 8, 2013
English original
The Hadoop Distributed File System:
Architecture and Design
Source: http://hadoop.apache.org/docs/r0.18.3/hdfs_design.html
Introduction
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project. The project URL is http://hadoop.apache.org/core/.
Assumptions and Goals
Hardware Failure
Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system’s data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
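The claim that some component is always non-functional can be checked with a short calculation. The following is a minimal sketch, assuming that components fail independently and all with the same probability (a simplification introduced here for illustration, not something the original states). Under that assumption, the chance that at least one of n components is down is 1 - (1 - p)^n. The function name and the sample values are hypothetical.

```python
def prob_any_failed(n_components, p_fail):
    """Probability that at least one of n independent components is down.

    Assumes every component fails independently with probability p_fail.
    """
    if n_components < 0:
        raise ValueError("n_components must be non-negative")
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    # All components are up with probability (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p_fail) ** n_components


if __name__ == "__main__":
    # Illustrative per-component failure probability, not a measured value.
    p = 0.001
    for n in (10, 100, 1000, 10000):
        print(n, round(prob_any_failed(n, p), 4))
```

Even with a small per-component probability, the result approaches 1 as the number of components grows into the thousands, which is why the design treats failure as the normal case.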
Streaming Data Access
Applications that run on HDFS need streaming access to their data sets. They are not general purpose applications that typically run on general purpose file systems. HDFS is designed more for batch processing than for interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications that are targeted for HDFS. POSIX semantics in a few key areas have been traded to increase data throughput rates.
Large Data Sets
Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.
Simple Coherency Model
HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high throughput data access. A Map/Reduce application or a web crawler application fits perfectly with this model. There is a plan to support appending-writes to files in the future.
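The write-once-read-many model can be illustrated with a toy in-memory file. This is only a sketch of the access pattern, not the HDFS client API: the class name WriteOnceFile and its methods are hypothetical, and refusing reads before close is an extra simplification assumed here to keep the example small.

```python
class WriteOnceFile:
    """Toy in-memory file: writable until closed, read-only afterwards."""

    def __init__(self, path):
        self.path = path
        self._chunks = []
        self._closed = False

    def write(self, data):
        # Once closed, the content is frozen, so readers never see changes.
        if self._closed:
            raise PermissionError(f"{self.path} is closed and cannot be modified")
        self._chunks.append(data)

    def close(self):
        self._closed = True

    def read(self):
        # Assumption for this sketch: reads are served only after close.
        if not self._closed:
            raise RuntimeError(f"{self.path} is still being written")
        return b"".join(self._chunks)


if __name__ == "__main__":
    f = WriteOnceFile("/crawl/part-0000")
    f.write(b"page one\n")
    f.write(b"page two\n")
    f.close()
    print(f.read())
```

Because closed content never changes, any number of readers can share it without locking or cache invalidation, which is the simplification the paragraph above refers to.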
“Moving Computation is Cheaper than Moving Data”
A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.
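One way an application might use such location information is to prefer a worker that already holds the data. The sketch below assumes a hypothetical scheduler function, pick_node, which is not part of HDFS. It takes the list of nodes that hold a block and the set of nodes currently free, and falls back to a remote node only when no local one is available.

```python
def pick_node(block_locations, free_nodes):
    """Choose where to run a task that reads one block.

    block_locations: node names that hold a copy of the block.
    free_nodes: set of node names that can accept a task right now.
    Returns (node_name, is_local).
    """
    # Prefer a node that already stores the block: no data crosses the network.
    for node in block_locations:
        if node in free_nodes:
            return node, True
    if not free_nodes:
        raise RuntimeError("no free nodes available")
    # Otherwise run remotely; sorting makes the fallback choice deterministic.
    return sorted(free_nodes)[0], False


if __name__ == "__main__":
    print(pick_node(["dn2", "dn3"], {"dn1", "dn3"}))
    print(pick_node(["dn2"], {"dn1", "dn4"}))
```

In the remote case the whole block has to be shipped to the chosen node, which is the cost the section title argues against.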
Portability Across Heterogeneous Hardware and Software Platforms
HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.
NameNode and DataNodes
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
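The division of labour described above can be sketched with two toy classes. This is an illustrative model only, not the real HDFS implementation or its API. The class names, the tiny BLOCK_SIZE, and the round-robin placement are all assumptions made for the example. The point it shows is that the NameNode keeps only metadata (the namespace and the block-to-DataNode mapping), while block contents travel directly between the client and the DataNodes.

```python
import itertools

BLOCK_SIZE = 4  # bytes; deliberately tiny, assumed value for illustration


class ToyDataNode:
    """Stores block contents and serves client reads and writes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class ToyNameNode:
    """Holds metadata only: the namespace and the block-to-DataNode map."""

    def __init__(self, datanode_names):
        self.namespace = {}      # path -> ordered list of block ids
        self.block_to_node = {}  # block id -> DataNode name
        self._next_id = itertools.count()
        # Round-robin placement is an assumption of this sketch.
        self._placement = itertools.cycle(list(datanode_names))

    def create(self, path):
        self.namespace[path] = []

    def add_block(self, path):
        block_id = next(self._next_id)
        node_name = next(self._placement)
        self.namespace[path].append(block_id)
        self.block_to_node[block_id] = node_name
        return block_id, node_name

    def locate(self, path):
        return [(b, self.block_to_node[b]) for b in self.namespace[path]]

    def rename(self, old_path, new_path):
        # A pure namespace operation: no block data moves.
        self.namespace[new_path] = self.namespace.pop(old_path)


def client_write(namenode, datanodes, path, data):
    """Split data into blocks and send each block to its DataNode."""
    namenode.create(path)
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id, node_name = namenode.add_block(path)
        datanodes[node_name].store(block_id, data[offset:offset + BLOCK_SIZE])


def client_read(namenode, datanodes, path):
    """Ask the NameNode where the blocks are, then read from the DataNodes."""
    return b"".join(datanodes[n].read(b) for b, n in namenode.locate(path))
```

A short usage example: build a dictionary of ToyDataNode objects keyed by name, pass its keys to ToyNameNode, then call client_write and client_read with the same path. Renaming the file afterwards changes only the NameNode's namespace dictionary, and the stored blocks stay where they are.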
The NameNode and DataNode are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the highly portable J
- Keywords:
- full-text, search engine, design, implementation, foreign-language, translation