Hadoop FAQ
3. Yet another way to rebalance blocks is to turn off the data-node that is full, wait until its blocks are replicated, and then bring it back again. The over-replicated blocks will be randomly removed from different nodes, so you really get them rebalanced, not just removed from the current node.
4. Finally, you can use the bin/start-balancer.sh command to run a balancing process to move blocks around the cluster automatically. See
o HDFS User Guide: Rebalancer;
o HDFS Tutorial: Rebalancing;
o HDFS Commands Guide: balancer.
7. HDFS. What is the purpose of the secondary name-node?
The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense that data-nodes cannot connect to the secondary name-node, and in no event can it replace the primary name-node in case of its failure.
The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary name-node periodically downloads the current name-node image and edits log files, joins them into a new image and uploads the new image back to the (primary and the only) name-node. See User Guide.
So if the name-node fails and you can restart it on the same physical node, then there is no need to shut down the data-nodes; only the name-node needs to be restarted. If you cannot use the old node anymore you will need to copy the latest image somewhere else. The latest image can be found either on the node that used to be the primary before the failure, if available, or on the secondary name-node. The latter will be the latest checkpoint without subsequent edits logs, that is, the most recent name space modifications may be missing there. You will also need to restart the whole cluster in this case.
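The checkpoint described above can be pictured with a toy model. The sketch below is only an illustration written in Python: the dictionary "image", the tuple-based edits log and the function name apply_edits are all assumptions made for this example and have nothing to do with Hadoop's real on-disk formats. It shows why an image taken from the secondary name-node may lack the most recent modifications.

```python
# Toy model of a checkpoint: the namespace image is a dict of path -> metadata,
# and the edits log is an ordered list of operations recorded since that image.
# This is NOT Hadoop's on-disk format; every name here is illustrative.

def apply_edits(image, edits):
    """Return a new image with every logged edit applied in order."""
    new_image = dict(image)  # leave the input image untouched
    for op, path, *args in edits:
        if op == "create":
            new_image[path] = args[0]
        elif op == "delete":
            new_image.pop(path, None)
        elif op == "rename":
            new_image[args[0]] = new_image.pop(path)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return new_image


image = {"/a": {"replication": 3}}
edits = [
    ("create", "/b", {"replication": 2}),
    ("rename", "/a", "/c"),
    ("delete", "/b"),
]

# The checkpoint: old image joined with the edits logged so far.
checkpoint = apply_edits(image, edits)

# Edits logged after the checkpoint exist only on the primary.
late_edits = [("create", "/d", {"replication": 3})]
current = apply_edits(checkpoint, late_edits)
```

In this model, recovering from "checkpoint" alone loses "/d", which mirrors the caveat that the copy on the secondary name-node is the latest checkpoint without subsequent edits.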
8. MR. What is the Distributed Cache used for?
The distributed cache is used to distribute large read-only files that are needed by map/reduce jobs to the cluster. The framework will copy the necessary files from a url (either hdfs: or http:) onto the slave node before any tasks for the job are executed on that node. The files are only copied once per job and so should not be modified by the application.
9. MR. Can I create/write-to hdfs files directly from my map/reduce tasks?
Yes. (Clearly, you want this since you need to create/write-to files other than the output-file written out by OutputCollector.)
Caveats:
<glossary>
${mapred.output.dir} is the eventual output directory for the job (JobConf.setOutputPath / JobConf.getOutputPath).
${taskid} is the actual id of the individual task-attempt (e.g. task_200709221812_0001_m_000000_0), a TIP is a bunch of ${taskid}s (e.g. task_200709221812_0001_m_000000).
</glossary>
With speculative-execution on, one could face issues with 2 instances of the same TIP (running simultaneously) trying to open/write-to the same file (path) on hdfs. Hence the app-writer will have to pick unique names (e.g. using the complete taskid, i.e. task_200709221812_0001_m_000000_0) per task-attempt, not just per TIP. (Clearly, this needs to be done even if the user doesn't create/write-to files directly via reduce tasks.)
To get around this the framework helps the application-writer out by maintaining a special ${mapred.output.dir}/_${taskid} sub-dir for each task-attempt on hdfs where the output of the reduce task-attempt goes. On successful completion of the task-attempt the files in the ${mapred.output.dir}/_${taskid} (of the successful taskid only) are moved to ${mapred.output.dir}. Of course, the framework discards the sub-directory of unsuccessful task-attempts. This is completely transparent to the application.
The application-writer can take advantage of this by creating any side-files required in ${mapred.output.dir} during execution of his reduce-task, and the framework will move them out similarly, so you don't have to pick unique paths per task-attempt.
Fine-print: the value of ${mapred.output.dir} during execution of a particular task-attempt is actually ${mapred.output.dir}/_${taskid}, not the value set by JobConf.setOutputPath. So, just create any hdfs files you want in ${mapred.output.dir} from your reduce task to take advantage of this feature.
The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since output of the map, in that case, goes directly to hdfs.
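The per-attempt sub-directory scheme can be imitated on a local file system to see why speculative attempts do not collide. This is a rough sketch in Python, not the framework's code: the helper names attempt_dir, commit and discard, the side-file name and the two reduce attempt ids are invented for the illustration, and a temporary local directory stands in for ${mapred.output.dir} on hdfs.

```python
import shutil
import tempfile
from pathlib import Path


def attempt_dir(output_dir, task_attempt_id):
    """Private working directory of one task-attempt: <output_dir>/_<attempt id>."""
    d = Path(output_dir) / f"_{task_attempt_id}"
    d.mkdir(parents=True, exist_ok=True)
    return d


def commit(output_dir, task_attempt_id):
    """Move the files of the successful attempt into the job output directory."""
    src = Path(output_dir) / f"_{task_attempt_id}"
    for f in src.iterdir():
        shutil.move(str(f), str(Path(output_dir) / f.name))
    src.rmdir()


def discard(output_dir, task_attempt_id):
    """Drop everything an unsuccessful attempt wrote."""
    shutil.rmtree(Path(output_dir) / f"_{task_attempt_id}", ignore_errors=True)


# A local temp dir stands in for ${mapred.output.dir}; the ids are hypothetical.
output_dir = Path(tempfile.mkdtemp())
attempts = [
    "task_200709221812_0001_r_000000_0",
    "task_200709221812_0001_r_000000_1",
]

# Both speculative attempts write a side-file with the SAME name, without clashing.
for a in attempts:
    (attempt_dir(output_dir, a) / "side-file.txt").write_text(f"written by {a}\n")

commit(output_dir, attempts[1])   # the attempt that finished successfully
discard(output_dir, attempts[0])  # the other attempt
final_files = sorted(p.name for p in output_dir.iterdir())
```

Both attempts use the same file name, yet only the output of the committed attempt ends up in the output directory, which is the behaviour the framework provides transparently.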
10. MR. How do I get each of my maps to work on one complete input-file and not allow the framework to split-up my files?
Essentially a job's input is represented by the InputFormat (interface) / FileInputFormat (base class). For this purpose one would need a 'non-splittable' FileInputFormat, i.e. an input-format which essentially tells the map-reduce framework that it cannot be split-up and processed. To do this you need your particular input-format to return false for the isSplitable call.
E.g. org.apache.hadoop.mapred.SortValidator.RecordStatsChecker.NonSplitableSequenceFileInputFormat in src/test/org/apache/hadoop/mapred/SortValidator.java
In addition to implementing the InputFormat interface and having isSplitable(...) return false, it is also necessary to implement the RecordReader interface for returning the whole content of the input file. (The default is LineRecordReader, which splits the file into separate lines.)
The other, quick-fix option is to set mapred.min.split.size to a large enough value.
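Why a large mapred.min.split.size keeps a file in one piece can be seen from the way a split size is usually derived: the minimum acts as a floor on the smaller of the goal size and the block size. The Python sketch below is a simplified model under that assumption; the function names and the sample sizes are made up for the illustration, and it ignores details such as any slack the framework allows on the last split.

```python
def split_size(goal_size, min_size, block_size):
    """The minimum split size is a floor on min(goal size, block size)."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_len, size):
    """Number of pieces a file of file_len bytes is cut into (simplified)."""
    return max(1, -(-file_len // size))  # ceiling division, at least one split


MiB = 1024 * 1024
file_len = 1024 * MiB    # hypothetical 1 GiB input file
block_size = 64 * MiB    # hypothetical block size
goal_size = 128 * MiB    # hypothetical total input size / requested number of maps

# With a tiny minimum the file is cut at block granularity.
default_splits = count_splits(file_len, split_size(goal_size, 1, block_size))

# With a minimum larger than the file, the whole file becomes a single split.
huge_min = 2**63 - 1
single_split = count_splits(file_len, split_size(goal_size, huge_min, block_size))
```

The quick fix only works while the configured minimum exceeds the size of every input file, which is why the non-splittable input-format is the more robust option.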
11. Why do I see broken images in the jobdetails.jsp page?
In hadoop-0.15, Map/Reduce task completion graphics are added. The graphs are produced as SVG (Scalable Vector Graphics) images, which are basically xml files, embedded in html content. The graphics are tested successfully in Firefox 2 on Ubuntu and Mac OS. However, for other browsers, one should install an additional plugin to the browser to see the SVG images. Adobe's SVG Viewer can be found at
12. HDFS. Does the name-node stay in safe mode till all under-replicated files are fully replicated?
No. During safe mode replication of blocks is prohibited. The name-node waits until all or a majority of the data-nodes report their blocks.
Depending on how the safe mode parameters are configured, the name-node will stay in safe mode until a specific percentage of the blocks of the system is minimally replicated (dfs.replication.min). If the safe mode threshold dfs.safemode.threshold.pct is set to 1 then all blocks of all files should be minimally replicated.
Minimal replication does not mean full replication. Some replicas may be missing, and in order to replicate them the name-node needs to leave safe mode.
Learn more about safe mode here.
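The difference between minimal and full replication is easy to check with a small calculation. The following Python sketch is an illustration under simplifying assumptions: can_leave_safe_mode and the sample replica counts are hypothetical, and the real name-node logic involves more than this single ratio.

```python
def can_leave_safe_mode(replica_counts, replication_min, threshold_pct):
    """True once the fraction of minimally replicated blocks reaches the threshold.

    replica_counts  -- replicas reported so far, one entry per block
    replication_min -- plays the role of dfs.replication.min
    threshold_pct   -- plays the role of dfs.safemode.threshold.pct (0..1)
    """
    if not replica_counts:
        return True  # nothing to wait for in an empty name space
    safe = sum(1 for n in replica_counts if n >= replication_min)
    return safe / len(replica_counts) >= threshold_pct


# Five blocks with a target replication of 3; one block not reported yet.
reported = [3, 1, 2, 0, 1]
relaxed = can_leave_safe_mode(reported, replication_min=1, threshold_pct=0.75)
strict = can_leave_safe_mode(reported, replication_min=1, threshold_pct=1.0)

# Once every block has at least one replica the strict threshold is met,
# although only one of the five blocks is fully replicated.
all_reported = [3, 1, 2, 1, 1]
strict_later = can_leave_safe_mode(all_reported, replication_min=1, threshold_pct=1.0)
```

In the last case the name-node could leave safe mode with four of five blocks still under-replicated, and only then would it start replicating them.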
13. MR. I see a maximum of 2 maps/reduces spawned concurrently on each TaskTracker, how do I increase that?
Use the configuration knobs mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to control the number of maps/reduces spawned simultaneously on a TaskTracker. By default, each is set to 2, hence one sees a maximum of 2 maps and 2 reduces at a given instance on a TaskTracker.
You can set those on a per-tasktracker basis to accurately reflect your hardware (i.e. set those to higher nos. on a beefier tasktracker etc.).
14. MR. Submitting map/reduce jobs as a different user doesn't work.
The problem is that you haven't configured your map/reduce system directory to a fixed value. The default works for single node systems, but not for "real" clusters. I like to use:
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.
  </description>
</property>
Note that this directory is in your default file system and must be accessible from both the client and server machines and is typically in HDFS.
15. HDFS. How do I set up a hadoop node to use multiple volumes?
Data-nodes can store blocks in multiple directories, typically allocated on different local disk drives. In order to set up multiple directories one needs to specify a comma separated list of pathnames as the value of the configuration parameter dfs.data.dir. Data-nodes will attempt to place an equal amount of data in each of the directories.
The name-node also supports multiple directories, which in this case store the name space image and the edits log. The directories are specified via the dfs.name.dir configuration parameter. The name-node directories are used for name space data replication, so that the image and the log can be restored from the remaining volumes if one of them fails.
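As an illustration of the data-node side, the sketch below parses a comma separated dfs.data.dir value and spreads blocks over the listed directories. It is written in Python and rests on assumptions: the paths are hypothetical, stripping whitespace around entries is a convenience of this sketch, and plain round-robin is just one simple way to approximate "equal amount of data in each directory", not a statement about how a data-node actually chooses a volume.

```python
from itertools import cycle


def parse_dirs(value):
    """Split a comma separated list of pathnames, ignoring empty entries."""
    return [p.strip() for p in value.split(",") if p.strip()]


def place_blocks(dirs, block_ids):
    """Assign blocks to directories in round-robin order (an assumed policy)."""
    placement = {d: [] for d in dirs}
    for d, block in zip(cycle(dirs), block_ids):
        placement[d].append(block)
    return placement


# Hypothetical value of dfs.data.dir naming three local disks.
dfs_data_dir = "/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data"
dirs = parse_dirs(dfs_data_dir)
placement = place_blocks(dirs, range(7))
```

With seven equally sized blocks the three directories end up holding three, two and two blocks, so no directory is more than one block ahead of the others.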
16. HDFS. What happens if one Hadoop client renames a file or a directory containing this file while another client is still writing into it?
Starting with release hadoop-0.15, a file will appear in the name space as soon as it is created. If a writer is writing to a file and another client renames either the file itself or any of its path components, then the original writer will g