Multi-Level Memory Exercises (多级存储器练习.docx)
- Document ID: 26671958
- Upload date: 2023-06-21
- Format: DOCX
- Pages: 13
- Size: 190.74 KB
Multi-Level Memory Exercises
6.823 Computer System Architecture
Datapath for DLX, Spring 2002
Problem Set #2
Students are allowed to collaborate in groups of up to 3 people. A group hands in only one copy of the solution to a problem set. Homework assignments are due at the beginning of class on the due date. To facilitate grading, each problem must be stapled separately. Homework will not be accepted once solutions are handed out.
Problem 1:
Microprogramming and Bus-Based Architectures
Problem 1.A
How many cycles does it take to execute the following instructions in the microcoded DLX machine?
Use the states and control points from DLX-Controller-2 and assume Memory will not assert its busy signal.
Which instruction takes the most cycles to execute?
Which instruction takes the fewest cycles to execute?
Problem 1.B
Ben Bitdiddle needs to compute factorials for small numbers. Realizing there is no multiply instruction in the microcoded DLX machine, he uses the following code to calculate the factorial of an unsigned number n.
The variables i, j, n, temp, and result are unsigned 32-bit values.
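Ben's code listing is not reproduced in this copy. As a rough illustration only, the following Python sketch shows one way a factorial can be computed with addition alone, using the variable names mentioned above. The loop structure and the 32-bit wraparound are assumptions, not Ben's actual code.

```python
MASK32 = 0xFFFFFFFF  # assumption: unsigned 32-bit values wrap on overflow


def factorial_by_addition(n):
    """Compute n! using only additions (hypothetical reconstruction)."""
    result = 1
    for i in range(2, n + 1):      # outer loop: multiply result by i
        temp = 0
        for j in range(i):         # inner loop: add result to temp, i times
            temp = (temp + result) & MASK32
        result = temp
    return result


if __name__ == "__main__":
    for n in range(6):
        print(n, factorial_by_addition(n))
```

The inner loop is the part that a hardware multiply instruction would replace.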
Write the DLX assembly that implements Ben’s factorial code. Use only the DLX instructions that can be executed on the microcoded DLX machine (ALU, ALUi, LW, SW, J, JAL, JR, JALR, BEQZ, and BNEZ). Register values in the microcoded DLX machine do not have to be preserved.
How many DLX instructions are executed to calculate a factorial?
How many cycles does it take to calculate a factorial?
Again, use the states and control points from DLX-Controller-2 and assume Memory will not assert its busy signal.
Problem 1.C
Alyssa P. Hacker tells Ben that his factorial code will run much faster if he implements an unsigned multiply instruction in the microcoded DLX machine. Then Ben can replace the inner loop instructions with the new multiply instruction.
The details of Alyssa’s new proposed unsigned multiply instruction are:
The value of Rs1 is added Rs2 times and the result is stored into Rd. Rs1 and Rs2 are treated as unsigned 32-bit values. If Rs2 or Rs1 is 0, then the result in Rd will also be 0. The format of the MULU instruction is R-type.
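The semantics described above can be modeled in a few lines of Python. This is only a behavioral sketch: the function name `mulu` and the choice to keep the low 32 bits on overflow are assumptions, since the text does not say what happens when the product exceeds 32 bits.

```python
MASK32 = 0xFFFFFFFF  # assumption: the result keeps only the low 32 bits


def mulu(rs1, rs2):
    """Behavioral model of MULU: add Rs1 to an accumulator Rs2 times."""
    rs1 &= MASK32
    rs2 &= MASK32
    rd = 0
    for _ in range(rs2):           # zero iterations when Rs2 == 0
        rd = (rd + rs1) & MASK32
    return rd


if __name__ == "__main__":
    print(mulu(3, 4))
    print(mulu(0, 5), mulu(5, 0))
```

Because the loop runs Rs2 times, a microcoded implementation that follows this description directly would take a number of cycles that grows with the value of Rs2.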
In order to be able to write microcode for MULU, Alyssa adds an additional register, TO (33), to the register file. This register, like the PC register, is not visible to the programmer. She also adds 33 as an input to the register file multiplexer.
Using Worksheet 1, write microcode to implement Alyssa’s new unsigned multiply instruction.
In Worksheet 1, the representation of the next state is different from what was presented in lecture. The last two columns represent a 2-bit field, μBr, with four possible values: N, J, Z, and D.
If μBr is N (next), then the next state is simply (current state + 1).
If it is J (jump), then the next state is unconditionally the state specified in the Next State column (i.e., it’s an unconditional microbranch).
If it is Z (branch-if-zero), then the next state depends on the value of the ALU’s zero output signal (i.e., it’s a conditional microbranch). If zero is asserted (== 1), then the next state is that specified in the Next State column; otherwise, it is (current state + 1).
If μBr is D (dispatch), then the FSM looks at the opcode and function fields in the IR and goes into the corresponding state. For this problem set, we assume that the dispatch goes to the state labeled (DLX-instruction-name + "0"). For example, if the instruction in the IR is SW, then the dispatch will go to state SW0.
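The next-state rules above can be summarized as a small function. This is a minimal sketch, not the controller's actual implementation: the function name, the numeric state encoding, and the `labels` table that maps names such as "SW0" to state numbers are all hypothetical.

```python
def next_state(current, ubr, target, zero, opcode, labels):
    """Return the next microcode state number.

    current: current state number
    ubr:     branch field value, one of "N", "J", "Z", "D"
    target:  state named in the Next State column (unused for N and D)
    zero:    the ALU's zero output
    opcode:  name of the instruction in the IR (used for D only)
    labels:  hypothetical map from state labels such as "SW0" to numbers
    """
    if ubr == "N":                 # next: fall through to current + 1
        return current + 1
    if ubr == "J":                 # jump: unconditional microbranch
        return target
    if ubr == "Z":                 # branch-if-zero: conditional microbranch
        return target if zero else current + 1
    if ubr == "D":                 # dispatch on the instruction in the IR
        return labels[opcode + "0"]
    raise ValueError(f"unknown branch field value: {ubr!r}")


if __name__ == "__main__":
    labels = {"SW0": 40}           # made-up state number for illustration
    print(next_state(5, "Z", 12, True, "SW", labels))
    print(next_state(5, "D", 0, False, "SW", labels))
```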
The ALU performs operations specified by the ALUOp, which is determined by the ALU control logic block. Assume the ALU can perform the following operations:
How many cycles does it take to execute the MULU instruction for different values of Rs2?
Problem 1.D
With Alyssa’s new unsigned multiply instruction, Ben eliminates the inner loop of his original factorial code and simplifies it to the following.
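The simplified listing is not reproduced in this copy. As an illustration only, a single-loop factorial built on a multiply primitive might look like the Python sketch below. The helper `mulu` and the direction of the loop are assumptions, not Ben's actual code.

```python
MASK32 = 0xFFFFFFFF  # assumption: results keep only the low 32 bits


def mulu(rs1, rs2):
    """Stand-in for the MULU instruction (low 32 bits of the product)."""
    return (rs1 * rs2) & MASK32


def factorial_with_mulu(n):
    """Compute n! with one loop and one multiply per iteration."""
    result = 1
    i = n
    while i > 1:                   # single loop replaces the nested loops
        result = mulu(result, i)
        i -= 1
    return result


if __name__ == "__main__":
    print(factorial_with_mulu(5))
```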
Help Ben write DLX assembly code to implement factorial using the new MULU instruction. Again, use R1 for n and R2 for result. At the end of your code, R2 must contain the correct value. You do not have to preserve the values of any other registers.
How many DLX instructions are executed to calculate a factorial?
How many cycles does it take to calculate a factorial?
Again, assume Memory will not assert its busy signal.
Problem 1.E
Combining a microcontroller and the DLX bus-based datapath (L4-5) gives us a complete working computer that can run a subset of the DLX ISA.
Alyssa tells Ben that, besides requiring much more memory, there is another reason why using the original DLX microcontroller (L4-9) was a bad idea: a machine with the original controller would have a much longer cycle time than a machine using the second microcontroller (L4-15). Ben can’t understand why this is true.
Below are the delays of the hardware parts used to implement the DLX bus-based datapath and the first version of the DLX microcontroller (L4-9).
Assume that tALU, tROM, and tMEM have comparable values and that these delays are bigger than the delays for other components.
Using the first version of the microcontroller, which microcode instruction invokes the critical path of the machine?
Describe the critical path. What is the minimum clock period that the complete DLX bus-based machine can run at?
Assume Memory will not assert its busy signal.
Problem 2:
Pipeline Hacking
Inspired by his success with the MACC instruction in the last problem set, Ben Bitdiddle comes up with the following new instruction format called BIF (for Ben’s Instruction Format) that he wants to add to the DLX ISA:
The semantics of the new instruction would be this:
Where op1 and op2 are ALU operations. For example, ADD R1, R2 : ADD R3, R4 would be computed as follows:
To implement the new instruction format, Ben decides to add an ALU to the memory phase of the pipelined, fully-bypassed implementation of the DLX datapath discussed in lecture. Old-style DLX instructions would still use the ALU in the execute phase while BIF instructions would use both ALUs. The first ALU operation would be performed in the execute phase and the second in the memory phase. The new pipeline would look like this:
In addition, the register file in the old datapath is replaced with a register file with three read ports and one write port so that all three operands can be read at the same time.
Problem 2.A
Give a code example that shows how you can get better performance using BIF instructions. Provide both the original old-style DLX code and the code that uses the BIF instructions. The original code should contain at least six instructions.
What is the maximum possible improvement in performance using BIF instructions?
Problem 2.B
Show all data hazards that can cause stalls and provide a code example for each case. You should consider both old-style DLX instructions and instructions using Ben’s new format. You may assume that the datapath is fully-bypassed. Do not consider jumps or branches for now.
Still ignoring jumps and branches, how would performance change if non-BIF instructions used the new ALU in the MA phase instead of the original ALU in the execute phase?
Problem 2.C
Write the equations for ws, we, re1, and re2 for the new datapath with non-BIF instructions using the original ALU in the execute phase. Write the stall signal using ws, we, re1, and re2. You may need other signals. The signals for the original datapath are provided here for your convenience. Again, do not consider jumps or branches for now.
Problem 2.D
Now consider jumps and branches. What additional hazards can occur?
Give an example for each case.
With the new instruction format, Ben thinks that we can speed up conditional branches if we allow an instruction that combines the compare with the branch.
For example,
would mean:
The first instruction (in this example, the SLT instruction) would be performed on the ALU in the EX phase. The result (a 0 or 1), instead of being written to the register file, would be passed to the MA phase where the zero test would be performed on the new ALU. How many delay slots will this type of instruction require to avoid any stalls?
How could you reduce the number of delay slots that are needed, without introducing any new stall conditions or killing instructions in the pipeline?
For each case that you consider, argue what effect it will have on the clock period. Each case should correspond to an implementation with a different number of delay slots.
Given the options you investigated above, argue in a few sentences which of these options is the best. Consider delay slots, stalls, circuit size, and clock period.
Problem 2.E
Ben finds that the additional read port in the register file is increasing the length of the critical path in the processor, and that the new datapath cannot be clocked at as high a speed as the original. To try to solve the problem, he is going to use the original register file.
Since the original register file only has two read ports, only two of the operands can be read in the ID phase. The third operand is going to be read in the EX phase, in parallel with the first ALU operation.
What other changes are needed to make this scheme work?
How do these changes affect the performance of the processor?
Alyssa thinks that Ben can solve his problem by adding a second register file, identical to the first. How would this scheme work?
How does the performance of Alyssa’s scheme compare to the two that Ben tried?
Problem 2.F
With the new BIF instructions, how will code size change?
Will this have an effect on performance?
Problem 3:
Cache Access-Time & Performance
Ben is trying to determine the best cache configuration for a new processor. He knows how to build three kinds of caches: direct-mapped caches, 2-way set-associative caches, and small fully associative caches. The goal is to find the best cache configuration with the given building blocks.
Since he only knows how to build very small fully associative caches, Ben decided to use either direct-mapped or 2