Virtual Memory Basics Document Format.docx
- Document ID: 15928367
- Uploaded: 2022-11-17
- Format: DOCX
- Pages: 9
- Size: 119.98 KB
Students are strongly encouraged to collaborate in groups of up to three people. A group should hand in only one copy of the solution to the problem set. Problem sets are due at the beginning of class on the due date. To facilitate grading, the solution to each problem must be stapled separately. Problem sets will not be accepted once solutions have been handed out.
Problem 1: Optimizing for Cache Performance
In this problem, you will explore several common compiler optimizations for improving cache performance.
Problem 1.A Loop Ordering
Consider a 1 KB direct-mapped cache where each cache line contains four words. You are asked to consider the following two loops, written in C, which calculate the sum of the entries in a 64 by 16 matrix of 32-bit integers:
The matrix A is stored contiguously in memory, in row-major order, and is aligned to cache-line boundaries. You may assume that any other variables used are allocated to registers, and that the only cache activity involves the elements of the matrix.
Calculate the number of cache misses that occur when running loop A and when running loop B. Are the two values the same? Explain why or why not.
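The effect of traversal order on a direct-mapped cache can be checked with a small simulation. The sketch below is not the problem set's code (the two C loops are not reproduced in this copy); it simply models the cache parameters stated above and counts misses for a row-by-row and a column-by-column walk over the 64 by 16 matrix. The names `count_misses`, `row_order`, and `col_order` are illustrative, and the matrix is assumed to start at address 0, which satisfies the line-alignment condition.

```python
# Toy model of the cache described in the problem: 1 KB, direct-mapped,
# four 32-bit words (16 bytes) per line, so 64 lines in total.
CACHE_BYTES = 1024
LINE_BYTES = 16
NUM_LINES = CACHE_BYTES // LINE_BYTES
ROWS, COLS, WORD_BYTES = 64, 16, 4


def count_misses(index_pairs):
    """Count misses for a sequence of (i, j) accesses to a row-major matrix."""
    lines = [None] * NUM_LINES  # memory block held by each cache line, or None
    misses = 0
    for i, j in index_pairs:
        addr = (i * COLS + j) * WORD_BYTES  # assumption: matrix starts at address 0
        block = addr // LINE_BYTES          # which 16-byte memory block is touched
        slot = block % NUM_LINES            # direct-mapped: exactly one possible line
        if lines[slot] != block:
            misses += 1
            lines[slot] = block
    return misses


# Row index in the outer loop versus column index in the outer loop.
row_order = [(i, j) for i in range(ROWS) for j in range(COLS)]
col_order = [(i, j) for j in range(COLS) for i in range(ROWS)]

print("row-by-row misses:      ", count_misses(row_order))
print("column-by-column misses:", count_misses(col_order))
```

Under these assumptions the row-by-row walk misses once per cache line (256 of 1024 accesses), while the column-by-column walk misses on every access, because rows 16 apart map to the same cache line and evict each other before any line is reused.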
Problem 1.B Blocking
Blocking was discussed in lecture B as a way to optimize cache performance for computations on large matrices that do not fit in the cache. Here is a simple implementation of matrix multiply for an N by N matrix that does not use blocking:
where r is a register.
Here is an implementation that uses blocking, with S×S blocks:
To see how blocking works, we will simulate the cache behavior for N=4 and S=2 for a fully-associative write-allocate cache with two-word cache lines using an LRU replacement policy. Assume that the matrices are aligned to cache line boundaries and that the computer does not reorder any loads or stores. Complete Table 1-1 and Table 1-2 for the two implementations of matrix multiply, showing the progression of cache contents as accesses occur. Only fill in elements in the table when a value changes, or when a cache hit occurs, in which case put “HIT” in the corresponding entry.
Calculate the miss rate for the two implementations based on the entries in the tables.
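The table-filling exercise can be mirrored in software. The following is a rough sketch only: the original listings are not reproduced in this copy, so the loop nests in `naive_trace` and `blocked_trace` are generic textbook forms, not necessarily the problem set's code. The back-to-back layout in `BASE`, the single access per update of C, and the `capacity_lines` parameter are all assumptions made for illustration; the cache capacity in particular is not stated in the surviving text.

```python
from collections import OrderedDict

N, S, LINE_WORDS = 4, 2, 2  # N, S and line size as given in the problem
# Assumed layout: A, B, C stored back to back, addressed in words.
BASE = {"A": 0, "B": N * N, "C": 2 * N * N}


def naive_trace():
    """Accesses of a plain i-j-k multiply that accumulates in a register."""
    trace = []
    for i in range(N):
        for j in range(N):
            for k in range(N):
                trace.append(("A", i, k))
                trace.append(("B", k, j))
            trace.append(("C", i, j))  # one store once the sum is complete
    return trace


def blocked_trace():
    """Accesses of a generic S x S blocked multiply (assumed loop order)."""
    trace = []
    for jj in range(0, N, S):
        for kk in range(0, N, S):
            for i in range(N):
                for j in range(jj, min(jj + S, N)):
                    for k in range(kk, min(kk + S, N)):
                        trace.append(("A", i, k))
                        trace.append(("B", k, j))
                    trace.append(("C", i, j))  # partial-sum update, counted once
    return trace


def miss_rate(trace, capacity_lines):
    """Fully-associative LRU cache; capacity_lines is a hypothetical parameter."""
    cache = OrderedDict()  # keys are line numbers, ordered from LRU to MRU
    misses = 0
    for name, row, col in trace:
        line = (BASE[name] + row * N + col) // LINE_WORDS
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True             # write-allocate: stores also fill a line
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(trace)


for lines in (4, 8):  # example capacities, chosen only for illustration
    print(lines, miss_rate(naive_trace(), lines), miss_rate(blocked_trace(), lines))
```

Varying `capacity_lines` shows how the gap between the two versions depends on whether a block's working set fits in the cache, which is the point the tables are meant to bring out.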
Problem 1.C More Blocking
Ben Bitdiddle is implementing the matrix-multiply-with-blocking algorithm from part B. He hypothesizes that he should be able to get reasonable performance if he can get the two innermost loops of the algorithm to run without any cache misses other than compulsory misses.
Ben’s implementation runs on the UltraSPARC-I, which has a direct-mapped, write-through, non-allocating cache. Ignoring conflict misses for the moment, how big does the data cache have to be if Ben were to use a blocking factor of B?
Assume that the only cache activity involves the elements of the matrices (i.e., that any other variables used are allocated to registers).
Using your result, calculate the maximum value of B that Ben could choose for his implementation. The UltraSPARC-I has a 16 KB data cache.
Ben’s code operates on 3000 by 3000 matrices of 32-bit integer values, laid out in memory in row-major order. He runs some simulations and finds that for his particular data placement and cache configuration, there are some conflict misses, but not so many that it is a serious concern. However, when he tests the code on the UltraSPARC-I using the maximum block size that you determined, the performance is dismal. Alyssa suggests that the problem may be related to the TLB.
Please explain how the TLB can be causing this problem. The TLB on the UltraSPARC-I is fully-associative and contains 64 entries. Pages are 8 KB. TLB misses are handled in software. You may assume that an LRU replacement policy is used. The data cache is virtually-tagged and virtually-indexed.
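One way to build intuition for the TLB question is to count how many distinct pages a single tile of one matrix touches. Each row is 3000 × 4 = 12,000 bytes, which is larger than one 8 KB page, so consecutive rows of a tile always start on different pages. The sketch below is a minimal illustration under stated assumptions: the function name `pages_touched`, the base address 0, and the example block sizes are all hypothetical and are not the answer to the previous part.

```python
PAGE_BYTES = 8 * 1024  # page size from the problem
TLB_ENTRIES = 64       # TLB size from the problem
N = 3000               # matrix dimension from the problem
WORD_BYTES = 4         # 32-bit integers


def pages_touched(block, row0=0, col0=0, base=0):
    """Distinct pages covered by a block x block tile of one row-major N x N matrix.

    base is the assumed starting byte address of the matrix.
    """
    pages = set()
    for r in range(row0, row0 + block):
        first = base + (r * N + col0) * WORD_BYTES  # first byte of this tile row
        last = first + block * WORD_BYTES - 1       # last byte of this tile row
        pages.update(range(first // PAGE_BYTES, last // PAGE_BYTES + 1))
    return len(pages)


for block in (8, 16, 32, 64):  # hypothetical block sizes for illustration
    print(block, "x", block, "tile touches", pages_touched(block),
          "pages; the TLB holds", TLB_ENTRIES, "entries")
```

Since several matrices are live at once, the number of pages in use can exceed the 64 TLB entries well before the data itself outgrows the 16 KB cache, and every TLB miss then costs a software handler invocation.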
Problem 1.D Think Different
When Apple Computer first started using the PowerPC processor in their machines, they held a company-wide contest on who could write the fastest BlockMove routine, a library routine which copies some number of bytes from one memory location to another.
The top two contestants were a microcoder and a computer architect. Their implementations of BlockMove were by far the fastest in the company. Ben Bitdiddle was interning at Apple and caught a glimpse of their code, and noticed that both routines use the dcbz PowerPC assembly instruction, which executes in one cycle, and has the following semantics:
If the block containing the byte addressed by (rA)+(rB) is in the data cache, all bytes of the block are cleared to zero.
If the block containing the byte addressed by (rA)+(rB) is not in the data cache, the block is allocated in the data cache without fetching the block from main memory, and all bytes of the block are set to zero.
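The two cases above can be captured in a toy model. The sketch below is an illustration of the stated semantics of the cache-block-zero instruction (`dcbz` in the PowerPC instruction set), not a model of any real PowerPC cache: the class `ToyDataCache`, its unbounded capacity, and the 32-byte `BLOCK_BYTES` are assumptions made for this example.

```python
BLOCK_BYTES = 32  # assumed cache block size for this sketch


class ToyDataCache:
    """Hypothetical, unbounded cache model used only to illustrate dcbz."""

    def __init__(self, memory):
        self.memory = memory  # backing store (a bytearray)
        self.blocks = {}      # block number -> cached copy of that block
        self.fetches = 0      # number of blocks read from main memory

    def load_block(self, addr):
        """Ordinary access: on a miss, fetch the whole block from memory."""
        n = addr // BLOCK_BYTES
        if n not in self.blocks:
            start = n * BLOCK_BYTES
            self.blocks[n] = bytearray(self.memory[start:start + BLOCK_BYTES])
            self.fetches += 1
        return self.blocks[n]

    def dcbz(self, ra, rb):
        """Zero the block containing byte (rA)+(rB) without fetching it."""
        n = (ra + rb) // BLOCK_BYTES
        # Hit or miss, the block ends up resident and all zero; memory is not read.
        self.blocks[n] = bytearray(BLOCK_BYTES)


memory = bytearray(b"\xff" * 128)
cache = ToyDataCache(memory)
cache.load_block(0)  # ordinary miss: one fetch from memory
cache.dcbz(0, 8)     # block already cached: cleared in place
cache.dcbz(64, 0)    # block not cached: allocated as zeros, no fetch
print("fetches from memory:", cache.fetches)
```

The point of interest for a copy routine is the miss case: the destination block becomes resident without the memory read that an ordinary store miss would trigger in a write-allocate cache.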
Ben knows that all the 6.823 students have completed problem set 2 and are now experienced microcoders and architects. He wants to climb the ranks at A
- Keywords:
- virtual memory, basics