project.docx
- Document ID: 8288083
- Upload date: 2023-01-30
- Format: DOCX
- Pages: 15
- Size: 318.84 KB
; Exercise 1
; This is an assembly version of the following C code (assuming a, b and c already declared)
;
; for (int i = 0; i < 6; i++) {
;     a[i] = a[i] + b[i] + c[i];
; }
.data
a:
.space 48
b:
.word 10,11,12,13,0,1
c:
.word 1,2,3,4,5,6
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1) ; element of a
lw r6,0(r2) ; element of b
lw r7,0(r3) ; element of c
dadd r8,r5,r6 ; a[i] + b[i]
dadd r9,r7,r8 ; a[i] = a[i] + b[i] + c[i];
sw r9,0(r1) ; store value in a[i]
daddi r1,r1,8 ; increment memory pointers
daddi r2,r2,8
daddi r3,r3,8
daddi r4,r4,-1 ; decrement the remaining-iteration counter (the i++ of the C loop)
bnez r4,Loop
end:
halt
1) Load ex1.s into the memory of MIPS64 and disable the forwarding logic, the delay slot and the Branch target buffer from the Configure menu in the main toolbar. Before running the program, try to predict where stalls occur, how many clock cycles they will take, and for what kind of hazards they occur. Then compare your prediction with the simulation results.
Answer:
Clock cycles = 19 × 6 + 4 + 4 = 122
RAW data hazard stalls = 7 × 6 = 42
Simulator result: [screenshot not available]
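The figure of 7 RAW stalls per iteration can be cross-checked with a small model. The sketch below is not part of the original answer: it assumes a classic five-stage pipeline with forwarding off, in which a dependent instruction can leave ID no earlier than three cycles after its producer (the register file is assumed to be written before it is read within a cycle). The names `ORIGINAL_LOOP`, `WRITE_BACK_DELAY` and `raw_stalls_without_forwarding` are made up for this illustration, and branch stalls are deliberately left out.

```python
# Each entry: (mnemonic, destination register or None, source registers).
ORIGINAL_LOOP = [
    ("lw",    "r5", ["r1"]),
    ("lw",    "r6", ["r2"]),
    ("lw",    "r7", ["r3"]),
    ("dadd",  "r8", ["r5", "r6"]),
    ("dadd",  "r9", ["r7", "r8"]),
    ("sw",    None, ["r9", "r1"]),
    ("daddi", "r1", ["r1"]),
    ("daddi", "r2", ["r2"]),
    ("daddi", "r3", ["r3"]),
    ("daddi", "r4", ["r4"]),
    ("bnez",  None, ["r4"]),
]

# Assumption: a value written in WB can be read by an instruction that is
# in ID during the same cycle, i.e. three cycles after the producer's ID.
WRITE_BACK_DELAY = 3


def raw_stalls_without_forwarding(loop):
    """Count RAW stall cycles for one pass through `loop`, forwarding off."""
    ready = {}      # register -> earliest ID cycle at which it can be read
    id_cycle = -1   # ID cycle of the previous instruction
    stalls = 0
    for _mnemonic, dest, sources in loop:
        earliest = id_cycle + 1
        needed = max([ready[r] for r in sources if r in ready], default=0)
        id_cycle = max(earliest, needed)
        stalls += id_cycle - earliest
        if dest is not None:
            ready[dest] = id_cycle + WRITE_BACK_DELAY
    return stalls


print(raw_stalls_without_forwarding(ORIGINAL_LOOP))  # 7
```

Under these assumptions the stalls fall on `dadd r8` (1 cycle), `dadd r9` (2), `sw` (2) and `bnez` (2), which adds up to the 7 per iteration used in the answer.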
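2) The program of ex1.s consists of a loop plus some other instructions outside it. After running the program to completion, estimate the CPI using the Statistics window. In the case of a program containing a “hot spot” (i.e. an internal loop whose instructions are executed much more frequently than all the other instructions) the CPI can be roughly estimated just using the asymptotic CPI, i.e.
$$\mathrm{CPI}_{asymptotic} = \lim_{L \to \infty} \frac{N_{out} + S_{out} + L \cdot (IC_{hot} + S_{hot})}{N_{out} + L \cdot IC_{hot}} = \frac{IC_{hot} + S_{hot}}{IC_{hot}}$$
where $N_{out}$ and $S_{out}$ are the number of instructions and the number of stalls outside the “hot spot”, respectively, $L$ is the number of loop cycles of the innermost loop, and $IC_{hot}$ and $S_{hot}$ are the number of instructions and the number of stalls of the “hot spot”. Compare the asymptotic CPI with the value resulting from simulations. Are results compatible?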
Answer:
Simulator CPI = ((11 + 7) * 6 + 4 + 5 + 5) / (11 * 6) = 1.848
Asymptotic CPI = (11 + 7) / 11 = 1.636
Execution results: [screenshot not available]
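The CPI estimate described in this question can be written as two small functions. This is a sketch under the assumption that the full CPI is total cycles (instructions plus stalls) divided by total instructions, using the quantities named in the question; the function names and the out-of-loop values in the usage lines are hypothetical.

```python
def cpi(n_out, s_out, loops, ic_hot, s_hot):
    """CPI of a program with one hot loop executed `loops` times."""
    cycles = n_out + s_out + loops * (ic_hot + s_hot)
    instructions = n_out + loops * ic_hot
    return cycles / instructions


def asymptotic_cpi(ic_hot, s_hot):
    """Limit of cpi() as the number of loop iterations grows."""
    return (ic_hot + s_hot) / ic_hot


# Values from the answer: 11 instructions and 7 stalls per iteration.
print(round(asymptotic_cpi(11, 7), 3))  # 1.636

# Hypothetical out-of-loop numbers (5 instructions, 4 stalls) to show convergence.
for loops in (6, 60, 6000):
    print(loops, round(cpi(5, 4, loops, 11, 7), 4))
```

The out-of-loop terms matter less and less as the iteration count grows, which is why the asymptotic value is only a rough estimate for a loop that runs six times.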
3) Enable the forwarding logic and execute the code again. Compute the CPI again. Justify the remaining stalls and comment why some of them occur after ID stage rather than after IF.
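Answer:
Simulator CPI = ((11 + 1) * 6 + 5 + 5 + 4) / (11 * 6) = 1.303
Asymptotic CPI = (11 + 1) / 11 = 1.091
The two values are not the same.
With forwarding enabled, an instruction can read its registers in ID even if the value read there is stale, because the correct value is forwarded to it later. The bnez instruction, however, needs the actual register value while it is in ID, so it cannot use the value that daddi forwards from its EXE stage and has to wait for the result to be written back. This is why the remaining stall occurs after the ID stage rather than after IF.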
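The single remaining stall per iteration can be reproduced with a simplified model of forwarding. The sketch below rests on assumptions that are not stated in the original answer: ALU results can be forwarded to the next instruction's EX stage without a stall, load results are available one cycle later, and a branch needs its operand one cycle earlier than other instructions because it is resolved in ID. All names are illustrative.

```python
LOOP = [
    ("lw",    "r5", ["r1"]),
    ("lw",    "r6", ["r2"]),
    ("lw",    "r7", ["r3"]),
    ("dadd",  "r8", ["r5", "r6"]),
    ("dadd",  "r9", ["r7", "r8"]),
    ("sw",    None, ["r9", "r1"]),
    ("daddi", "r1", ["r1"]),
    ("daddi", "r2", ["r2"]),
    ("daddi", "r3", ["r3"]),
    ("daddi", "r4", ["r4"]),
    ("bnez",  None, ["r4"]),
]

LOADS = {"lw"}
BRANCHES = {"bnez"}


def raw_stalls_with_forwarding(loop):
    """Count RAW stall cycles for one pass through `loop`, forwarding on."""
    ready = {}      # register -> earliest ID cycle of a normal consumer
    id_cycle = -1
    stalls = 0
    for mnemonic, dest, sources in loop:
        earliest = id_cycle + 1
        # Assumption: a branch compares its operand in ID, one cycle
        # earlier than an ALU instruction would need it in EX.
        extra = 1 if mnemonic in BRANCHES else 0
        needed = max([ready[r] + extra for r in sources if r in ready], default=0)
        id_cycle = max(earliest, needed)
        stalls += id_cycle - earliest
        if dest is not None:
            # ALU result: forwarded to the very next instruction.
            # Load result: available one cycle later (after MEM).
            ready[dest] = id_cycle + (2 if mnemonic in LOADS else 1)
    return stalls


print(raw_stalls_with_forwarding(LOOP))  # 1
```

In this model the only stall left is the one between `daddi r4,r4,-1` and `bnez r4,Loop`, matching the (11 + 1) used for the asymptotic CPI.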
4) Disable the forwarding logic and assume that the MIPS hardware cannot detect hazards. Modify the source code by inserting NOPs where appropriate without reordering the code (NOP stuffing technique). Check with the simulator that no stall occurs, and check whether the CPI has changed. Do we have better performance?
Answer:
With NOPs inserted:
.data
a:
.space 48
b:
.word 10,11,12,13,0,1
c:
.word 1,2,3,4,5,6
.text
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
NOP
dadd r8,r5,r6
NOP
NOP
dadd r9,r7,r8
NOP
NOP
sw r9,0(r1)
daddi r1,r1,8
daddi r2,r2,8
daddi r3,r3,8
daddi r4,r4,-1
NOP
NOP
bnez r4,Loop
end:
halt
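After inserting the NOPs there are no RAW stalls and the CPI changes, but performance gets worse: each NOP still occupies a pipeline slot, and the code is longer.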
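The effect of NOP stuffing on CPI versus cycle count can be checked with simple arithmetic. The sketch below assumes the per-iteration figures used earlier in this report (11 instructions and 7 RAW stalls) and ignores the taken-branch stall, which is the same in both versions; `per_iteration` is a hypothetical helper.

```python
def per_iteration(instructions, stalls):
    """Return (cycles, CPI) for one loop iteration."""
    cycles = instructions + stalls
    return cycles, cycles / instructions


# Original loop: 11 instructions, 7 RAW stalls.
stalled = per_iteration(instructions=11, stalls=7)
# NOP-stuffed loop: the 7 stalls become 7 extra instructions.
stuffed = per_iteration(instructions=11 + 7, stalls=0)

print(stalled[0], round(stalled[1], 3))  # 18 1.636
print(stuffed[0], round(stuffed[1], 3))  # 18 1.0
```

The CPI drops to 1.0 only because the NOPs are counted as instructions; the number of cycles per iteration stays at 18, so the program does not run any faster.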
5) Reschedule the instructions (code moving technique) in order to avoid stalls without modifying the program semantics (check the final result to see if after moving the instructions the result is the same). Recompute the normal and asymptotic CPI values.
Answer:
The code is as follows (execution screenshot not available):
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
daddi r2,r2,8
dadd r8,r5,r6
daddi r3,r3,8
dadd r9,r7,r8
daddi r4,r4,-1
sw r9,0(r1)
daddi r1,r1,8
bnez r4,Loop
end:
halt
Hence asymptotic CPI = (11 + 3) / 11 = 1.273
The actual value is 1.296.
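The question also asks to check that the rescheduled code still produces the same result. Besides inspecting memory in the simulator, this can be sketched with a tiny interpreter for the handful of instructions used here. Everything in the block is an illustration rather than part of the original answer: the interpreter ignores pipeline timing, treats each memory slot as one integer, and places a, b and c at the assumed addresses 0, 48 and 96.

```python
import re

PROLOGUE = """
daddi r1,r0,0    ; a, assumed to start at address 0
daddi r2,r0,48   ; b, assumed to start at address 48
daddi r3,r0,96   ; c, assumed to start at address 96
daddi r4,r0,6
Loop:
"""
ORIGINAL = PROLOGUE + """
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
dadd r8,r5,r6
dadd r9,r7,r8
sw r9,0(r1)
daddi r1,r1,8
daddi r2,r2,8
daddi r3,r3,8
daddi r4,r4,-1
bnez r4,Loop
halt
"""
RESCHEDULED = PROLOGUE + """
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
daddi r2,r2,8
dadd r8,r5,r6
daddi r3,r3,8
dadd r9,r7,r8
daddi r4,r4,-1
sw r9,0(r1)
daddi r1,r1,8
bnez r4,Loop
halt
"""

def run(program, memory, max_steps=10_000):
    """Interpret the small instruction subset used in ex1.s."""
    labels, code = {}, []
    for line in program.splitlines():
        line = line.split(";")[0].strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(code)
        else:
            op, _, rest = line.partition(" ")
            code.append((op, [arg.strip() for arg in rest.split(",")]))
    regs = {f"r{i}": 0 for i in range(32)}
    pc = 0
    for _ in range(max_steps):
        op, args = code[pc]
        pc += 1
        if op == "halt":
            return memory
        if op in ("lw", "sw"):
            offset, base = re.fullmatch(r"(-?\d+)\((r\d+)\)", args[1]).groups()
            address = regs[base] + int(offset)
            if op == "lw":
                regs[args[0]] = memory[address]
            else:
                memory[address] = regs[args[0]]
        elif op == "dadd":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "daddi":
            regs[args[0]] = regs[args[1]] + int(args[2])
        elif op == "bnez" and regs[args[0]] != 0:
            pc = labels[args[1]]
        regs["r0"] = 0  # r0 is hard-wired to zero
    raise RuntimeError("step limit reached")

def result_a(program):
    """Run `program` on the ex1.s input data and return the final array a."""
    memory = {8 * i: 0 for i in range(6)}
    memory.update({48 + 8 * i: v for i, v in enumerate([10, 11, 12, 13, 0, 1])})
    memory.update({96 + 8 * i: v for i, v in enumerate([1, 2, 3, 4, 5, 6])})
    run(program, memory)
    return [memory[8 * i] for i in range(6)]

print(result_a(ORIGINAL))                          # [11, 13, 15, 17, 5, 7]
print(result_a(RESCHEDULED) == result_a(ORIGINAL))  # True
```

Both orderings leave the same values in a, which is the property the code-moving technique has to preserve.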
6) Combine rescheduling and forwarding techniques and note the differences with respect to the forwarding-only and rescheduling-only cases. Try to enable the “Branch target buffer”, look at the simulation code and determine the CPI. Has performance improved?
Try to modify the original code by adding 6 additional input values in a and b. What do you expect from CPI?
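Answer:
Execution with forwarding enabled: [screenshot not available]
Results after additionally enabling the “Branch target buffer”:
forwarding: [screenshot not available]
rescheduling: [screenshot not available]
When the number of inputs is increased so that the loop runs 12 times, the CPI improves further.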
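Why more iterations should lower the CPI can be sketched numerically. The model below is an assumption, not simulator output: it uses the rescheduling-only case from question 5, splits its three stalls per iteration into two RAW stalls plus one stall per taken branch, counts five instructions outside the loop (four initializations and halt), and adds four pipeline-fill cycles. The function name and parameters are hypothetical.

```python
def estimated_cpi(iterations, ic_hot=11, raw_stalls=2, n_out=5, fill_cycles=4):
    """Estimate CPI for the rescheduled loop without forwarding."""
    instructions = n_out + iterations * ic_hot
    # Assumption: every taken branch costs one stall; the last one falls through.
    stalls = iterations * raw_stalls + (iterations - 1)
    return (instructions + stalls + fill_cycles) / instructions


for n in (6, 12, 1000):
    print(n, round(estimated_cpi(n), 3))
# 6 1.296
# 12 1.285
# 1000 1.273
```

With six iterations the model gives 1.296, the value reported in question 5; doubling the iterations moves the estimate toward the asymptotic 1.273 because the fixed start-up cost is spread over more instructions.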
The code is as follows:
.data
a:
.space 96
b:
.word 10,11,12,13,0,1,1,0,13,12,11,10
c:
.word 1,2,3,4,5,6,6,5,4,3,2,1
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,12
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
daddi r2,r2,8
dadd r8,r5,r6
daddi r3,r3,8
dadd r9,r7,r8
daddi r4,r4,-1
sw r9,0(r1)
daddi r1,r1,8
bnez r4,Loop
end:
halt
Program execution results:
rescheduling: [screenshot not available]
After increasing the number of iterations, the code becomes:
.data
a:
.space 96
b:
.word 10,11,12,13,0,1,1,0,13,12,11,10
c:
.word 1,2,3,4,5,6,6,5,4,3,2,1
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,12
Loop:
lw r5,0(r1) ; element of a
lw r6,0(r2) ; element of b
lw r7,0(r3) ; element of c
dadd r8,r5,r6 ; a[i] + b[i]
dadd r9,r7,r8 ; a[i] = a[i] + b[i] + c[i];
sw r9,0(r1) ; store value in a[i]
daddi r1,r1,8 ; increment memory pointers
daddi r2,r2,8
daddi r3,r3,8
daddi r4,r4,-1 ; decrement the remaining-iteration counter (the i++ of the C loop)
bnez r4,Loop
end:
halt
forwarding: [screenshot not available]
7) A well-known compiler optimization is known as “loop unrolling”. Basically, loop unrolling is the explicit repetition of the loop code a number of times. In this way we obtain a longer loop body that is executed fewer times. Consider the original code of ex1.s. Unroll the loop twice without any code moving, i.e. just repeat the first four loop instructions and make the necessary changes therein. Calculate the CPI for the case without forwarding. Is there any improvement?
Answer:
The code is as follows:
.data
a:
.space 48
b:
.word 10,11,12,13,0,1
c:
.word 1,2,3,4,5,6
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1) ; element of a
lw r6,0(r2) ; element of b
lw r7,0(r3) ; element of c
dadd r8,r5,r6 ; a[i] + b[i]
dadd r9,r7,r8 ; a[i] = a[i] + b[i] + c[i];
sw r9,0(r1) ; store value in a[i]
lw r10,8(r1) ; element of a
lw r11,8(r2) ; element of b
lw r12,8(r3) ; element of c
dadd r13,r10,r11 ; a[i+1] + b[i+1]
dadd r14,r12,r13 ; a[i+1] = a[i+1] + b[i+1] + c[i+1];
sw r14,8(r1) ; store value in a[i+1]
daddi r1,r1,16 ; increment memory pointers
daddi r2,r2,16
daddi r3,r3,16
daddi r4,r4,-2 ; two elements are handled per pass, so count down by 2
bnez r4,Loop
end:
halt
Execution results: [screenshot not available]
Asymptotic CPI = (17 + 13) / 17 = 1.765
Hence there is no improvement.
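What unrolling twice does to the loop structure can be shown at the source level. The sketch below mirrors the C loop from the header of ex1.s in Python; the function names are hypothetical, and the unrolled version assumes an even number of elements, as with the 6 inputs used here.

```python
def add_rolled(a, b, c):
    """Original loop: one element per iteration."""
    for i in range(len(a)):
        a[i] = a[i] + b[i] + c[i]
    return a


def add_unrolled_twice(a, b, c):
    """Loop unrolled twice: two elements per iteration, half as many iterations."""
    # Assumption: len(a) is even, so no clean-up iteration is needed.
    for i in range(0, len(a), 2):
        a[i] = a[i] + b[i] + c[i]
        a[i + 1] = a[i + 1] + b[i + 1] + c[i + 1]
    return a


b = [10, 11, 12, 13, 0, 1]
c = [1, 2, 3, 4, 5, 6]
print(add_rolled([0] * 6, b, c))          # [11, 13, 15, 17, 5, 7]
print(add_unrolled_twice([0] * 6, b, c))  # [11, 13, 15, 17, 5, 7]
```

The pointer increments, the counter update and the branch are paid once per two elements, which is where the 17 instructions per unrolled iteration in the listing above come from (2 × 6 loop-body instructions plus 5 instructions of loop overhead).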
8) Apply code rescheduling to the solution of the previous question and calculate both the CPI and the asymptotic CPI values with and without forwarding. Is there any improvement?
Answer:
The code is as follows:
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
lw r10,8(r1)
dadd r8,r5,r6
lw r11,8(r2)
lw r12,8(r3)
dadd r9,r7,r8
daddi r4,r4,-2
dadd r13,r10,r11
sw r9,0(r1)
daddi r1,r1,16
dadd r14,r12,r13
daddi r2,r2,16
daddi r3,r3,16
sw r14,-8(r1) ; r1 was already advanced by 16, so -8(r1) is a[i+1]
bnez r4,Loop
End:
halt
Execution results: [screenshot not available]
Normal CPI = ((17 + 0) * 6 + 5 + 0 + 4) / (17 * 6) = 1.088
Asymptotic CPI = (17 + 0) / 17 = 1.000
9) Suppose that the add operation in the original code is a floating point calculation and the loop is iterated for 12 times. Please use floating point registers for a[i], b[i], and c[i], and modify your assembly code. Please answer the following questions:
At least how many times do you need to unroll the loop to minimize stalls without forwarding?
What is the average latency of iterations for the original loop?
What is the code size?
Please show us your code.
The following is the input data of your code:
.data
a:
.space 96
b:
.word 10,11,12,13,0,1,1,0,13,12,11,10
c:
.word 1,2,3,4,5,6,6,5,4,3,2,1
.text …………
Answer:
The code is as follows (execution screenshot not available):
.data
a:
.space 96
b:
.word 10,11,12,13,0,1,1,0,13,12,11,10
c:
.word 1,2,3,4,5,6,6,5,4,3,2,1
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,12
Loop:
l.d f1,0(r1)
l.d f2,0(r2)
l.d f3,0(r3)
add.d f4,f2,f1
add.d f5,f3,f4
s.d f5,0(r1)
daddi r1,r1,8
daddi r2,r2,8
daddi r3,r3,8
daddi r4,r4,-1
bnez r4,Loop
end:
halt
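Unrolling the program above so that the loop runs four times, with three elements handled per iteration:
The program is as follows (execution screenshot not available):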
.data
a:
.space 96
b:
.double 10,11,12,13,0,1,1,0,13,12,11,10
c:
.double 1,2,3,4,5,6,6,5,4,3,2,1
.text
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,12
Loop:
l.d f5,0(r1)
l.d f6,0(r2)
l.d f10,8(r1)
l.d f11,8(r2)
add.d f8,f5,f6
l.d f15,16(r1)
l.d f16,16(r2)
add.d f13,f10,f11
l.d f7,0(r3)
l.d f12,8(r3)
add.d f18,f15,f16
l.d f17,16(r3)
add.d f9,f7,f8
add.d f14,f13,f12
daddi r1,r1,24 ; r1 now points past the three elements of this pass
add.d f19,f17,f18
daddi r4,r4,-3
s.d f9,-24(r1) ; a[i]
s.d f14,-16(r1) ; a[i+1]
daddi r3,r3,24
daddi r2,r2,24
s.d f19,-8(r1) ; a[i+2]
bnez r4,Loop
end:
halt