lecture6 (Word-format document download).docx
- Document ID: 17930774
- Upload date: 2022-12-12
- Format: DOCX
- Pages: 18
- Size: 205.96KB
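Suppose that the sample $(X_i, Y_i)$, $i = 1, \dots, n$, has been used to fit the least squares regression line $\hat{Y} = b_0 + b_1 X$.
The $i$th residual is
$$e_i = Y_i - \hat{Y}_i, \qquad \hat{Y}_i = b_0 + b_1 X_i.$$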
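As a minimal sketch (in Python rather than SAS, with made-up data), the residuals can be computed directly from the least squares estimates $b_1 = \sum (X_i-\bar X)(Y_i-\bar Y)/\sum (X_i-\bar X)^2$ and $b_0 = \bar Y - b_1 \bar X$. The helper name `fit_line` and the data are hypothetical. The last two lines check two standard properties of least squares residuals: $\sum e_i = 0$ and $\sum X_i e_i = 0$.

```python
# Hypothetical data, made up for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Least squares estimates (b0, b1) of the line Y-hat = b0 + b1 * X."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


b0, b1 = fit_line(X, Y)
fitted = [b0 + b1 * xi for xi in X]                   # Y-hat_i
residuals = [yi - yh for yi, yh in zip(Y, fitted)]    # e_i = Y_i - Y-hat_i

print("b0 =", round(b0, 4), " b1 =", round(b1, 4))
# Both sums are zero up to floating-point rounding.
print("sum of e_i      :", sum(residuals))
print("sum of X_i * e_i:", sum(xi * ei for xi, ei in zip(X, residuals)))
```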
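Lemma 1. Under model (2.1), for $i = 1, \dots, n$, the residual $e_i$ follows a normal distribution with
$$E(e_i) = 0 \quad\text{and}\quad \operatorname{Var}(e_i) = \sigma^2\left[1 - \frac{1}{n} - \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n}(X_j - \bar{X})^2}\right].$$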
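Note:
When $n$ is large and the $X$'s are well spaced,
$$\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n}(X_j - \bar{X})^2} \approx 0 \quad\text{and}\quad \operatorname{Var}(e_i) \approx \sigma^2.$$
As we already know, $\varepsilon_i \sim N(0, \sigma^2)$.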
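Under model (2.1), $\operatorname{Var}(e_i) = \sigma^2[1 - 1/n - (X_i-\bar X)^2/\sum_j (X_j-\bar X)^2]$, so the bracketed factor can be tabulated to see when the approximation in the note is reasonable. This sketch uses hypothetical, equally spaced $X$ values; `variance_factors` is an illustrative name, not a library function.

```python
def variance_factors(x):
    """For each i, return 1 - 1/n - (X_i - Xbar)^2 / sum_j (X_j - Xbar)^2,
    so that Var(e_i) = sigma^2 * factor_i under model (2.1)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 - 1 / n - (xi - xbar) ** 2 / sxx for xi in x]


# Equally spaced ("well spaced") hypothetical X values.
small = variance_factors([float(i) for i in range(1, 11)])     # n = 10
large = variance_factors([float(i) for i in range(1, 1001)])   # n = 1000

# The smallest factor occurs at the extreme X values.
print("n = 10  :", round(min(small), 4))
print("n = 1000:", round(min(large), 4))
```

For $n = 10$ the smallest factor is about 0.65, so $\operatorname{Var}(e_i)$ at the end points is well below $\sigma^2$; for $n = 1000$ every factor exceeds 0.99. The factors always sum to $n - 2$.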
Remarks:
The residuals $e_i$ are, in a sense, estimators of the unobserved random error terms $\varepsilon_i$.
Studentized and Semi-studentized Residuals
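Let $s(e_i)$ denote the estimated standard deviation (i.e., standard error) of the $i$th residual:
$$s(e_i) = \sqrt{MSE\left[1 - \frac{1}{n} - \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n}(X_j - \bar{X})^2}\right]}, \qquad MSE = \frac{SSE}{n-2}.$$
Then the $i$th Studentized residual is given by
$$r_i = \frac{e_i}{s(e_i)}.$$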
Such modified residuals are said to be Studentized since they are obtained from the $i$th residual by subtracting the mean, $E(e_i) = 0$, and dividing by its standard error. This procedure mimics the one used to compute the test statistic for testing a hypothesis about the mean of a normal population when the population variance is unknown. Such a test statistic has a Student $t$ distribution (with $n - 1$ degrees of freedom). The modified residual defined above does not actually have a Student $t$ distribution ($e_i$ and $SSE$ are not independent), but it is obtained by the same type of transformation used to construct random variables having the Student $t$ distribution. Thus the name, Studentized.
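For readers who want to see the arithmetic, here is a minimal Python sketch of $r_i = e_i / s(e_i)$ for simple linear regression, with $s(e_i) = \sqrt{MSE[1 - 1/n - (X_i-\bar X)^2/\sum_j (X_j-\bar X)^2]}$ and $MSE = SSE/(n-2)$. The data and the function name `studentized_residuals` are hypothetical; in the SAS example later in this lecture the same quantities come from the STUDENT= keyword of the OUTPUT statement.

```python
import math


def studentized_residuals(x, y):
    """r_i = e_i / s(e_i), s(e_i) = sqrt(MSE * (1 - 1/n - (X_i - Xbar)^2 / Sxx))."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(ei ** 2 for ei in e) / (n - 2)          # MSE = SSE / (n - 2)
    r = []
    for xi, ei in zip(x, e):
        s_e = math.sqrt(mse * (1 - 1 / n - (xi - xbar) ** 2 / sxx))
        r.append(ei / s_e)
    return r


# Hypothetical data for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9]
print([round(v, 3) for v in studentized_residuals(X, Y)])
```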
Remarks
1. On page 103 of your textbook, the authors discuss semi-studentized residuals. The $i$th semi-studentized residual is simply
$$e_i^* = \frac{e_i}{\sqrt{MSE}}.$$
When $n$ is large and the $X$'s are well spaced, $s(e_i) \approx \sqrt{MSE}$, so that $r_i \approx e_i^*$. The semi-studentized residuals are certainly easier to compute (assuming that you had to make the computation yourself), but SAS will compute the actual Studentized residuals upon request, so why not go with the “real thing”?
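The difference between $e_i^*$ and $r_i$ can be seen in the vehicle weight vs MPG output later in this lecture. This sketch copies the Residual and Student Residual columns and the sum of squared residuals (0.99961) from that SAS output and computes $e_i^* = e_i/\sqrt{MSE}$ with $MSE = SSE/(n-2)$. Since $s(e_i) \le \sqrt{MSE}$, every $|r_i|$ is at least $|e_i^*|$.

```python
import math

# Values copied from the SAS output of the vehicle weight vs MPG example (n = 10).
residual = [0.2071, -0.4045, -0.1032, 0.2019, -0.009677,
            0.3097, 0.2968, -0.4006, 0.3942, -0.4916]
student = [0.668, -1.243, -0.313, 0.605, -0.0337,
           1.080, 0.899, -1.195, 1.237, -1.641]
sse = 0.99961
n = len(residual)
root_mse = math.sqrt(sse / (n - 2))            # sqrt(MSE)

semi = [e / root_mse for e in residual]         # e_i* = e_i / sqrt(MSE)
for i, (es, r) in enumerate(zip(semi, student), start=1):
    print(f"obs {i:2d}: semi-studentized {es:7.3f}   studentized {r:7.3f}")
```

For observation 10 the semi-studentized residual is about $-1.39$, noticeably smaller in magnitude than the studentized residual $-1.641$: with only $n = 10$ cases, the approximation $s(e_i) \approx \sqrt{MSE}$ is rough.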
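2. SAS can also compute what SAS calls Rstudentized (RSTUDENT) residuals. The difference between an ordinary studentized residual and an Rstudentized residual is that for the Rstudentized residual, $MSE$ is replaced with $MSE_{(i)}$, where $MSE_{(i)}$ is the mean squared error obtained when the regression is fitted with the $i$th observation deleted.
Rstudentized residuals were proposed by Belsley, Kuh and Welsch in their book, Regression Diagnostics (Wiley, 1980). In the context of simple regression, they claim that for each $i$, RSTUDENT has approximately a Student $t$ distribution with $n - 3$ degrees of freedom. Rstudentized residuals are good for detecting outliers, since an observation with a large residual tends to inflate the $MSE$. Deleting this observation when computing $MSE_{(i)}$ means that $MSE_{(i)}$ is not inflated by it, so the Rstudentized residual of such an observation tends to be larger in magnitude than its ordinary studentized residual.
However, the authors of your text do not discuss Rstudentized residuals (they call them deleted studentized residuals) until Chapter 9 or 10, so to keep things simple, we will wait until then to discuss this type of residual.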
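Although the text postpones them, the definition in Remark 2 translates directly into code. The sketch below, with hypothetical data containing one deliberately wild point, computes each value as $e_i / \sqrt{MSE_{(i)}[1 - 1/n - (X_i-\bar X)^2/\sum_j (X_j-\bar X)^2]}$, obtaining $MSE_{(i)}$ by actually refitting the line without observation $i$. The helper names are illustrative; this is a sketch of the definition, not SAS's implementation.

```python
import math

# Hypothetical data; the point at X = 5 is deliberately far from the trend.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
Y = [1.1, 2.0, 2.9, 4.2, 9.0, 6.1, 6.9, 8.0]


def fit_line(x, y):
    """Least squares estimates (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def rstudent(x, y):
    """Deleted (R)studentized residuals e_i / sqrt(MSE_(i) * factor_i), where
    MSE_(i) comes from refitting the line with observation i deleted."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = fit_line(x, y)
    result = []
    for i in range(n):
        e_i = y[i] - (b0 + b1 * x[i])                  # residual from the full fit
        factor = 1 - 1 / n - (x[i] - xbar) ** 2 / sxx
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1 = fit_line(x_del, y_del)                # fit without observation i
        sse_del = sum((yj - (c0 + c1 * xj)) ** 2 for xj, yj in zip(x_del, y_del))
        mse_del = sse_del / ((n - 1) - 2)              # n - 1 cases, 2 parameters
        result.append(e_i / math.sqrt(mse_del * factor))
    return result


print([round(t, 2) for t in rstudent(X, Y)])
```

For these data, the wild point at $X = 5$ gets by far the largest value in absolute value. In practice the refits are unnecessary, because $SSE = SSE_{(i)} + e_i^2 / [1 - 1/n - (X_i-\bar X)^2/\sum_j (X_j-\bar X)^2]$.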
Residual Plots
In simple linear regression (one predictor variable), residuals are usually plotted against their corresponding $X_i$ value or against their corresponding predicted value $\hat{Y}_i$. (In multiple regression, where there are many different $X$'s, residuals are usually plotted against $\hat{Y}_i$.) Residuals are never plotted against the corresponding actual $Y_i$, because these terms have a positive covariance, which would appear as a positive trend in the plot. On the other hand, $e_i$ and $\hat{Y}_i$ are uncorrelated.
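(So, since $Y_i = \hat{Y}_i + e_i$, we have $\operatorname{Cov}(e_i, Y_i) = \operatorname{Cov}(e_i, \hat{Y}_i) + \operatorname{Var}(e_i) = \operatorname{Var}(e_i) > 0$.)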
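The sample analogue of this can be checked numerically: for any least squares fit, $\sum e_i \hat{Y}_i = 0$, while $\sum e_i Y_i = SSE > 0$. A minimal sketch with made-up data:

```python
# Hypothetical data for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
Y = [3.1, 3.8, 5.2, 5.9, 7.4, 7.7, 9.3]

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
sxx = sum((x - xbar) ** 2 for x in X)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * x for x in X]
e = [y - yh for y, yh in zip(Y, fitted)]

sse = sum(ei ** 2 for ei in e)
# Because sum(e_i) = 0, these cross-products are (n - 1) times the sample covariances.
cross_fitted = sum(ei * yh for ei, yh in zip(e, fitted))   # 0 up to rounding
cross_actual = sum(ei * y for ei, y in zip(e, Y))          # equals SSE > 0
print(cross_fitted, cross_actual, sse)
```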
If the assumptions of model (2.1) are correct for the data, the residuals (plotted on the ordinate) against either $X_i$ or $\hat{Y}_i$ (on the abscissa) should be randomly distributed about the horizontal axis.
Example:
In the vehicle weight versus MPG example, the residuals and the studentized residuals are given in the output of the following SAS code:
PROC REG DATA=Cars;
   MODEL mpg = weight / R;
   OUTPUT OUT=Carsout P=PredMPG R=Residual STUDENT=Stud_Res RSTUDENT=Rstud_Res;
RUN;

goptions reset=global gunit=pct border
         ftext=swissb htitle=6 htext=3
         hsize=8in vsize=5in cback=white;
/* plot symbols: shape (v=), height (h=) and color (c=) */
symbol1 v=circle  h=3 c=red;
symbol2 v=square  h=4 c=green;
symbol3 v=diamond h=3 c=red;
run;

title color=blue 'Stud_res, Rstud_res and residual vs. wgt';
PROC GPLOT DATA=Carsout;
   PLOT Residual*weight=2 Stud_Res*weight=1 Rstud_Res*weight=3 / legend overlay vref=0;
RUN;
   title color=blue 'Stud_res vs. PredMPG';
   PLOT Stud_Res*PredMPG=3 / vref=0;
RUN;
QUIT;
Output Statistics

      Dependent  Predicted    Std Error             Std Error   Student                    Cook's
Obs    Variable      Value Mean Predict  Residual    Residual  Residual    -2-1 0 1 2           D
  1     18.3000    18.0929       0.1696    0.2071       0.310     0.668   |      |*     |    0.067
  2     15.9000    16.3045       0.1381   -0.4045       0.325    -1.243   |    **|      |    0.139
  3     16.4000    16.5032       0.1259   -0.1032       0.330    -0.313   |      |      |    0.007
  4     17.5000    17.2981       0.1171    0.2019       0.334     0.605   |      |*     |    0.023
  5     15.5000    15.5097       0.2067 -0.009677       0.287   -0.0337   |      |      |    0.000
  6     18.8000    18.4903       0.2067    0.3097       0.287     1.080   |      |**    |    0.303
  7     16.8000    16.5032       0.1259    0.2968       0.330     0.899   |      |*     |    0.059
  8     16.5000    16.9006       0.1124   -0.4006       0.335    -1.195   |    **|      |    0.080
  9     16.5000    16.1058       0.1529    0.3942       0.319     1.237   |      |**    |    0.176
 10     17.8000    18.2916       0.1876   -0.4916       0.300    -1.641   |   ***|      |    0.528

Sum of Residuals                       0
Sum of Squared Residuals         0.99961
Predicted Residual SS (PRESS)    1.60514
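The columns of this output are tied together by standard simple-regression formulas. With $MSE = SSE/(n-2)$, the Std Error Mean Predict column is $s(\hat{Y}_i) = \sqrt{MSE[1/n + (X_i-\bar X)^2/\sum_j (X_j-\bar X)^2]}$, so $s(e_i)^2 = MSE - s(\hat{Y}_i)^2$ and Student Residual $= e_i / s(e_i)$. The sketch below rebuilds the Std Error Residual and Student Residual columns from the printed numbers (agreement is only up to the rounding in the printout) and recovers the root MSE of 0.35348 quoted in the next section.

```python
import math

# Columns copied from the SAS "Output Statistics" table (vehicle weight vs MPG, n = 10).
resid = [0.2071, -0.4045, -0.1032, 0.2019, -0.009677,
         0.3097, 0.2968, -0.4006, 0.3942, -0.4916]
se_mean = [0.1696, 0.1381, 0.1259, 0.1171, 0.2067,
           0.2067, 0.1259, 0.1124, 0.1529, 0.1876]
se_resid = [0.310, 0.325, 0.330, 0.334, 0.287,
            0.287, 0.330, 0.335, 0.319, 0.300]
student = [0.668, -1.243, -0.313, 0.605, -0.0337,
           1.080, 0.899, -1.195, 1.237, -1.641]
sse = 0.99961                      # "Sum of Squared Residuals"

n = len(resid)
mse = sse / (n - 2)
root_mse = math.sqrt(mse)
print(f"Root MSE = {root_mse:.5f}")   # → Root MSE = 0.35348

# s(e_i)^2 = MSE - s(Yhat_i)^2, then Student Residual = e_i / s(e_i).
se_resid_rebuilt = [math.sqrt(mse - s ** 2) for s in se_mean]
student_rebuilt = [e / s for e, s in zip(resid, se_resid_rebuilt)]

for i in range(n):
    print(f"obs {i + 1:2d}: s(e_i) {se_resid_rebuilt[i]:.3f} (SAS {se_resid[i]:.3f})   "
          f"r_i {student_rebuilt[i]:7.3f} (SAS {student[i]:7.3f})")
```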
Nonlinearity of the Regression Function
If the plot of the residuals (or the studentized residuals) against the predictor variable $X$ (or the predicted response variable, $\hat{Y}$) is not randomly distributed about the horizontal axis, this could indicate that the regression function is nonlinear. Nonlinearity of the regression function can also be ascertained from the scatter plot, but the scatter plot is not always as effective as a residual plot. See Figure 3.3 on page 105. Also see Figure 3.4(b) on page 106.
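To see what such a nonrandom pattern looks like, the sketch below fits a straight line to hypothetical data that follow an exact quadratic, $Y = X^2$, and prints the signs of the residuals in increasing order of $X$: positive at both ends and negative in the middle, the curved pattern a residual plot would show.

```python
# Hypothetical data: Y = X^2 exactly, but a straight line is fitted.
X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
Y = [x ** 2 for x in X]

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
sxx = sum((x - xbar) ** 2 for x in X)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
b0 = ybar - b1 * xbar
residuals = [y - (b0 + b1 * x) for x, y in zip(X, Y)]

# Signs of the residuals, in increasing order of X.
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)   # → +----+
```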
The (studentized) residual plots in our vehicle weight vs MPG example show a more or less random dispersion about the horizontal axis. Thus the linear model appears to be adequate in this instance. This conclusion is supported by the relatively small root mean square error (Root MSE) of 0.35348 and the relatively high $R^2$.
Nonconstancy of Error Variance
A plot of the residuals (or the studentized residuals) against the predictor variable $X$ or the predicted response variable $\hat{Y}$ is also useful in assessing whether or not the error variance is constant, as assumed in the model. If the magnitude of the residuals tends to increase or (less likely) to decrease as $X$ increases, this indicates that the error variance changes as the value of the independent variable $X$ changes. Since $\hat{Y}$ is linearly related to the predictor variable $X$, a similar statement can be made regarding a plot of the residuals (or studentized residuals) against $\hat{Y}$. Systematic changes in the magnitude of the residuals violate the assumption that the errors have constant variance. A “wedge-shaped” residual plot, as in Figure 3.4(c), page 106, would typify this situation. A less likely, but possible, situation is that the error variance decreases as $X$ increases. This situation would result in a “reversed wedge” plot.
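One crude way to put a number on a wedge is to compare the average size of the residuals in the upper and lower halves of $X$. This is an informal heuristic chosen for illustration, not a procedure from the text, and the data below are hypothetical, constructed so that the scatter about the line grows with $X$.

```python
# Hypothetical data whose scatter about the line grows with X.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
Y = [12.5, 13.0, 17.5, 16.0, 22.5, 19.0, 27.5, 22.0]


def residuals_from_line(x, y):
    """Residuals from the least squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def spread_ratio(x, e):
    """Mean |e_i| over the upper half of X divided by mean |e_i| over the lower half.
    Informal heuristic: well above 1 suggests a wedge, well below 1 a reversed wedge."""
    pairs = sorted(zip(x, e))
    half = len(pairs) // 2
    lower = [abs(ei) for _, ei in pairs[:half]]
    upper = [abs(ei) for _, ei in pairs[len(pairs) - half:]]
    return (sum(upper) / half) / (sum(lower) / half)


e = residuals_from_line(X, Y)
print(round(spread_ratio(X, e), 2))   # → 2.75
```

With small samples such a ratio is noisy, so it complements rather than replaces looking at the plot.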
The studentized residual plots in the weight-MPG example do not exhibit any wedge-shaped pattern, which indicates that the variance is more or less constant. However, the residual plot in Figure 3.5, page 107, shows a tendency for the variability of the residuals to increase with $X$.
Outliers
Outliers are extreme observations. They can be identified from residual plots against either $X$ or $\hat{Y}$. Studentized residual plots are particularly helpful in this context. A rough rule of thumb (when $n$ is large) is to consider an observation $i$ whose studentized residual $|r_i|$ is very large to be an outlier. Actually, this rule is rather conservative. A more aggressive rule is to declare observation $i$ to be an outlier if $|r_i|$ exceeds a smaller cutoff, and some statisticians recommend using 2.5. More refined procedures for identifying outliers will be discussed in Chapter 10.
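A cutoff rule is easy to apply once the studentized residuals are available. The sketch below applies the 2.5 cutoff mentioned above to the Student Residual column of the vehicle weight vs MPG SAS output in this lecture; `flag_outliers` is an illustrative helper, and the cutoff is a parameter so other rules can be tried.

```python
# Student Residual column copied from the vehicle weight vs MPG SAS output (n = 10).
student = [0.668, -1.243, -0.313, 0.605, -0.0337,
           1.080, 0.899, -1.195, 1.237, -1.641]


def flag_outliers(r, cutoff=2.5):
    """Return the (1-based) observation numbers whose |r_i| exceeds `cutoff`."""
    return [i for i, ri in enumerate(r, start=1) if abs(ri) > cutoff]


print(flag_outliers(student))         # → []
print(flag_outliers(student, 1.5))    # → [10]
```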
The big question is, “Should outliers, once identified, be discarded?” It is always tempting to discard outliers, since they tend to distort the least squares fit, particularly in small to moderate samples. As a result, the residual plots may improperly suggest a lack of fit of the linear regression model, in addition to flagging the outlier. Figure 3.7, page 109, clearly illustrates this situation.
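A small hypothetical example of this effect: six points lie exactly on $Y = 1 + 2X$, and a seventh point far below the line is added. The sketch compares the fitted slopes and shows that, with the outlier included, the residuals of the six well-behaved points increase steadily with $X$, a pattern that could be misread as lack of fit.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


# Hypothetical data: six points exactly on Y = 1 + 2X ...
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
# ... plus one outlier at X = 7 (the line predicts 15).
X_out, Y_out = X + [7.0], Y + [3.0]

b0, b1 = fit_line(X, Y)
c0, c1 = fit_line(X_out, Y_out)
print("slope without outlier:", round(b1, 3))   # → 2.0
print("slope with outlier   :", round(c1, 3))   # → 0.714

# Residuals of the six "good" points under the contaminated fit.
good_resid = [y - (c0 + c1 * x) for x, y in zip(X, Y)]
print([round(r, 2) for r in good_resid])
```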
However, in most statisticians’ opinion, an outlier should be discarded if and only if
1. the observation causing the outlier involves a data input error, or
2. the observation causing the outlier involves an extraneous case.
By an extraneous case we mean that the outlying observation was collected under conditions substantially different from those of the other observations. Unfortunately, the statistician may not always be able to ascertain whether or not situation 1 and/or 2 pertains.
The automatic discarding of outliers can result in overfitting the linear model to the remaining data points. Furthermore, outliers may convey significant information, for example when the outlier results from an interaction with some other predictor variable that is not included in the model.
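Figure 3.6, page 108, shows a residual plot with an outlier. There are no outliers in our vehicle data; refer to the residual and studentized residual plots. (The largest studentized residual in absolute value is $-1.641$, for observation 10.)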
Nonindependence of the Error Terms
Although the actual error terms are assumed to be