Generalized Linear Models.docx
Generalized Linear Models. (For a much more detailed discussion, refer to Agresti's text, Categorical Data Analysis, Second Edition, Chapter 4, particularly pages 115-118 and 125-132.)
Generalized linear models (GLMs) are “a broad class of models that include ordinary regression and analysis of variance for continuous response variables, as well as for categorical response variables”. There are three components that are common to all GLMs:
∙ Random component
∙ Systematic component
∙ Link function
Random Component.
The random component refers to the probability distribution of the response $Y$. We observe independent random variables $Y_1, Y_2, \ldots, Y_N$. We now look at three examples of random components.
Example 1. $(Y_1, Y_2, \ldots, Y_N)$ might be normal. In this case, we would say the random component is the normal distribution. This component leads to ordinary regression and analysis of variance models.
Example 2. If the observations are Bernoulli random variables (which have values 0 or 1), then we would say the random component is the binomial distribution. When the random component is the binomial distribution, we are commonly concerned with logistic regression models or probit models.
Example 3. Quite often the random variables $Y_1, Y_2, \ldots, Y_N$ have a Poisson distribution. Then we will be involved with Poisson regression models or loglinear models.
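Systematic Component.
The random variables $Y_i$, $i = 1, 2, \ldots, N$, have expected values $\mu_i$, $i = 1, 2, \ldots, N$. The systematic component involves the explanatory variables $x_1, x_2, \ldots, x_k$ as linear predictors:
$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$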
Link Function.
The third component of a GLM is the link between the random and systematic components. It says how the mean $\mu = E(Y)$ relates to the explanatory variables in the linear predictor through specifying a function $g(\mu)$:
$$g(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$
$g(\mu)$ is called the link function. Here are some examples:
Example 1. The logistic regression model says
$$\ln\left[\frac{\pi(x_1, x_2, \ldots, x_k)}{1 - \pi(x_1, x_2, \ldots, x_k)}\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$
The observations $Y_1, Y_2, \ldots, Y_N$ have a binomial distribution (the random component).
Thus, for logistic regression, the link function is $\ln[\mu/(1-\mu)]$ and is called the logit link.
There are other link functions used when the random component is binomial. For example, the normit/probit model has the binomial distribution as the random component and link function
$$g(\mu) = \Phi^{-1}(\mu),$$
where $\Phi(x)$ is the cumulative normal distribution.
There is also a ‘Gompit’/complementary log-log link (available in Minitab, along with the probit link).
Example 2. For ordinary linear regression, we assume the observations have a normal distribution (the random component) and the mean is
$$\mu(x_1, x_2, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$
In this case the link function is the identity:
$$g(\mu) = \mu.$$
Example 3. If we assume the observations $Y_1, Y_2, \ldots, Y_N$ have a Poisson distribution (the random component) and the link function is $g(\mu) = \ln \mu$, then we have the Poisson regression model:
$$\ln \mu(x_1, x_2, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$
Sometimes the identity link function is used in Poisson regression, so that
$$\mu(x_1, x_2, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$
This model is the same as that used in ordinary regression except that the random component is the Poisson distribution.
There are other random components and link functions used in generalized linear models. The probit model has the binomial distribution as the random component and link function
$$g(\mu) = \Phi^{-1}(\mu),$$
where $\Phi(x)$ is the cumulative normal distribution.
In some disciplines, the negative binomial distribution has been used as the random component.
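The link functions named in this section can be collected in a few lines of Python. This is only a sketch using the standard library: the names `LINKS`, `linear_predictor` and `glm_mean` are made up for illustration, and the complementary log-log formula $\ln[-\ln(1-\mu)]$ is the usual textbook definition, which these notes mention but do not write out.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Each entry maps a link name to the pair (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    # Assumed standard definition of the complementary log-log link.
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
}


def linear_predictor(betas, xs):
    """Systematic component: beta_0 + beta_1*x_1 + ... + beta_k*x_k."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


def glm_mean(link, betas, xs):
    """Mean mu implied by g(mu) = linear predictor, i.e. mu = g^{-1}(eta)."""
    _, g_inverse = LINKS[link]
    return g_inverse(linear_predictor(betas, xs))


# Hypothetical coefficients, chosen only to show the call.
print(glm_mean("logit", [0.0, 1.0], [0.0]))    # linear predictor 0 on the logit scale
print(glm_mean("log", [0.5, 0.1], [3.0]))      # Poisson-style log link
```

The random component does not appear in this sketch: the same linear predictor can be paired with any of the links, and the choice of distribution only matters when the coefficients are estimated.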
Here is a comparison of the cumulative Logistic and Normal distributions:
Row     x   CumNormal   Logistic
  1  -3.0    0.001350   0.047426
  2  -2.9    0.001866   0.052154
  3  -2.8    0.002555   0.057324
  4  -2.7    0.003467   0.062973
  5  -2.6    0.004661   0.069138
  6  -2.5    0.006210   0.075858
  7  -2.4    0.008198   0.083173
  8  -2.3    0.010724   0.091123
  9  -2.2    0.013903   0.099750
 10  -2.1    0.017864   0.109097
 11  -2.0    0.022750   0.119203
 12  -1.9    0.028717   0.130108
 13  -1.8    0.035930   0.141851
 14  -1.7    0.044565   0.154465
 15  -1.6    0.054799   0.167982
 16  -1.5    0.066807   0.182426
 17  -1.4    0.080757   0.197816
 18  -1.3    0.096800   0.214165
 19  -1.2    0.115070   0.231475
 20  -1.1    0.135666   0.249740
 21  -1.0    0.158655   0.268941
 22  -0.9    0.184060   0.289050
 23  -0.8    0.211855   0.310026
 24  -0.7    0.241964   0.331812
 25  -0.6    0.274253   0.354344
 26  -0.5    0.308538   0.377541
 27  -0.4    0.344578   0.401312
 28  -0.3    0.382089   0.425557
 29  -0.2    0.420740   0.450166
 30  -0.1    0.460172   0.475021
 31   0.0    0.500000   0.500000
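The comparison table can be regenerated with the Python standard library. The sketch below assumes the Logistic column is the standard logistic distribution function $e^x/(1+e^x)$ and the CumNormal column is the standard normal distribution function; the function names are illustrative.

```python
import math


def cum_normal(x):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cum_logistic(x):
    """Standard logistic cumulative distribution function e^x / (1 + e^x)."""
    return math.exp(x) / (1.0 + math.exp(x))


# x runs from -3.0 to 0.0 in steps of 0.1 (31 rows, as in the table).
rows = []
for i in range(31):
    x = (i - 30) / 10
    rows.append((i + 1, x, cum_normal(x), cum_logistic(x)))

print("Row     x   CumNormal   Logistic")
for row, x, normal, logistic in rows:
    print(f"{row:3d}  {x:4.1f}    {normal:.6f}   {logistic:.6f}")
```

In standard units the logistic curve has heavier tails than the normal curve, which is why the Logistic column is larger than the CumNormal column at every negative x in the table.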
Regression Models with Binary Response Variables:
Logistic Regression
A common problem is that of estimating the probability of success using a predictor variable x. Here is an example. Launch temperatures (in degrees Fahrenheit) and an indicator of O-ring failure for 24 space shuttle launches prior to the space shuttle Challenger disaster in 1986 are given below:
x (temperature)   Failure     x (temperature)   Failure
      53            yes             70            yes
      56            yes             70            yes
      57            yes             72            no
      63            no              73            no
      66            no              75            no
      67            no              75            yes
      67            no              76            no
      67            no              76            no
      68            no              78            no
      69            no              79            no
      70            no              80            no
      70            yes             81            no
Can we predict the probability of failure using temperature?
Let $\pi(x)$ = Prob(success | x) and $1 - \pi(x)$ = Prob(failure | x). We want a 'model' for $\pi(x)$. We will set up a 'regression model' for $\pi(x)$. Why not a linear regression model $\pi(x) = \beta_0 + \beta_1 x_1$?
Answer:
a. For x large and positive or large and negative, $\pi(x) = \beta_0 + \beta_1 x_1$ will eventually be negative or greater than 1, an undesirable feature of a model for probabilities.
b. We are working with Bernoulli trials. The variance of the outcome of a Bernoulli trial is $\pi(x)[1 - \pi(x)] = [\beta_0 + \beta_1 x_1][1 - (\beta_0 + \beta_1 x_1)]$. The variance of an observation depends on x, meaning the assumption of constant variance is not satisfied.
c. The errors would be either $0 - (\beta_0 + \beta_1 x_1) = -\beta_0 - \beta_1 x_1$ or $1 - (\beta_0 + \beta_1 x_1)$, just two possible values for a given x, violating the assumption of normality.
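Points (a) and (b) can be seen on the O-ring data itself. The sketch below fits an ordinary least-squares line to the 0/1 responses. As an assumption made only for this illustration, the response is coded 1 for an O-ring failure, and the helper name `linear_prob` is hypothetical.

```python
# Challenger data from the table above: temperature (deg F) and failure (1 = yes, 0 = no).
temps = [53, 56, 57, 63, 66, 67, 67, 67, 68, 69, 70, 70,
         70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81]
fail = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
        1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# Ordinary least squares for a single predictor.
n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(fail) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, fail))
s_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar


def linear_prob(x):
    """Fitted 'probability' from the straight-line model b0 + b1*x."""
    return b0 + b1 * x


# Fitted value and the implied Bernoulli variance p(1 - p) at a few temperatures.
for x in (53, 70, 81):
    p = linear_prob(x)
    print(x, round(p, 3), round(p * (1 - p), 3))
```

With these data the fitted line drops below zero at the warmest launches, so the 'probability' and the implied variance are both negative there, and the implied variance changes with temperature everywhere else.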
What should a regression model look like?
1. Since $\pi(x)$ is a probability, its values should be between 0 and 1.
2. For the O-ring problem, we would expect $\pi(x)$ to increase from values near 0 to values near 1: as temperatures increase, the chances of a failure should decrease, or equivalently the chances of a 'success' (no O-ring failure) should increase.
Here are some 'nice-looking' curves:
[Figure: two curves of $\pi(x)$ plotted against x. Left: Normal Distribution Function. Right: Logistic Distribution Function.]
What we are looking at on the left (above) is a normal curve for probabilities. On the y-axis is $\pi(x)$ and on the x-axis is x: given a value of x, the probability of a success is $\pi(x)$, where $\pi(x)$ is the normal curve. There are other 'curves' we could use:
The curve on the right looks a lot like the first one (normal), but it is actually called the 'logistic' curve. There are many other curves we could use, but these are the two most commonly used ones (by a country mile!). The curves above are in 'standard units'. $\Phi(x)$ denotes the cumulative normal curve. For a regression model we use $\pi(x) = \Phi(\beta_0 + \beta_1 x_1)$. The expression for the logistic curve is much nicer:
$F(x) = e^x/(1 + e^x)$. The corresponding regression model is
$$\pi(x) = F(\beta_0 + \beta_1 x_1) = \frac{\exp(\beta_0 + \beta_1 x_1)}{1 + \exp(\beta_0 + \beta_1 x_1)}.$$
If the 'slope' $\beta_1$ is negative, the curves would curve downward as x increases.
Which curve should be used? Or better yet: which curve(s) are used in practice?
If the normal distribution is used, the model is called the 'probit' (or 'normit') model, while if the logistic curve is used it is called the 'logistic regression model'.
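The logistic model says that
$$\pi(x) = F(\beta_0 + \beta_1 x_1) = \frac{\exp(\beta_0 + \beta_1 x_1)}{1 + \exp(\beta_0 + \beta_1 x_1)}.$$
A bit of algebra shows that this model is equivalent to
$$\ln\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x_1.$$
A correspondingly simple model cannot be obtained for the probit model.
The quantity $\ln[\pi(x)/(1 - \pi(x))]$ is called the logit of $\pi(x)$ or the logit transform of $\pi(x)$.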
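The 'bit of algebra' can be written out in three steps. The shorthand $\eta = \beta_0 + \beta_1 x_1$ is introduced here only to keep the lines short; it is not notation used elsewhere in these notes.

$$
\begin{aligned}
\pi(x) &= \frac{e^{\eta}}{1 + e^{\eta}}
\quad\Longrightarrow\quad
1 - \pi(x) = \frac{1}{1 + e^{\eta}}, \\
\frac{\pi(x)}{1 - \pi(x)} &= \frac{e^{\eta}/(1 + e^{\eta})}{1/(1 + e^{\eta})} = e^{\eta}, \\
\ln\left[\frac{\pi(x)}{1 - \pi(x)}\right] &= \eta = \beta_0 + \beta_1 x_1.
\end{aligned}
$$

Each step is reversible, so the logit form and the original form of the model describe exactly the same $\pi(x)$.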
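Logistic Regression Example. We illustrate logistic regression using the Challenger Shuttle data on O-ring failures. Although 'success' was described above as 'no O-ring failure', the event modelled in the output below is O-ring failure (failure = yes), which is coded as '1' in the table of fitted probabilities. Here is the Minitab output, using Stat > Regression > Binary Logistic Regression.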
Binary Logistic Regression

Link Function: Logit

Response Information
Variable   Value   Count
failure    yes         7   (Event)
           no         17
           Total      24

Logistic Regression Table
                                                 Odds      95% CI
Predictor       Coef     StDev       Z       P  Ratio   Lower   Upper
Constant      10.875     5.703    1.91   0.057
temp        -0.17132   0.08344   -2.05   0.040   0.84    0.72    0.99

Log-Likelihood = -11.515
Test that slope is zero: G = 5.944, DF = 1, P-Value = 0.015
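Fitted Model:
$$\text{Prob}(\text{failure} \mid \text{temp}) = \frac{\exp(10.875 - 0.17132\,\text{temp})}{1 + \exp(10.875 - 0.17132\,\text{temp})}.$$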
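The Minitab coefficients can be reproduced by maximum likelihood with the standard library alone. The sketch below uses Newton-Raphson iterations for the two-parameter logit model. It is one common way to compute such a fit, not necessarily the algorithm Minitab runs, and all function and variable names are illustrative.

```python
import math

# Challenger data from the table above: temperature (deg F) and failure (1 = yes, 0 = no).
temps = [53, 56, 57, 63, 66, 67, 67, 67, 68, 69, 70, 70,
         70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81]
fail = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
        1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def score_and_information(b0, b1):
    """Gradient of the log-likelihood and entries of the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(temps, fail):
        p = logistic(b0 + b1 * x)
        w = p * (1.0 - p)          # Bernoulli variance at this temperature
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def log_likelihood(b0, b1):
    total = 0.0
    for x, y in zip(temps, fail):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Newton-Raphson: beta <- beta + information^{-1} * score, starting from zero.
b0, b1 = 0.0, 0.0
for _ in range(50):
    g0, g1, h00, h01, h11 = score_and_information(b0, b1)
    det = h00 * h11 - h01 * h01
    step0 = (h11 * g0 - h01 * g1) / det
    step1 = (h00 * g1 - h01 * g0) / det
    b0, b1 = b0 + step0, b1 + step1
    if abs(step0) + abs(step1) < 1e-10:
        break

# Standard errors from the inverse information matrix at the estimate.
_, _, h00, h01, h11 = score_and_information(b0, b1)
det = h00 * h11 - h01 * h01
se_b0 = math.sqrt(h11 / det)
se_b1 = math.sqrt(h00 / det)

# Likelihood-ratio statistic G against the intercept-only model.
p_null = sum(fail) / len(fail)
ll_null = (sum(fail) * math.log(p_null)
           + (len(fail) - sum(fail)) * math.log(1.0 - p_null))
ll_fit = log_likelihood(b0, b1)
G = 2.0 * (ll_fit - ll_null)

print("Constant", round(b0, 3), "StDev", round(se_b0, 3))
print("temp    ", round(b1, 5), "StDev", round(se_b1, 5))
print("Log-Likelihood", round(ll_fit, 3), "G", round(G, 3))
```

The printed values can be compared line by line with the Logistic Regression Table and the Log-Likelihood and G lines of the Minitab output.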
Fitted probabilities, with y = 1 denoting ‘failure’, are given below.
Row   Temp   y       Prob
  1     53   1   0.857583
  2     56   1   0.782688
  3     57   1   0.752144
  4     63   0   0.520528
  5     66   0   0.393696
  6     67   0   0.353629
  7     67   0   0.353629
  8     67   0   0.353629
  9     68   0   0.315518
 10     69   0   0.279737
 11     70   0   0.246552
 12     70   1   0.246552
 13     70   1   0.246552
 14     70   1   0.246552
 15     72   0   0.188509
 16     73   0   0.163687
 17     75   0   0.121993
 18     75   1   0.121993
 19     76   0   0.104799
 20     76   0   0.104799
 21     78   0   0.076729
 22     79   0   0.065438
 23     80   0   0.055709
 24     81   0   0.047353
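The Prob column follows directly from the reported coefficients. A minimal sketch, assuming only the rounded values 10.875 and -0.17132 from the Minitab table (so the last digits can differ slightly from the table); `prob_failure` is an illustrative name.

```python
import math

# Coefficients as reported (rounded) in the Minitab output above.
B0 = 10.875
B1 = -0.17132


def prob_failure(temp):
    """Fitted probability of O-ring failure at a launch temperature in deg F."""
    eta = B0 + B1 * temp
    return math.exp(eta) / (1.0 + math.exp(eta))


# Odds ratio for a one-degree increase in temperature.
odds_ratio = math.exp(B1)

for temp in (53, 63, 70, 81):
    print(temp, round(prob_failure(temp), 4))
print("odds ratio per degree:", round(odds_ratio, 2))
```

The odds ratio is the multiplicative change in the odds of failure for each additional degree Fahrenheit, which is how the 'Odds Ratio' column of the Minitab table is read.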
Poisson and Ordinary Regression of ‘Number of Arguments on Years Married’
Suppose we wanted to model the number Y of arguments married couples have as a function of the number of years they have been married. 60 couples, with 3 married x years for each x = 1, 2, …, 20, are randomly obtained and asked how many arguments they had in the past year (they answer honestly). A summary by year is given below with output on 1) a linear regression, 2) a quadratic regression, 3) a quadratic regression using the square root of Y, and 4) a Poisson regression.
Data Display
yrysumaverx