Peking University Summer Course, "Regression Analysis" (Linear Regression Analysis), Lecture Notes PKU8
Class 8: Polynomial Regression and Dummy Variables
I. Polynomial Regression
Polynomial regression is a minor topic because there is little that is new. What is new is that you may want to create a new variable from the same data set.
This is necessary if you think that the true regression function is not linear but quadratic. In that case you might want to use the quadratic function, that is, both the first-order and the second-order regressors.
For example, we know that earnings increase as a function of age, but the relationship is not linear. Therefore, we regress earnings on age and age². One important point is that in a polynomial regression, the regression line is no longer a straight line when you plot the dependent variable against the independent variable.
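As a minimal sketch of the "create a new variable" step, the following Python snippet builds an age-squared column and fits earnings on age and age² by least squares. The data, the coefficient values (5, 2, -0.03), and the names `age`, `earnings`, and `coef` are hypothetical and are not taken from the lecture; NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical data: ages 20..60 and noise-free earnings generated from an
# assumed quadratic, earnings = 5 + 2*age - 0.03*age^2 (illustrative only).
age = np.arange(20, 61, dtype=float)
earnings = 5.0 + 2.0 * age - 0.03 * age**2

# Create the new regressor (age squared) from the same data set.
age_sq = age**2

# Design matrix: intercept, first-order term, second-order term.
X = np.column_stack([np.ones_like(age), age, age_sq])

# Ordinary least squares fit of earnings on age and age squared.
coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)
b0, b1, b2 = coef

# Fitted values; plotted against age they trace a curve, not a straight line.
fitted = X @ coef
```

Because the hypothetical data contain no noise, the fit simply recovers the coefficients used to generate them. With real data the same three-column design matrix would be used, and only the estimates would differ.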
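Use a hypothetical example: [equation]
If we obtain [equation], [variable] has a linear effect and [variable] has a quadratic effect. If you are asked to plot [variable] against [variable], the line is linear. If you are asked to plot [variable] against [variable], the line is quadratic. Say the sample mean of [variable] is 0.5.
[table; surviving values: 0, 1, 2, 3, 5, 8]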
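Interpretation of coefficients in quadratic equations
Say $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$.
Important: there is no simple relationship between $x$ and $y$. Sometimes the effect of $x$ on $y$ is positive, and sometimes the effect of $x$ on $y$ is negative. In other words, the effect of $x$ on $y$ depends on the value of $x$.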
Suggestion: plot the regression for the data range.
One thing we can tell:
When $\beta_2 > 0$, the effect of $x$ on $y$ increases with $x$;
When $\beta_2 < 0$, the effect of $x$ on $y$ decreases with $x$.
[figure]
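II. Interpretation of Coefficients in Polynomial Regression
The relationship between Y and the "polynomial" independent variable is no longer linear.
Recall a special property of the linear function: the relationship between Y and an X (say $X_k$) is constant for all values of this X and the other X variables:
(1) $\partial Y / \partial X_k = \beta_k$.
In a polynomial regression, this simple relationship no longer holds true. For a quadratic regression, for example,
(2) $Y = \beta_0 + \beta_1 X_k + \beta_2 X_k^2 + \varepsilon$,
we have
(3) $\partial Y / \partial X_k = \beta_1 + 2\beta_2 X_k$,
which is dependent on the value of $X_k$.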
In general, the situation where a simple linear relationship of equation (1) is not true is called "interaction", a topic to which we will devote a lecture. For now, let us define interaction as the situation where the "effect" of an independent variable depends on the value of another variable.
In polynomial regressions involving an independent variable of an order higher than 1 (i.e., quadratic or higher), we can interpret this as an implicit interaction of a variable with itself.
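Example: earnings as a function of experience. If the quadratic function is true, we can find a value of experience which maximizes earnings (which could be either within a reasonable range experienced by workers or in a range unlikely to be experienced by workers). Set equation (3) equal to zero to obtain the year that maximizes earnings:
$X_k^* = -\beta_1 / (2\beta_2)$,
where $\beta_1$ and $\beta_2$ are the coefficients on $X_k$ and $X_k^2$. That is why we would want to see $\beta_1$ and $\beta_2$ to be of different signs.
In Xie and Hannum (1996, Table 1, Model 2), $b_1 = 0.046$ and $b_2 = -0.000693$. The optimal year of experience is $0.046 / (2 \times 0.000693) \approx 33.2$ years, about retirement age. In the U.S., it is 33.8 years. See Xie and Hannum (1996, p. 955).
Note that before this critical value, the "effect" of $X_k$ on Y is always positive, but the rate of the increase declines, up to 33 years.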
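The turning-point arithmetic can be checked with a short Python sketch. The slope of a quadratic in experience is the linear coefficient plus twice the quadratic coefficient times experience, so the slope is zero at minus the linear coefficient divided by twice the quadratic coefficient. The function names `marginal_effect` and `turning_point` are illustrative; only the two coefficient values come from the lecture.

```python
def marginal_effect(b_linear: float, b_quadratic: float, x: float) -> float:
    """Slope of b_linear * x + b_quadratic * x**2 with respect to x."""
    return b_linear + 2.0 * b_quadratic * x


def turning_point(b_linear: float, b_quadratic: float) -> float:
    """Value of x at which the marginal effect equals zero."""
    if b_quadratic == 0:
        raise ValueError("no turning point: the quadratic coefficient is zero")
    return -b_linear / (2.0 * b_quadratic)


# Coefficients quoted in the lecture from Xie and Hannum (1996, Table 1, Model 2).
b_exp, b_exp_sq = 0.046, -0.000693

peak = turning_point(b_exp, b_exp_sq)
print(round(peak, 1))  # 33.2

# Before the peak the effect is positive but shrinking; after it, negative.
for years in (0, 10, 20, 30, 40):
    print(years, round(marginal_effect(b_exp, b_exp_sq, years), 5))
```

The loop illustrates the closing note of this section: the marginal effect stays positive up to about 33 years while declining in size, and turns negative afterwards.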
III. Defining Dummy Variables
A dummy variable is sometimes called an "indicator variable."
It refers to the following logical coding scheme for a dichotomous variable:
x = 1 if a particular event is true
x = 0 otherwise.
A. Examples of Dummy Variables
Sex (Male):
x1 = 0 if female
x1 = 1 if male
Employment Status:
x2 = 0 if not employed
x2 = 1 if employed
Poverty status:
x3 = 0 if not in poverty, or household income > threshold.
x3 = 1 if in poverty, or household income < threshold.
B. Interpretation of Dummy Coefficients (Intercept?)
1. When a dummy variable is the only independent variable
Interpretation: the intercept is group-specific.
Example: y = income, x1 = 1 if male.
Regress y on x1: $y = \beta_0 + \beta_1 x_1$
As we discussed before, a regression should be interpreted in terms of conditional means. Remember from your exercise that if we have the constant 1 as the only regressor, the estimated intercept is identical to the sample mean. If we have a dummy variable in a regression, the estimated coefficient represents the mean difference between the two groups.
Income level of females: $\beta_0$
Income level of males: $\beta_0 + \beta_1$
$\beta_1$ is the mean difference in income between males and females.
If we compute the means by sex, we get the same results.
Proof: let us regroup the sample by sex, dividing it into two subsamples: females ($x_1 = 0$, $n_1$ cases) and males ($x_1 = 1$, $n_2$ cases). Note $n_1 + n_2 = n$.
[equations]
(Summation "over $n_2$" means summation from $n_1 + 1$ to $n_1 + n_2$.)
How about the standard errors? They can be different (pooled versus group-specific estimator of the error variance).
2. When a dummy variable is used with other continuous independent variables
Two parallel lines with different intercepts. There exists an overall difference along the entire distribution range of the continuous variables. [blackboard] draw lines.
Assumption: there is no interaction.
Example: income on sex and ability.
IV. When a dummy variable is used with another dummy variable
Four parallel lines with different intercepts.
Assumption: no interaction either between the dummy variables and the continuous variables or between the two dummy variables.
V. Important Difference: Dichotomous Variables used as independent variables and as dependent variables
Independent variable: the effect is a shift.
Dependent variable: the linear model cannot be true. [blackboard] why?
VI. The Least Squares Estimation with a continuous variable and a dummy variable
The least squares estimation holds up for regressions with dummy variables.
$X = |1, x_1, x_2|$, where $x_1$ is a continuous variable and $x_2$ is a dummy variable.
$$X'X = \begin{pmatrix} n & \sum x_{i1} & \sum x_{i2} \\ & \sum x_{i1}^2 & \sum x_{i1} x_{i2} \\ & & \sum x_{i2}^2 \end{pmatrix} = \begin{pmatrix} n & \sum x_{i1} & n_2 \\ & \sum x_{i1}^2 & \sum_{n_2} x_{i1} \\ & & n_2 \end{pmatrix}$$
(The matrix is symmetric; the lower triangle is omitted. $\sum_{n_2}$ denotes summation over the cases where $x_2 = 1$.)
$n_2$ is the total number of cases where $x_2 = 1$ is true.
All algorithms for the least squares estimation still hold.
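To make the structure of X'X concrete, here is a small Python sketch with made-up numbers (the five observations are hypothetical). It forms X'X for an intercept, a continuous variable x1, and a dummy x2, and shows that the entries involving the dummy reduce to the count n2 and to the sum of x1 within the group where x2 = 1.

```python
# Hypothetical data: x1 is continuous, x2 is a 0/1 dummy.
x1 = [2.0, 4.0, 6.0, 1.0, 3.0]
x2 = [0, 0, 1, 1, 1]

# Rows of the design matrix X = |1, x1, x2|.
rows = [[1.0, a, float(d)] for a, d in zip(x1, x2)]

# X'X: entry (j, k) is the sum over cases of column j times column k.
xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]

n = len(rows)                                              # number of cases
n2 = sum(x2)                                               # cases with x2 = 1
sum_x1_in_group = sum(a for a, d in zip(x1, x2) if d == 1) # sum of x1 where x2 = 1

for row in xtx:
    print(row)
```

Because the dummy takes only the values 0 and 1, its square equals itself, which is why both the sum of x2 and the sum of x2 squared collapse to n2.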
Interpretation of $\beta_1$: the pure effect of $x_1$ net of the overall group difference. Also called the "within-group average effect of $x_1$".
$\beta_1$ can be estimated by a 3-step partial regression method:
(1) Regress $y$ on $x_2$ and obtain the residuals $y^*$ (which are the deviations of $y$ from the group mean);
(2) Regress $x_1$ on $x_2$ and obtain the residuals $x_1^*$ (which are the deviations of $x_1$ from the group mean);
(3) Then regress $y^*$ on $x_1^*$; we obtain $\beta_1$, which is the pure, partial effect of $x_1$ on $y$. Remember to adjust the degrees of freedom (by 1) due to $x_2$.
VI. Nominal Variables
Definition: A nominal variable is a classification system. No information about ordering is assumed or utilized. Numerical values for a nominal variable are arbitrary, used for classification or identification.
For a nominal independent variable with J categories, we use a set of J-1 dummy variables in regression analysis.
Say a variable x has three categories; we need to use 2 dummy variables (in addition to the intercept):
x1 = 1 if x = 2
x1 = 0 otherwise
x2 = 1 if x = 3
x2 = 0 otherwise
For example, for the variable Race:
Race(2) = 1 if Black
Race(3) = 1 if Asian
In this case, White is the excluded category.
Alternatively,
Race(Black) = 1 if Black
Race(Asian) = 1 if Asian
Dummy variables for a nominal variable should appear together in the model (in interactions, for example). They cannot have interactions with each other because they do not overlap.
Regression: y is income
$y = \beta_0 + \beta_1 \text{Race(Black)} + \beta_2 \text{Race(Asian)} + \varepsilon$
say, $b' = |20, -10, -15|$
Mean of Whites: $\beta_0 = 20$
Mean of Blacks: $\beta_0 + \beta_1 = 10$
Mean of Asians: $\beta_0 + \beta_2 = 5$
If we change the coding so that Black is used as the excluded category:
$y = \beta_0 + \beta_1 \text{Race(White)} + \beta_2 \text{Race(Asian)} + \varepsilon$
$\beta_0$ = Black = 10
$\beta_1$ = White - Black = 10
$\beta_2$ = Asian - Black = -5
Interpretation of coefficients in a complex model:
If we have two sets of dummy variables
$y = \beta_0 + \beta_1 \text{Race(White)} + \beta_2 \text{Race(Asian)} + \beta_3 \text{Sex(Male)} + \varepsilon$
What is $\beta_0$? The mean income level of Black females. The reason is that the excluded categories are Blacks for Race and females for Sex.
How do we compute averages for other groups: Asian female? White male?
If we have two dummy variables and one continuous variable
$y = \beta_0 + \beta_1 \text{Race(White)} + \beta_2 \text{Race(Asian)} + \beta_3 \text{Sex(Male)} + \beta_4 \text{Ability} + \varepsilon$
$\beta_0$ is the income level of Black females with a zero score of ability. It is an intercept.
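The effect of changing the excluded category in the Race example can be sketched in a few lines of Python. This is not a regression fit. It relies on the property stated in these notes that, when the only regressors are the J-1 dummies for one nominal variable, the intercept equals the mean of the excluded group and each dummy coefficient equals that group's mean minus the excluded group's mean. The helper name `dummy_coefficients` is hypothetical; the three group means are the ones used in the lecture.

```python
def dummy_coefficients(group_means, excluded):
    """Intercept and dummy coefficients implied by group means when the
    regression contains only the J-1 dummies of one nominal variable."""
    intercept = group_means[excluded]
    contrasts = {
        group: mean - intercept
        for group, mean in group_means.items()
        if group != excluded
    }
    return intercept, contrasts


# Group means of income from the lecture example.
means = {"White": 20.0, "Black": 10.0, "Asian": 5.0}

# White as the excluded category.
b0_white, b_white = dummy_coefficients(means, excluded="White")
print(b0_white, b_white)  # 20.0 {'Black': -10.0, 'Asian': -15.0}

# Black as the excluded category.
b0_black, b_black = dummy_coefficients(means, excluded="Black")
print(b0_black, b_black)  # 10.0 {'White': 10.0, 'Asian': -5.0}
```

Either coding reproduces the same three group means; only the reference point for the comparisons changes.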
[blackboard] Six parallel lines. The two dummy variables for Race cannot overlap. Race and Sex do overlap. Additivity is assumed here. We will discuss interactions on Thursday.
VII. Testing for Collapsibility of Categories
We can use F-tests for nested models to test the collapsibility of categories in a nominal variable.
Consider a GSS question about region of residence at age 16 (REG16):
Original Code   Region                Recode
1               New England           East
2               Middle Atlantic       East
3               East North Central    Midwest
4               West North Central    Midwest
5               South Atlantic        South
6               East South Central    South
7               West South Central    South
8               Mountain              West
9               Pacific               West
In regression analysis (say with occupational prestige as the dependent variable), we can use a set of 8 dummy variables for the original codes of the variable. We can also use a set of 3 dummy variables after we collapse the codes into a smaller set of 4 broader regions.
The two models are nested. See example.
The F-test between the two nested models tells us whether the collapsing is justified.
F(5, 550) = [(127029.555 - 125261.776)/5]/227.75
= [1767.779/5]/227.75 = 353.56/227.75 = 1.55, not significant at 5%.
    . recode reg16x (2=1)(4=3)(6=5)(7=5)(9=8)
    (reg16x: 302 changes made)

    . table reg16, c(mean prestige)

    --------------------------
        reg16 | mean(prestige)
    ----------+---------------
            1 |       39.59259
            2 |       44.01123
            3 |        42.0087
            4 |        44.5161
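As a quick arithmetic check of the F statistic reported in this section, the computation can be written as a short Python sketch. The function name and argument labels are illustrative. Reading the two large numbers as the residual sums of squares of the collapsed (3-dummy) and full (8-dummy) models, and 227.75 as the residual mean square of the full model, is an interpretation that the notes do not state explicitly.

```python
def nested_f_statistic(ss_restricted, ss_full, df_diff, ms_error_full):
    """F = [(SS_restricted - SS_full) / df_diff] / MS_error_full."""
    return ((ss_restricted - ss_full) / df_diff) / ms_error_full


# Numbers quoted in the lecture. df_diff = 8 - 3 = 5 dummies dropped.
# Labelling them as residual sums of squares and a residual mean square
# is an assumption made for this sketch.
f_stat = nested_f_statistic(
    ss_restricted=127029.555,
    ss_full=125261.776,
    df_diff=5,
    ms_error_full=227.75,
)
print(round(f_stat, 2))  # 1.55
```

Under that reading, 125261.776 divided by the 550 denominator degrees of freedom gives roughly 227.75, which is consistent with the denominator used in the lecture.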