Stat209 Computing Lab 1: Multiple Regression Basics   1/10/11
STAT 209   D. Rogosa
Computational Lab 1. Some Multiple Regression Basics
Note:
The format of this intro lab is chatter, R commands, and output interspersed together,
so you can read through (or read a section) and then try it on your own, just
by cutting and pasting commands. Trust that there are more elegant ways of doing
any of these tasks, but the intent is basic instruction, not showing off R.
This lab covers course content from weeks 1-3.
Actual R commands begin with >
# designates a comment in the R session
Data from Verzani UsingR package
babies: Mothers and their babies data
Description
A collection of variables taken for each new mother in a Child and Health Development Study.
Usage: data(babies)
Format: A data frame with 1,236 observations on the following 23 variables.
Variables in data file
id         identification number
pluralty   5 = single fetus
outcome    1 = live birth that survived at least 28 days
date       birth date, where 1096 = January 1, 1961
gestation  length of gestation in days
sex        infant's sex: 1 = male, 2 = female, 9 = unknown
wt         birth weight in ounces (999 = unknown)
parity     total number of previous pregnancies, including fetal deaths and stillbirths; 99 = unknown
race       mother's race: 0-5 = white, 6 = mex, 7 = black, 8 = asian, 9 = mixed, 99 = unknown
age        mother's age in years at termination of pregnancy; 99 = unknown
ed         mother's education: 0 = less than 8th grade, 1 = 8th-12th grade, did not graduate, 2 = HS graduate, no other schooling, 3 = HS + trade, 4 = HS + some college, 5 = College graduate, 6 & 7 = Trade school, HS unclear, 9 = unknown
ht         mother's height in inches to the last completed inch; 99 = unknown
wt1        mother's prepregnancy weight in pounds; 999 = unknown
drace      father's race, coding same as mother's race
dage       father's age, coding same as mother's age
ded        father's education, coding same as mother's education
dht        father's height, coding same as for mother's height
dwt        father's weight, coding same as for mother's weight
marital    1 = married, 2 = legally separated, 3 = divorced, 4 = widowed, 5 = never married
inc        family yearly income in $2500 increments: 0 = under 2500, 1 = 2500-4999, ..., 8 = 12,500-14,999, 9 = 15000+, 98 = unknown, 99 = not asked
smoke      does mother smoke? 0 = never, 1 = smokes now, 2 = until current pregnancy, 3 = once did, not now, 9 = unknown
time       if mother quit, how long ago? 0 = never smoked, 1 = still smokes, 2 = during current preg, 3 = within 1 yr, 4 = 1 to 2 years ago, 5 = 2 to 3 yr ago, 6 = 3 to 4 yrs ago, 7 = 5 to 9 yrs ago, 8 = 10+ yrs ago, 9 = quit and don't know, 98 = unknown, 99 = not asked
number     number of cigs smoked per day for past and current smokers: 0 = never, 1 = 1-4, 2 = 5-9, 3 = 10-14, 4 = 15-19, 5 = 20-29, 6 = 30-39, 7 = 40-60, 8 = 60+, 9 = smoke but don't know, 98 = unknown, 99 = not asked
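Several of the coded variables above are easier to read once the numeric codes are turned into labels. The following is a minimal sketch on made-up values, not on the babies file itself; the names smoke_codes, smoke_f, day_numbers and birth_dates are hypothetical. The date origin of January 1, 1958 is an inference from the codebook statement that 1096 = January 1, 1961, so treat it as an assumption.

```r
# Toy values only; not taken from the babies data set
smoke_codes <- c(0, 1, 1, 3, 9, 2)

# Map the codebook values for 'smoke' to readable labels
smoke_f <- factor(smoke_codes,
                  levels = c(0, 1, 2, 3, 9),
                  labels = c("never", "smokes now", "until current pregnancy",
                             "once did, not now", "unknown"))
table(smoke_f)

# 'date' counts days; the codebook says 1096 = January 1, 1961, which
# implies day 0 = January 1, 1958 (365 + 365 + 366 = 1096 days earlier)
day_numbers <- c(1096, 1350, 1714)
birth_dates <- as.Date(day_numbers, origin = "1958-01-01")
format(birth_dates)
```

Labelled factors also keep the "unknown" category visible in tables instead of letting it pass as an ordinary number.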
Source
This data set is found from http://www.stat.berkeley.edu/users/statlabs/labs.html.
It accompanies the excellent text StatLabs: Mathematical Statistics through Applications, Springer-Verlag (2001), by Deborah Nolan and Terry Speed.
We have the data in
http://www-stat.stanford.edu/~rag/stat141/exs/babies.dat
TASK 1: Multiple Regression
Preliminary: Read in babies data set and get clean subset of variables, posted at Stat141 site
> babies = read.table("http://www-stat.stanford.edu/~rag/stat141/exs/babies.dat", header=TRUE)
> # read in full babies data set
> summary(babies)
       id          pluralty    outcome       date        gestation          sex          wt
 Min.   :  15   Min.   :5   Min.   :1   Min.   :1350   Min.   :148.0   Min.   :1   Min.   : 55.0
 1st Qu.:5286   1st Qu.:5   1st Qu.:1   1st Qu.:1444   1st Qu.:272.0   1st Qu.:1   1st Qu.:108.8
 Median :6730   Median :5   Median :1   Median :1540   Median :280.0   Median :1   Median :120.0
 Mean   :6001   Mean   :5   Mean   :1   Mean   :1536   Mean   :286.9   Mean   :1   Mean   :119.6
 3rd Qu.:7583   3rd Qu.:5   3rd Qu.:1   3rd Qu.:1627   3rd Qu.:288.0   3rd Qu.:1   3rd Qu.:131.0
 Max.   :9263   Max.   :5   Max.   :1   Max.   :1714   Max.   :999.0   Max.   :1   Max.   :176.0
     parity            race             age              ed              ht             wt1
 Min.   : 0.000   Min.   : 0.000   Min.   :15.00   Min.   :0.000   Min.   :53.00   Min.   : 87.0
 1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:23.00   1st Qu.:2.000   1st Qu.:62.00   1st Qu.:115.0
 Median : 1.000   Median : 3.000   Median :26.00   Median :2.000   Median :64.00   Median :126.0
 Mean   : 1.932   Mean   : 3.206   Mean   :27.37   Mean   :2.922   Mean   :64.67   Mean   :154.0
 3rd Qu.: 3.000   3rd Qu.: 7.000   3rd Qu.:31.00   3rd Qu.:4.000   3rd Qu.:66.00   3rd Qu.:140.0
 Max.   :13.000   Max.   :99.000   Max.   :99.00   Max.   :9.000   Max.   :99.00   Max.   :999.0
     drace             dage            ded             dht             dwt           marital
 Min.   : 0.000   Min.   :18.00   Min.   :0.000   Min.   :60.00   Min.   :110.0   Min.   :0.000
 1st Qu.: 0.000   1st Qu.:25.00   1st Qu.:2.000   1st Qu.:70.00   1st Qu.:165.0   1st Qu.:1.000
 Median : 3.000   Median :29.00   Median :4.000   Median :73.00   Median :190.0   Median :1.000
 Mean   : 3.665   Mean   :30.74   Mean   :3.189   Mean   :81.67   Mean   :505.4   Mean   :1.038
 3rd Qu.: 7.000   3rd Qu.:35.00   3rd Qu.:5.000   3rd Qu.:99.00   3rd Qu.:999.0   3rd Qu.:1.000
 Max.   :99.000   Max.   :99.00   Max.   :9.000   Max.   :99.00   Max.   :999.0   Max.   :5.000
      inc            smoke             time            number
 Min.   : 0.00   Min.   :0.0000   Min.   : 0.000   Min.   : 0.000
 1st Qu.: 2.00   1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.: 0.000
 Median : 4.00   Median :1.0000   Median : 1.000   Median : 1.000
 Mean   :13.16   Mean   :0.8681   Mean   : 1.748   Mean   : 2.604
 3rd Qu.: 7.00   3rd Qu.:1.0000   3rd Qu.: 1.000   3rd Qu.: 3.000
 Max.   :98.00   Max.   :9.0000   Max.   :99.000   Max.   :98.000
> dim(babies)  # dim tells you 1236 subjects, 23 vars
[1] 1236   23
> # but we have lots of missing data. Because this is a "basics" exercise, use cases with complete data
> # since we have missing data, for the purposes of this Lab we want
> # to clean those cases out and use a data set with complete data
> # vars we want to use
> #   gestation  length of gestation in days
> #   wt         birth weight in ounces, 999 = unknown
> #   age        mother's age in years at termination of pregnancy, 99 = unknown
> #   ht         mother's height in inches to the last completed inch, 99 = unknown
> #   wt1        mother's pre-pregnancy weight in pounds, 999 = unknown
> #   dht        father's height, coding same as for mother's height
> #   dwt        father's weight, coding same as for mother's weight
>
> # create a new data set, a subset of the full data
> subsetBabies = subset(babies, subset = gestation < 999 & wt1 < 999 & wt < 999 & ht < 99 & dwt < 999 & dht < 99 & age < 99, select = c(gestation, wt, age, ht, wt1, dht, dwt))
> dim(subsetBabies)
[1] 705   7
> # so only 705 of the cases have complete data on the variables of interest, birth weight as outcome and 6 predictors
> # and none have missing data codes, as seen below
> summary(subsetBabies)
   gestation           wt             age              ht             wt1             dht
 Min.   :148.0   Min.   : 55.0   Min.   :15.00   Min.   :54.00   Min.   : 87.0   Min.   :60.00
 1st Qu.:272.0   1st Qu.:108.0   1st Qu.:23.00   1st Qu.:62.00   1st Qu.:115.0   1st Qu.:68.00
 Median :280.0   Median :120.0   Median :26.00   Median :64.00   Median :125.0   Median :71.00
 Mean   :279.2   Mean   :119.5   Mean   :27.39   Mean   :64.08   Mean   :128.8   Mean   :70.26
 3rd Qu.:288.0   3rd Qu.:131.0   3rd Qu.:31.00   3rd Qu.:66.00   3rd Qu.:140.0   3rd Qu.:72.00
 Max.   :353.0   Max.   :176.0   Max.   :43.00   Max.   :72.00   Max.   :250.0   Max.   :78.00
      dwt
 Min.   :110.0
 1st Qu.:155.0
 Median :170.0
 Mean   :171.2
 3rd Qu.:185.0
 Max.   :260.0
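The complete-case filtering done above with subset() can also be done by first recoding the missing-data codes to NA and then dropping incomplete rows. The sketch below uses a small made-up data frame (toy, missing_code and the other names are hypothetical) with the same 99/999 codes as the codebook. It is one reasonable alternative, not the method used in this lab.

```r
# Toy data frame with the same sentinel codes as the codebook (values made up)
toy <- data.frame(
  gestation = c(280, 999, 275, 290),
  wt        = c(120, 115, 999, 130),
  ht        = c(64, 99, 62, 66),
  wt1       = c(125, 130, 140, 999)
)

# Recode each sentinel to NA; the code differs by variable (999 vs 99)
missing_code <- c(gestation = 999, wt = 999, ht = 99, wt1 = 999)
toy_na <- toy
for (v in names(missing_code)) {
  toy_na[[v]][toy_na[[v]] == missing_code[[v]]] <- NA
}

colSums(is.na(toy_na))        # missing count per variable
complete <- na.omit(toy_na)   # keep rows with no NA

# Same rows as the subset() approach used in the lab
via_subset <- subset(toy, gestation < 999 & wt < 999 & ht < 99 & wt1 < 999)
nrow(complete) == nrow(via_subset)
```

Recoding to NA has the side benefit that summary() and cor() no longer treat 99 or 999 as real measurements if the full data frame is used by mistake.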
> # to get the 7x7 array of scatterplots use command
> pairs(subsetBabies)
> # see MB text 3rd ed p.175 Ireland race data, 2nd ed p.178 for 3x3 example, mice data
> attach(subsetBabies)  # so we can refer directly to variable names in data set
> names(subsetBabies)
[1] "gestation" "wt"        "age"       "ht"        "wt1"       "dht"       "dwt"
> # wt will be the outcome variable in the regression demonstrations
> # just to get a feel for the data, 5-number summary for wt, and correlations among all vars
> quantile(wt)
  0%  25%  50%  75% 100%
  55  108  120  131  176
> length(wt)  # number of cases
[1] 705
> cor(subsetBabies)  # correlation matrix
            gestation         wt          age         ht        wt1         dht         dwt
gestation  1.00000000 0.39500543 -0.059611693 0.02581315 0.06386776  0.01191594 0.010700934
wt         0.39500543 1.00000000  0.047794571 0.21614525 0.16249354  0.09822157 0.141033572
age       -0.05961169 0.04779457  1.000000000 0.01852681 0.19244242 -0.05694164 0.003239663
ht         0.02581315 0.21614525  0.018526813 1.00000000 0.42233662  0.33288888 0.258235062
wt1        0.06386776 0.16249354  0.192442416 0.42233662 1.00000000  0.13149788 0.197935408
dht        0.01191594 0.09822157 -0.056941642 0.33288888 0.13149788  1.00000000 0.542304515
dwt        0.01070093 0.14103357  0.003239663 0.25823506 0.19793541  0.54230452 1.000000000
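Each entry of the correlation matrix ties directly to a one-predictor regression: the slope equals r * sd(y) / sd(x), and the R-squared equals r^2. The sketch below checks both identities on simulated data (x, y and the seed are arbitrary stand-ins, not the babies variables), then squares the reported cor(wt, gestation) to see how much of the variance in wt gestation alone would account for.

```r
set.seed(209)                       # arbitrary seed, for reproducibility
n <- 200
x <- rnorm(n, mean = 280, sd = 15)  # stand-in for a predictor
y <- 0.5 * x + rnorm(n, sd = 16)    # stand-in for an outcome

r   <- cor(x, y)
fit <- lm(y ~ x)

# Slope of the simple regression equals r * sd(y) / sd(x)
slope_from_r <- r * sd(y) / sd(x)
all.equal(unname(coef(fit)["x"]), slope_from_r)

# R-squared of the simple regression equals r^2
all.equal(summary(fit)$r.squared, r^2)

# Using the reported cor(wt, gestation): share of variance in wt
# that a gestation-only regression would account for
0.39500543^2
```

The squared correlation is roughly 0.156, so gestation by itself accounts for about 16% of the variance in birth weight.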
>
>
> # Multiple Regression using the data
>
> fullreg = lm(wt ~ gestation + age + ht + wt1 + dht + dwt, data=subsetBabies)
> summary(fullreg)
Call:
lm(formula = wt ~ gestation + age + ht + wt1 + dht + dwt, data = subsetBabies)
Residuals:
     Min       1Q   Median       3Q      Max
-48.6889 -10.6368   0.2207  10.1633  54.7421
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -101.00106   22.93807  -4.403 1.23e-05 ***
gestation      0.45077    0.03893  11.578  < 2e-16 ***
age            0.18464    0.10706   1.725   0.0850 .
ht             1.23605    0.28406   4.351 1.56e-05 ***
wt1            0.03357    0.03396   0.989   0.3232
dht           -0.09882    0.26615  -0.371   0.7105
dwt            0.07619    0.03296   2.312   0.0211 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 16.42 on 698 degrees of freedom
Multiple R-Squared: 0.2118, Adjusted R-squared: 0.2051
F-statistic: 31.27 on 6 and 698 DF, p-value: < 2.2e-16
> # Rsq not high, 2 highly sig coeff, gestation, mom ht, also signif dad wt predicting baby wt
> # so it seems important predictors are mom and dad size and baby being (near-to) full term
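The pieces of the summary output above are linked by simple formulas, and recomputing a few of them by hand is a useful check on how to read the table. The sketch below copies the printed (rounded) values from the output, so the results agree with the printed t value, adjusted R-squared, and F-statistic only up to rounding. The variable names are hypothetical.

```r
# Values copied from the summary(fullreg) output above
est_gest <- 0.45077; se_gest <- 0.03893
r2 <- 0.2118
n  <- 705   # cases
p  <- 6     # predictors

t_gest <- est_gest / se_gest                     # t value = estimate / std. error
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # adjusted R-squared
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))    # overall F on p and n-p-1 df

# Two-sided p-value for the gestation coefficient, 698 residual df
p_gest <- 2 * pt(abs(t_gest), df = n - p - 1, lower.tail = FALSE)

round(c(t = t_gest, adj_r2 = adj_r2, F = f_stat), 4)
p_gest
```

The residual degrees of freedom, 698, are n - p - 1 = 705 - 6 - 1, which is the same quantity that appears in the adjusted R-squared and F formulas.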
-------------------------------
Older Business, Week 1 content
Task 2: use adjusted variables interpretation to reproduce gestation coefficient in unstandardized regression
> # do adjusted variables stuff (here we also adjusted the outcome variable wt for the other preds;
> # you get the same regr coeff using wt or wt adjusted for the other preds)
> wtdotnogest = lm(wt ~ age + ht + wt1 + dht + dwt, data=subsetBabies)
> gestdotpreds = lm(gestation ~ age + ht + wt1 + dht + dwt, data=sub
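The adjusted-variables interpretation named in Task 2 can be illustrated on simulated data: the coefficient of a predictor in the full regression equals the slope from regressing the outcome (raw or adjusted) on that predictor's residuals after removing the other predictors. This is a sketch under stated assumptions, not the lab's own continuation; sim, x1, x2, y and the seed are hypothetical stand-ins, with x1 playing the role of gestation and x2 the role of the remaining predictors.

```r
set.seed(1)                        # arbitrary seed
n  <- 300
x1 <- rnorm(n)                     # plays the role of gestation
x2 <- 0.5 * x1 + rnorm(n)          # plays the role of the other predictors
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(n)
sim <- data.frame(y, x1, x2)

full <- lm(y ~ x1 + x2, data = sim)            # full multiple regression

y_adj  <- resid(lm(y ~ x2, data = sim))        # outcome adjusted for x2
x1_adj <- resid(lm(x1 ~ x2, data = sim))       # x1 adjusted for x2

b_adj_both <- coef(lm(y_adj ~ x1_adj))["x1_adj"]   # adjusted y on adjusted x1
b_raw_y    <- coef(lm(sim$y ~ x1_adj))["x1_adj"]   # raw y on adjusted x1

# All three slopes agree
c(full = unname(coef(full)["x1"]),
  adj_both = unname(b_adj_both),
  raw_y = unname(b_raw_y))
```

Using the raw outcome or the adjusted outcome gives the same slope because the adjusted predictor is uncorrelated with the other predictors, so the part of y explained by them contributes nothing to the slope.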