
Multiple Linear Regression, Word format (.docx)

Document ID: 19192050
Uploaded: 2023-01-04
Format: DOCX
Pages: 23
Size: 23.89KB


noise" variable, which is a Normally distributed random variable with mean 0 and standard deviation $\sigma$ (we do not know its value). We also do not know the values of the coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$. We estimate all of these $(p + 2)$ unknown parameters from the available data.

The data consist of $n$ rows of observations, also called cases, which give us the values $y_i, x_{i1}, x_{i2}, \ldots, x_{ip}$ for $i = 1, 2, \ldots, n$. The estimates of the $\beta$ coefficients are computed so as to minimize the sum of squared differences between the predicted (fitted) values and the observed values of $Y$ in the data. This sum of squared differences is

$$\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_p x_{ip}\right)^2.$$

Let us denote by $\hat\beta_0, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_p$ the values of the coefficients that minimize this expression. These are our estimates of the unknown values, and in the literature they are called OLS (ordinary least squares) estimates. Once we have computed them, we can calculate an unbiased estimate $\hat\sigma^2$ of $\sigma^2$ with the formula

$$\hat\sigma^2 = \frac{1}{n-p-1}\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \cdots - \hat\beta_p x_{ip}\right)^2 = \frac{\text{sum of squared residuals}}{\text{number of observations} - \text{number of coefficients}}.$$
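A minimal sketch of this computation, assuming the data are held in plain Python lists. The helper names `ols_fit` and `sigma2_hat` are hypothetical. `ols_fit` minimizes the sum of squares by solving the normal equations $(X^\top X)\hat\beta = X^\top y$ with Gauss-Jordan elimination (one standard route to the OLS estimates), and `sigma2_hat` divides the residual sum of squares by $n - p - 1$. It assumes $X^\top X$ is invertible and is meant for illustration, not numerical robustness.

```python
# Sketch: OLS by solving the normal equations (X^T X) b = X^T y with
# Gauss-Jordan elimination, then the unbiased estimate of sigma^2.
# Assumes X^T X is invertible; illustrative, not numerically robust.

def ols_fit(rows, y):
    """rows: [[x_i1, ..., x_ip], ...]; y: [y_i, ...]. Returns [b0, b1, ..., bp]."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    n, k = len(X), len(X[0])                            # k = p + 1 coefficients
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))]
         for a in range(k)]
    for col in range(k):
        # Partial pivoting: bring the row with the largest entry in this column up
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


def sigma2_hat(rows, y, beta):
    """Unbiased estimate of sigma^2: residual sum of squares / (n - p - 1)."""
    n, p = len(y), len(beta) - 1
    sse = sum((yi - beta[0] - sum(b * x for b, x in zip(beta[1:], r))) ** 2
              for r, yi in zip(rows, y))
    return sse / (n - p - 1)


rows, y = [[1], [2], [3], [4]], [2, 4, 5, 4]
beta = ols_fit(rows, y)
print(beta, sigma2_hat(rows, y, beta))  # about [2.0, 0.7] and 1.15
```

In this small example the residuals are $-0.7, 0.6, 0.9, -0.8$, so the residual sum of squares is 2.3 and $\hat\sigma^2 = 2.3 / (4 - 1 - 1) = 1.15$.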

We plug the values $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$ into the linear regression model (1) to predict the value of the dependent variable from known values of the independent variables $x_1, x_2, \ldots, x_p$. The predicted value $\hat Y$ is computed from

$$\hat Y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \cdots + \hat\beta_p x_p.$$

Predictions based on this equation are the best possible predictions, in the sense that they are unbiased (equal to the true values on average) and have the smallest expected squared error among all unbiased estimates, provided we make the following assumptions:

1. Linearity: the expected value of the dependent variable is a linear function of the independent variables:

$$E(Y \mid x_1, x_2, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p.$$

2. Independence: the noise random variables $\varepsilon_i$ are independent across all rows. Here $\varepsilon_i$ is the noise random variable for observation $i$, $i = 1, 2, \ldots, n$.

3. Unbiasedness: the noise random variable $\varepsilon_i$ has expected value 0, that is, $E(\varepsilon_i) = 0$ for $i = 1, 2, \ldots, n$.

4. Homoskedasticity: the standard deviation of $\varepsilon_i$ has the same value $\sigma$ for $i = 1, 2, \ldots, n$.

5. Normality: the noise random variables $\varepsilon_i$ are Normally distributed.
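To make the five assumptions concrete, here is a sketch of a data-generating process that satisfies all of them by construction. The function name `simulate`, the uniform range used for the predictor values, and the parameter values below are illustrative assumptions; the predictors are drawn uniformly only for convenience.

```python
import random

def simulate(n, betas, sigma, seed=0):
    """Draw n cases from Y = b0 + b1*x1 + ... + bp*xp + eps.

    E(Y | x) is linear in x (assumption 1). Each eps is drawn independently
    (assumption 2) from a Normal distribution (assumption 5) with mean 0
    (assumption 3) and the same sd sigma for every case (assumption 4).
    """
    rng = random.Random(seed)
    p = len(betas) - 1
    rows, y, eps = [], [], []
    for _ in range(n):
        x = [rng.uniform(0.0, 10.0) for _ in range(p)]  # predictor values: arbitrary choice
        e = rng.gauss(0.0, sigma)
        rows.append(x)
        eps.append(e)
        y.append(betas[0] + sum(b * xi for b, xi in zip(betas[1:], x)) + e)
    return rows, y, eps


rows, y, eps = simulate(5000, [1.0, 2.0, -0.5], sigma=1.5)
mean_eps = sum(eps) / len(eps)
sd_eps = (sum((e - mean_eps) ** 2 for e in eps) / (len(eps) - 1)) ** 0.5
print(round(mean_eps, 3), round(sd_eps, 3))  # close to 0 and 1.5
```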

An important and interesting fact for our purposes is that even if we drop the assumption of Normality (assumption 5) and allow the noise variables to follow arbitrary distributions, these estimates remain very good for prediction. Provided assumptions 1 to 4 still hold, we can show that predictions based on these estimators are the best linear predictions, in that they minimize the expected squared error. In other words, among all linear models as defined in equation (1), the model using the least squares estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$ gives the smallest mean squared error. We describe this idea in detail in the next section.
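A small Monte Carlo sketch of part of this claim, under assumptions of our own choosing: one predictor, a fixed design $x = 0, 1, \ldots, 29$, a true line $y = 1 + 2x$, and noise drawn from a shifted exponential distribution, which is strongly skewed and clearly non-Normal but still has mean 0 and constant variance. Averaged over many simulated data sets, the least squares slope should land close to the true value 2. This only illustrates that the estimates stay unbiased without Normality; it does not demonstrate the minimum-variance property.

```python
import random

def slope_estimate(xs, ys):
    """Least squares slope for a single predictor: S_xy / S_xx."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def average_slope(b0=1.0, b1=2.0, n=30, reps=2000, seed=1):
    """Average OLS slope over `reps` samples with skewed, non-Normal noise."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(n)]  # fixed design x = 0, 1, ..., n-1
    total = 0.0
    for _ in range(reps):
        # Exponential(1) minus 1: mean 0, variance 1, but strongly skewed
        ys = [b0 + b1 * x + (rng.expovariate(1.0) - 1.0) for x in xs]
        total += slope_estimate(xs, ys)
    return total / reps


print(round(average_slope(), 3))  # close to the true slope 2.0
```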

The Normal distribution assumption is needed to derive confidence intervals for predictions. In data mining applications we have two distinct data sets, the training data set and the validation data set, both of which represent the typical relationship between the independent and dependent variables. The training data set is used to estimate the regression coefficients. The validation data set forms a hold-out sample that is not used in computing the coefficient estimates. This allows us to estimate the errors in our predictions without assuming that the noise variables are Normally distributed. We use the training data to fit the model and estimate the coefficients, and then use these estimated coefficients to make predictions for every case in the validation data set. The prediction for each case is compared with the actual value of the dependent variable in the validation data. The mean squared difference allows us to compare different models and to assess the model's accuracy in making predictions.
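The training/validation procedure can be sketched as follows, assuming for brevity a single predictor and a random 60/40 split. The helper names `fit_line` and `holdout_mse`, the split fraction, and the simulated data are illustrative choices, not part of the original example. Only the training cases are used to estimate the coefficients; the mean squared error is computed on the held-out cases.

```python
import random

def fit_line(xs, ys):
    """Least squares intercept and slope for one predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1


def holdout_mse(xs, ys, train_frac=0.6, seed=0):
    """Fit on a random training subset, return mean squared error on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))
    train, valid = idx[:cut], idx[cut:]
    b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
    errors = [(b0 + b1 * xs[i]) - ys[i] for i in valid]  # predicted minus actual
    return sum(e * e for e in errors) / len(errors)


rng = random.Random(42)
xs = [float(i) for i in range(500)]
ys = [3.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]  # noise sd = 1
print(round(holdout_mse(xs, ys), 3))  # should be near 1, the noise variance
```

Because the validation cases play no part in the fit, the validation MSE is an honest estimate of prediction error and can be compared across candidate models.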

We illustrate the process of multiple linear regression using an example from Chatterjee, Hadi, and Price on evaluating the performance of supervisors in a large financial institution.

The data shown in Table 2.1 come from a survey of clerical staff in the departments of a large financial institution. The dependent variable is a measure of the effectiveness of the supervisor heading each department. Both the dependent and the independent variables are totals of ratings given by 25 employees, who rate different aspects of the supervisor's work on a scale from 1 to 5. As a result, each variable has a minimum of 25 and a maximum of 125. The ratings are answers to a survey given to 25 employees in each of 30 departments.

The purpose of the analysis is to explore the feasibility of using a questionnaire to predict the effectiveness of a department, thus avoiding the effort of measuring effectiveness directly. The variables are answers to survey questions and are described as follows:

Y   Measure of the supervisor's effectiveness
X1  Handles employee complaints
X2  Does not allow special privileges
X3  Opportunity to learn new things
X4  Promotion based on performance
X5  Too critical of poor performance
X6  Rate of advancement to better jobs

The multiple linear regression estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_6$, computed with the StatCalc add-in for Excel, are shown in Table 2.2.

Table 2.2

The equation for predicting effectiveness is

$$\hat Y = 13.182 + 0.583X_1 - 0.044X_2 + 0.329X_3 - 0.057X_4 + 0.112X_5 - 0.197X_6.$$

In Table 2.3 we use ten more cases as validation data. Applying the preceding equation to the validation data gives the predictions and errors shown in Table 2.3. The last column, "Error", is the difference between the predicted and the actual value. For example, the error for case 21 is 44.46 − 50 = −5.54.
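As a check on the arithmetic, the prediction equation can be applied to a single validation case. This sketch uses case 23 from Table 2.3 (actual Y = 53; X1 to X6 = 66, 52, 50, 63, 80, 37) and the coefficients as printed above; the function name `predict_effectiveness` is ours.

```python
# Coefficients b0, b1, ..., b6 of the prediction equation, as printed (rounded).
COEFS = [13.182, 0.583, -0.044, 0.329, -0.057, 0.112, -0.197]


def predict_effectiveness(x):
    """x = [X1, ..., X6]; returns the predicted Y."""
    return COEFS[0] + sum(b * xi for b, xi in zip(COEFS[1:], x))


# Validation case 23: actual Y = 53
pred = predict_effectiveness([66, 52, 50, 63, 80, 37])
error = pred - 53  # error = predicted - actual, as in Table 2.3
print(f"{pred:.2f} {error:.2f}")
```

With the three-decimal coefficients this gives about 63.90 and an error of about 10.90, within 0.01 of the table's 63.91 and 10.91; the small gap presumably comes from rounding the printed coefficients.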

Table 2.3

Case   Y  X1  X2  X3  X4  X5  X6  Predicted   Error
  21  50  40  33  34  43  64  33      44.46   -5.54
  22  64  61  52  62  66  80  41      63.98   -0.02
  23  53  66  52  50  63  80  37      63.91   10.91
  24  40  37  42  58  50  57  49      45.87    5.87
  25  63  54  42  48  66  75  33      56.75   -6.25
  26  66  77  66  63  88  76  72      65.22   -0.78
  27  78  75  58  74  80  78  49      73.23   -4.77
  28  48  57  44  45  51  83  38      58.19   10.19
  29  85  85  71  71  77  74  55      76.05   -8.95
  30  82  82  39  59  64  78  39      76.10   -5.90
Average                               62.38   -0.52
Std. dev.                             11.30    7.17

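We note that the average prediction error is small (−0.52), so the predictions appear to be unbiased. Moreover, the errors are roughly Normal, so this model gives prediction errors that lie within ±14.34 (two standard deviations, 2 × 7.17) of the true value about 95% of the time.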
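The summary rows of Table 2.3 and the ±14.34 band can be reproduced from the Predicted and Error columns with Python's standard `statistics` module. This sketch assumes the table's standard deviations are sample standard deviations (divisor n − 1), which is the convention that reproduces the printed 11.30 and 7.17.

```python
import statistics

# Predicted values and errors (predicted minus actual) for cases 21-30, from Table 2.3
predicted = [44.46, 63.98, 63.91, 45.87, 56.75, 65.22, 73.23, 58.19, 76.05, 76.10]
errors = [-5.54, -0.02, 10.91, 5.87, -6.25, -0.78, -4.77, 10.19, -8.95, -5.90]

mean_err = statistics.mean(errors)
sd_err = statistics.stdev(errors)      # sample standard deviation (divisor n - 1)
half_width = 2 * sd_err                # roughly 95% coverage if errors are about Normal

print(f"predicted: mean {statistics.mean(predicted):.2f}, sd {statistics.stdev(predicted):.2f}")
print(f"error: mean {mean_err:.2f}, sd {sd_err:.2f}, 95% band +/- {half_width:.2f}")
```

This gives a mean error of −0.52, an error standard deviation of 7.17, and a half-width of 14.34, matching the text.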

2.3 Subset Selection in Linear Regression

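A problem frequently encountered in data mining is using a regression equation to predict the value of a dependent variable when many variables are available as candidate independent variables for the model. Given the high speed of modern algorithms for linear regression, it is tempting in such situations to take a kitchen-sink approach: why bother selecting a subset? Just use all the variables in the model.

There are several reasons why this approach is not ideal: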

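• Collecting the full set of predictor variables can be costly.
• We may be able to measure fewer variables more accurately (for example, in surveys).
• Parsimony is an important property of a good model: with fewer variables, we gain more insight into the influence of the independent variables in the model.
• Multicollinearity among many variables in a model can make the estimated regression coefficients unstable. Models with fewer variables give better insight into the influence of the independent variables, because the regression coefficients of simpler models are more stable.
• Including independent variables that are unrelated to the dependent variable increases the variance of the predictions.
• Dropping independent variables with small regression coefficients can reduce the average prediction error.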
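The multicollinearity point can be quantified. In a linear model with an intercept and two predictors, the variance of each coefficient estimate is inflated by the factor $1/(1 - r^2)$, where $r$ is the correlation between the two predictors; a factor of the same form appears in Section 2.3.1. The sketch below computes it; the function names and the sample data are illustrative.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def variance_inflation(a, b):
    """Factor 1 / (1 - r^2) by which collinearity inflates Var(beta_hat) with two predictors."""
    r = correlation(a, b)
    return 1.0 / (1.0 - r * r)


print(variance_inflation([1, 2, 3, 4], [1, -1, -1, 1]))               # uncorrelated: 1.0
print(variance_inflation([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1]))  # nearly collinear: about 140
```

In the second case the two predictors carry almost the same information, so each coefficient's variance is roughly 140 times what it would be with uncorrelated predictors, which is why such estimates swing widely from sample to sample.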

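Let us illustrate the last two points with a simple example that has two independent variables. The reasoning remains valid when there are more than two independent variables.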

2.3.1 Dropping Irrelevant Variables

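Suppose that the true equation for the dependent variable $Y$ is (Model 1)

$$y = \beta_1 x_1 + \varepsilon,$$

and suppose that we estimate $Y$ with the following equation (Model 2), which uses an additional variable $x_2$ that is actually irrelevant:

$$y = \beta_1 x_1 + \beta_2 x_2 + \varepsilon.$$

We use the data $y_i, x_{i1}, x_{i2}$, $i = 1, 2, \ldots, n$. It can be shown that in this situation the least squares estimates $\hat\beta_1$ and $\hat\beta_2$ have the following expected values and variances: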

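$$E(\hat\beta_1) = \beta_1, \qquad \operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{(1 - r_{12}^2)\sum_{i=1}^{n} x_{i1}^2},$$

$$E(\hat\beta_2) = 0, \qquad \operatorname{Var}(\hat\beta_2) = \frac{\sigma^2}{(1 - r_{12}^2)\sum_{i=1}^{n} x_{i2}^2},$$

where $r_{12}$ is the correlation coefficient between $x_1$ and $x_2$.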
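These two variance formulas can be checked numerically against the general result $\operatorname{Var}(\hat\beta) = \sigma^2 (X^\top X)^{-1}$ for a model without an intercept. One detail worth making explicit: for the algebra to match exactly, $r_{12}$ has to be computed about zero, $r_{12} = \sum x_{i1}x_{i2} / \sqrt{\sum x_{i1}^2 \sum x_{i2}^2}$, which coincides with the usual correlation coefficient when the $x$ values are centered. The data values and function names below are illustrative.

```python
import math

def var_model2(x1, x2, sigma2):
    """Var(b1_hat), Var(b2_hat) for y = b1*x1 + b2*x2 + eps (no intercept),
    read off the diagonal of sigma^2 (X^T X)^{-1} for the 2x2 case."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    det = s11 * s22 - s12 * s12
    return sigma2 * s22 / det, sigma2 * s11 / det


def var_formula(x1, x2, sigma2):
    """The closed forms in the text, with r12 computed about zero (no centering)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    r12 = sum(a * b for a, b in zip(x1, x2)) / math.sqrt(s11 * s22)
    return (sigma2 / ((1 - r12 ** 2) * s11),
            sigma2 / ((1 - r12 ** 2) * s22))


x1, x2 = [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]
print(var_model2(x1, x2, 2.0))
print(var_formula(x1, x2, 2.0))  # same two numbers, up to rounding
```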

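We note that $\hat\beta_1$ and $\hat\beta_2$ are unbiased estimates of $\beta_1$ and $\beta_2$, respectively: in particular, $\hat\beta_2$ has expected value 0, which equals the true value $\beta_2 = 0$ in Model 1.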

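If we use Model 1, we obtain

$$E(\hat\beta_1) = \beta_1, \qquad \operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} x_{i1}^2}.$$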

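Note that in this case the variance of $\hat\beta_1$ is lower: the Model 2 variance carries the extra factor $1/(1 - r_{12}^2) \ge 1$, so the two variances coincide only when $r_{12} = 0$. For an unbiased estimator, the variance equals the expected squared error.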
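A Monte Carlo sketch of this comparison, under assumptions of our own choosing: $n = 25$, $\beta_1 = 2$, $\sigma = 1$, and an irrelevant $x_2$ constructed to be correlated with $x_1$. Many samples are drawn from Model 1, both models are fitted without an intercept as in the equations above, and the empirical variances of $\hat\beta_1$ are compared with the formulas. The function name, seed, and number of repetitions are illustrative.

```python
import random

def simulate_variances(reps=4000, seed=7):
    """Empirical vs theoretical Var(b1_hat) when data come from Model 1
    (y = beta1*x1 + eps) and we fit Model 1 or Model 2 (adds irrelevant x2)."""
    rng = random.Random(seed)
    n, beta1, sigma = 25, 2.0, 1.0
    x1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # x2 is irrelevant to y but correlated with x1 (roughly r = 0.8)
    x2 = [0.8 * a + 0.6 * rng.uniform(-1.0, 1.0) for a in x1]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    det = s11 * s22 - s12 * s12

    est1, est2 = [], []
    for _ in range(reps):
        y = [beta1 * a + rng.gauss(0.0, sigma) for a in x1]  # true model: Model 1
        s1y = sum(a * v for a, v in zip(x1, y))
        s2y = sum(b * v for b, v in zip(x2, y))
        est1.append(s1y / s11)                         # Model 1 least squares (no intercept)
        est2.append((s22 * s1y - s12 * s2y) / det)     # Model 2: solve the 2x2 normal equations

    def sample_var(v):
        m = sum(v) / len(v)
        return sum((e - m) ** 2 for e in v) / (len(v) - 1)

    theory1 = sigma ** 2 / s11
    theory2 = sigma ** 2 * s22 / det   # equals sigma^2 / ((1 - r12^2) * s11)
    return sample_var(est1), sample_var(est2), theory1, theory2


v1, v2, t1, t2 = simulate_variances()
print(f"Model 1: {v1:.5f} (theory {t1:.5f})  Model 2: {v2:.5f} (theory {t2:.5f})")
```

Both fits are unbiased for $\beta_1$, but the Model 2 estimate is noticeably more variable, by about the factor $1/(1 - r_{12}^2)$.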

So when we make predict
