Galton Invents Regression
2001 by Jeffrey M. Stanton, all rights reserved.
This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
Key Words: Correlation; Francis Galton; History of statistics; Karl Pearson.
Abstract
An examination of publications of Sir Francis Galton and Karl Pearson revealed that Galton's work on inherited characteristics of sweet peas led to the initial conceptualization of linear regression. Subsequent efforts by Galton and Pearson brought about the more general techniques of multiple regression and the product-moment correlation coefficient. Modern textbooks typically present and explain correlation prior to introducing prediction problems and the application of linear regression. This paper presents a brief history of how Galton originally derived and applied linear regression to problems of heredity. This history illustrates additional approaches instructors can use to introduce simple linear regression to students.
1. Introduction
The complete name of the correlation coefficient deceives many students into a belief that Karl Pearson developed this statistical measure himself. Although Pearson did develop a rigorous treatment of the mathematics of the Pearson Product Moment Correlation (PPMC), it was the imagination of Sir Francis Galton that originally conceived modern notions of correlation and regression. Galton, a cousin of Charles Darwin and an accomplished 19th century scientist in his own right, has often been criticized in this century for his promotion of "eugenics" (planned breeding of humans; see, for example, Paul 1995). Historians have also suggested that his cousin's lasting fame unfairly overshadowed the substantial scientific contributions Galton made to biology, psychology and applied statistics (see, for example, FitzPatrick 1960). Galton's fascination with genetics and heredity provided the initial inspiration that led to regression and the PPMC.
The thoughts that prompted the development of the PPMC began with a then vexing problem of heredity: understanding how strongly the characteristics of one generation of living things manifested in the following generation. Galton initially approached this problem by examining characteristics of the sweet pea plant. He chose the sweet pea because that species could self-fertilize; daughter plants express genetic variations from mother plants without contribution from a second parent. This characteristic eliminated, or at least postponed, having to deal with the problem of statistically assessing genetic contributions from multiple sources. Galton's first insights about regression sprang from a two-dimensional diagram plotting the sizes of daughter peas against the sizes of mother peas. As described below, Galton used this representation of his data to illustrate basic foundations of what statisticians still call regression. The generalization of these efforts into the product-moment correlation and the more complex multiple regression came much later. Current textbooks of behavioral science statistics typically reverse this order: the PPMC is presented first and linear regression is covered later. Many instructors may also feel more comfortable starting with correlation and building up to regression.
The present paper provides historical background and illustrative examples that statistics instructors may find useful in introducing these concepts to college level classes in applied statistics. By briefly tracing the historical development of regression and correlation, this paper shows how introductory statistics instructors can use engaging and historically accurate examples to introduce regression and correlation to students. A number of articles concerning the teaching of regression and correlation indicate that students often have difficulty understanding these concepts and the connection between them (see, for example, Williams 1975; Duke 1978; Karylowski 1985; Goldstein and Strube 1995). The present article provides new ideas for instruction based on the historical origins of these statistical techniques.
2. Galton's Early Considerations of Regression
Besides his role as a colleague of Galton's and a researcher in Galton's laboratory, Karl Pearson also became Galton's biographer after the latter's death in 1911 (Pearson 1922). In his four-volume biography of Galton, Pearson described the genesis of the discovery of the regression slope (Pearson 1930). In 1875, Galton distributed seven packets of sweet pea seeds to seven friends; the seeds within each packet were of uniform weight, but the weights differed substantially between packets (also see Galton 1894). His friends planted the seeds and returned the harvested peas to Galton (see Appendix A). Galton plotted the weights of the daughter seeds against the weights of the mother seeds. Galton realized that the median weights of daughter seeds from a particular size of mother seed approximately described a straight line with positive slope less than 1.0:
"Thus he naturally reached a straight regression line, and the constant variability for all arrays of one character for a given character of a second. It was, perhaps, best for the progress of the correlational calculus that this simple special case should be promulgated first; it is so easily grasped by the beginner." (Pearson 1930, p. 5)
The simple, special case that Pearson referred to is, of course, both the roughly equivalent variability of the two measures and their identical units of measurement. Figure 1 uses a simple, invented data set to illustrate Galton's earliest findings. The parent sweet pea size on the X-axis and the offspring sweet pea size on the Y-axis have approximately equal variability. Thus, the slope of the line connecting the means of the different columns of points is equivalent both to the regression slope and the correlation coefficient. For Galton's purposes, any slope smaller than 1.0 indicated regression to the mean for that generation of peas. The phenomenon of regression to the mean is illustrated by the configuration of points: The y-coordinates of most of the points in Figure 1 are closer to the horizontal offspring mean than their x-coordinates are to the vertical parent mean. Galton's first documented study of this type suggested a slope of 0.33 (obtained through careful inspection of his scatterplots), which indicated to him that extremely large or small mother seeds typically generated substantially less extreme daughter seeds. This finding is, of course, prototypical of regression to the mean: For many variables, natural processes work to "dampen" extreme outliers and bring them closer to their respective means.
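The link between equal variability and the equality of slope and correlation can be checked numerically. The following is a minimal sketch, not Galton's or Pearson's procedure: it uses a small hypothetical data set (not the Appendix B values) and helper names chosen only for illustration. It relies on the standard identity that the least-squares slope equals r multiplied by the ratio of the standard deviation of Y to that of X, so the slope reduces to r once both variables are rescaled to the same variability.

```python
from statistics import fmean

def slope_and_r(xs, ys):
    """Return (least-squares slope of y on x, Pearson r, sd_y / sd_x)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5
    return slope, r, (syy / sxx) ** 0.5

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (population formula)."""
    m = fmean(values)
    sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - m) / sd for v in values]

# Hypothetical parent and offspring pea sizes, invented for this sketch.
parent = [7, 8, 9, 10, 11, 12, 13]
offspring = [9, 8, 10, 9, 11, 10, 12]

slope, r, sd_ratio = slope_and_r(parent, offspring)
z_slope, z_r, _ = slope_and_r(standardize(parent), standardize(offspring))

print(f"raw slope = {slope:.3f}, r = {r:.3f}, sd_y/sd_x = {sd_ratio:.3f}")
print(f"slope after equalizing variability = {z_slope:.3f}")
```

In this invented example the offspring sizes vary less than the parent sizes, so the raw slope is smaller than r; after standardizing, the two coincide, which is the special case Pearson described.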
Figure 1. Connecting the means of the individual columns of data provides a crude approximation of the regression line. The slope is exactly 0.50 and the correlation is approximately r = 0.51. Many, though not all, of the points are closer to the offspring pea size mean of 9 on the Y-axis than to the parental pea size mean of 10 on the X-axis. The numeric data appear in Appendix B.
Nonetheless, only a horizontal line would have indicated no heritability in seed size whatsoever, so Galton's finding affirmed his basic assumptions concerning the heritability of "characters." Figure 1, greatly simplified from Galton's original graph, illustrates how a line connecting the means of the columns of data points indicates the degree to which extreme values in the first generation (on the X-axis) tend to regress toward the mean of the second generation (on the Y-axis). I invented the data points, which are listed in Appendix B, to simplify hand calculation in a classroom setting. In contrast to Figure 1, Galton's original data did not produce a perfectly smooth line, but he was able to draw, by hand, a single line that fit all the data reasonably well (Galton's first regression line was presented at a lecture in 1877; see Pearson 1930). The slope of this line he designated "r" for regression. Only under Pearson's later treatment did r come to stand for the correlation coefficient (Pearson 1896).
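The idea of connecting column means can be sketched in a few lines of code. The data below are hypothetical and are not the Appendix B values; they were constructed for this illustration so that the column means fall on a straight line around a parent mean of 10 and an offspring mean of 9. The function names are likewise assumptions made for the sketch.

```python
from statistics import fmean

# Hypothetical data: each parent size (X) maps to a column of offspring sizes (Y).
columns = {
    8: [7.0, 8.0, 9.0],
    9: [7.5, 8.5, 9.5],
    10: [8.0, 9.0, 10.0],
    11: [8.5, 9.5, 10.5],
    12: [9.0, 10.0, 11.0],
}

def column_means(cols):
    """Mean offspring size for each parent size, ordered by parent size."""
    return {x: fmean(ys) for x, ys in sorted(cols.items())}

def segment_slopes(means):
    """Slopes of the line segments joining consecutive column means."""
    pts = list(means.items())
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]

def least_squares_slope(cols):
    """Ordinary least-squares slope of Y on X over all individual points."""
    pairs = [(x, y) for x, ys in cols.items() for y in ys]
    mx = fmean([x for x, _ in pairs])
    my = fmean([y for _, y in pairs])
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

means = column_means(columns)
print("column means:", means)
print("segment slopes:", segment_slopes(means))
print("least-squares slope:", least_squares_slope(columns))
```

Because these invented column means are exactly collinear, every segment has the same slope as the least-squares line. With real data, such as Galton's, the segments would wobble around the fitted line, which is why he drew a single line by hand.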
Galton's progress was both eased and hobbled by his choices for descriptive statistics; he used the median as a measure of central tendency and the semi-interquartile range as a measure of variability. One advantage of these measures lay in the simplicity of obtaining them. Galton was nearly fanatical about graphing and tabulating every available data point. These descriptive values could emerge from an inspection of the resulting figure or table with a minimum of computation. It is understood now that the median and semi-interquartile range do not have the favorable mathematical properties of the mean and standard deviation (for example, they cannot be manipulated using covariance algebra). But Galton was not a sophisticated enough mathematician to recognize the deficiency. So Galton's progress toward a more general implementation of regression was delayed by his choice of descriptive statistics. In Natural Inheritance (Galton 1894), Galton expended a page or two making various arguments about the exact value of the slope of a regression line as calculated with various techniques to estimate the change in Y versus the change in X on the scatterplot. At that point in time, his efforts lacked the mathematical foundation to derive the slope from the data themselves. As an interesting footnote, in the late 1870s, Galton did not have access to a mechanical calculating machine, whereas Pearson had one for personal use on his desk no later than 1910 (Pearson 1938).
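The contrast between Galton's descriptive statistics and the mean and standard deviation can be made concrete with a short sketch. The seed weights below are hypothetical, and the quartiles use the default method of Python's statistics.quantiles, which is only one of several quartile conventions and not necessarily the one Galton read off his tables. The second half illustrates one algebraic property the mean has and the median lacks: the mean of a sum of two paired variables equals the sum of their means.

```python
from statistics import fmean, median, pstdev, quantiles

# Hypothetical seed weights, invented for this sketch.
weights = [15, 16, 17, 17, 18, 19, 20, 21, 30]

q1, _, q3 = quantiles(weights, n=4)   # quartile cut points
semi_iqr = (q3 - q1) / 2              # semi-interquartile range

print("median:", median(weights), "semi-interquartile range:", semi_iqr)
print("mean:", fmean(weights), "standard deviation:", pstdev(weights))

# Additivity: holds for means, fails in general for medians.
a = [1, 2, 9]
b = [9, 2, 1]
total = [x + y for x, y in zip(a, b)]

print("mean of sum:", fmean(total), "sum of means:", fmean(a) + fmean(b))
print("median of sum:", median(total), "sum of medians:", median(a) + median(b))
```

The median and semi-interquartile range need only sorting and counting, which fits the description of values read off a table; the failure of additivity hints at why they resist the algebra that later made a general treatment of regression possible.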
3. Galton's Recognition of the Generality of Regression Slope
Even with his poor choice of descriptive statistics, Galton was able to generalize his work over a variety of heredity problems. He tackled personality temperament, artistic ability, and dis