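Geospatial Analysis: Principles, Techniques and Software Tools
Spatial Data Analysis and Spatial Econometrics: Main Content
Zhang Hongli (张红历)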
Outlier detection
One of the areas ESDA tools focus on is outlier detection, as there are many instances in which so-called outliers are of great interest. In the present context these are spatial objects whose value on one or more attributes is markedly different from others in the set under consideration. The data in question may be correct or may be the result of some form of error (measurement, coding, representation etc.). Such data are of interest since they may represent the most important items in an investigation (e.g. mineral concentrations, a pollution source, an unexpectedly high incidence of a particular disease). Or they might represent data that need to be removed or adjusted (e.g. smoothed) if either the information is known or suspected to be incorrect, or if its retention will adversely affect the results obtained from the application of a particular analytical technique.
Mapped histograms
One of the simplest methods of highlighting possible outliers is to create a histogram of the data, typically using a fine class division, and then to examine the extreme classes. Where this facility is linked to a map of the data, the location of the object(s) may be identified and examined (Figure 5-6). The upper figure shows the histogram and basic statistics for the attribute OWN_OCC (the number of Owner Occupiers, i.e. property owners) within 86 census districts (test census Output Areas, OAs, for part of Manchester, UK). The districts with the highest data values have been selected on the histogram window, and are simultaneously highlighted in the map window (lower figure). The same approach may be applied for other vector object types, such as point data.
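The histogram step described above can be sketched in a few lines. This is a minimal illustration only, assuming NumPy is available; the function name `extreme_class_members`, the number of classes and the sample values are all hypothetical and are not taken from the OWN_OCC dataset.

```python
import numpy as np

def extreme_class_members(values, n_bins=20):
    """Return histogram counts, class edges, and the indices of the items
    that fall in the lowest and the highest histogram class."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=n_bins)
    # Assign each value a class index 0..n_bins-1 using the interior edges.
    classes = np.digitize(values, edges[1:-1])
    lowest = np.flatnonzero(classes == 0)
    highest = np.flatnonzero(classes == n_bins - 1)
    return counts, edges, lowest, highest

# Hypothetical attribute values for ten zones (illustrative only).
own_occ = [52, 48, 55, 61, 47, 50, 58, 53, 49, 140]
counts, edges, lowest, highest = extreme_class_members(own_occ, n_bins=10)
print("zones in the highest class:", highest.tolist())
```

In a linked display, the returned indices would be the records to highlight in the map window.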
Data items that lie at the upper or lower limits of a dataset range may be described as global outliers. This term refers to values that are extreme compared to the dataset as a whole. However, within the dataset there may be values that are “relatively extreme” and these are referred to as local outliers. A local outlier is a value that is markedly different from (spatially) neighbouring values. An example of this might be a set of measurements taken along a transect, with a value part of the way along the transect that is very different from those immediately before or after, but still well within the overall range of the data recorded on the entire transect. Some ESDA software packages, such as ArcGIS Geostatistical Analyst, provide tools for displaying local as well as global outliers for selected data types.
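The transect example can be made concrete with a small sketch. The rule used here (compare each value with the median of its immediate neighbours and flag it when the difference exceeds an analyst-chosen tolerance) is an assumption made for illustration; it is not the method used by any particular package, and the names `local_outliers`, `window` and `min_diff` are hypothetical.

```python
import numpy as np

def local_outliers(values, window=2, min_diff=10.0):
    """Flag transect positions whose value differs from the median of up to
    `window` neighbours on each side by more than `min_diff`."""
    values = np.asarray(values, dtype=float)
    flagged = []
    for i in range(len(values)):
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        # Neighbouring values only: drop the value being tested.
        neighbours = np.delete(values[lo:hi], i - lo)
        if abs(values[i] - np.median(neighbours)) > min_diff:
            flagged.append(i)
    return flagged

# Hypothetical transect with a steady trend; position 5 reads 55 where
# about 35 would be expected, yet 55 is well inside the overall range 10..60.
transect = [10, 15, 20, 25, 30, 55, 40, 45, 50, 55, 60]
flagged = local_outliers(transect)
print("local outlier positions:", flagged)
```

A check against the global range alone would not pick out position 5, since its value is neither the minimum nor the maximum of the transect.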
Figure 5-6 Histogram linkage
Source: UK 2001 Census Test Output Areas (OAs)
Boxplots
Boxplots (or box-whisker plots) are a form of EDA provided in many data analysis and graphing packages (e.g. SPSS, STATA, Grapher, WinBUGS). Together with distribution plots and scatterplots they provide one of the three main ways in which statistical data are examined graphically. Because boxplots are less familiar to many, and of particular use in examining outliers, we describe them in some detail (see Figure 5-7).
A boxplot consists of a number of distinct elements. The example in Figure 5-7 was generated using MATLAB Statistics Toolbox and we provide definitions below that apply to this particular implementation:
∙ The lower and upper lines of the "box" in the centre of the plot window are the 25th and 75th percentiles of the sample. The distance between the top and bottom of the box is the inter-quartile range (IQR)
∙ The line in the middle of the box is the sample median. If the median is not centred in the box it is an indication of skewness
∙ The whiskers are lines extending above and below the box. They show the extent of the rest of the sample (unless there are outliers). Assuming no outliers, the maximum of the sample is the top of the upper whisker. The minimum of the sample is the bottom of the lower whisker. By default, an outlier is a value that is more than 1.5 times the IQR away from the top or bottom of the box (a hinge value of 1.5), so with outliers the whiskers show a form of trimmed range, i.e. excluding the outliers (n.b. the term hinge is also used in statistics to refer to locations within the main data range, in some instances matching the upper and lower quartile values)
∙ A symbol, e.g. a small circle, at the top and/or bottom of the plot is an indication of an outlier in the data. This point may be the result of a data entry error, a poor measurement or perhaps a highly significant observation
∙ The notches in the box are a graphic confidence interval about the median of a sample. A side-by-side comparison of two notched boxplots is sometimes described as the graphical equivalent of a t-test. Boxplots do not have notches by default
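The elements listed above can be computed directly from a sample. The following is a minimal sketch, assuming NumPy; the function name `boxplot_stats` and the sample values are hypothetical. Percentiles are computed with NumPy's default linear interpolation, which may differ slightly from the quartile definitions used by MATLAB or other packages, so the exact figures should be treated as illustrative.

```python
import numpy as np

def boxplot_stats(sample, hinge=1.5):
    """Compute the box, whisker and outlier values of a simple boxplot."""
    x = np.sort(np.asarray(sample, dtype=float))
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Values further than hinge * IQR from the box are treated as outliers.
    lower_fence = q1 - hinge * iqr
    upper_fence = q3 + hinge * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers end at the most extreme values that are not outliers.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers.tolist(),
    }

# Hypothetical sample (illustrative only).
stats = boxplot_stats([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])
print(stats)
```

With this sample the upper whisker stops at 10 rather than at the maximum, because the value 30 lies beyond the upper fence and is reported separately as an outlier.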
The boxplots in Figure 5-7 are for a set of radioactivity observations made at 1008 sites in Germany on one day in 2004. The plot on the left (Sample 1) consists of 200 of the records, with whiskers extending to 1.5 times the IQR.
Figure 5-7 Simple boxplot
Data source: SIC2004, AI-GEOSTATS
Some packages allow user specification of the hinges, or provide an alternative set value (e.g. 3 times IQR in GeoDa; 2.5% and 97.5% limits in WinBUGS). The plot on the right shows a further 808 locations and their readings (see also Figure 5-4 and Figure 5-10). Three values in sample 1 were deliberately altered for this plot, e.g. simulating measurement or coding errors. One, for example, involved recording a measured value of 106.0 as 16.0. The plot picks out each of these outliers.
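The effect of the hinge setting can be illustrated with a short sketch, assuming NumPy. The readings below are invented for illustration and are not the SIC2004 data; one of them mimics the coding error mentioned above (106.0 recorded as 16.0). The function name `outliers_by_hinge` is hypothetical, and quartiles use NumPy's default interpolation.

```python
import numpy as np

def outliers_by_hinge(sample, hinge):
    """Return the values lying more than hinge * IQR beyond the quartiles."""
    x = np.asarray(sample, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mask = (x < q1 - hinge * iqr) | (x > q3 + hinge * iqr)
    return x[mask].tolist()

# Hypothetical readings: 16.0 stands in for a miscoded 106.0, and 118.0
# is a moderately high but plausible value.
readings = [98.0, 101.0, 103.0, 105.0, 107.0, 99.0,
            102.0, 104.0, 100.0, 16.0, 118.0]

mild = outliers_by_hinge(readings, hinge=1.5)     # default-style setting
extreme = outliers_by_hinge(readings, hinge=3.0)  # stricter setting
print("hinge 1.5:", mild)
print("hinge 3.0:", extreme)
```

In this example the stricter setting retains only the miscoded value, whereas the 1.5 setting also flags the moderately high reading.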
Boxplots with a link to zone-based spatial datasets are supported within GeoDa. Figure 5-8 illustrates the technique, again using the Manchester area test census Output Areas described earlier in this section. The census variable Owner Occupier has been selected for mapping, and a conventional boxplot of the data is also illustrated. In the GeoDa implementation each data item that lies outside the box but within the whiskers is shown with a *, and the sole “true” outlier OA appears at the very top of the boxplot above the upper whisker. The mapped boxplot shows the OAs that fall into the various data quartiles (and the number of OAs in each), plus upper and lower outlier OAs; in this case there is just the one upper OA of rather complex shape, the same as that identified in Figure 5-6.
Figure 5-8 Mapped boxplot
Crosstabulations and conditional choropleth plots
Mapped zonal data typically consist of a single variable, or a ratio of two variables, one of which is acting as a normalisation factor, e.g. mapping the ratio:
r = persons_in_employment / total_population
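As a small worked illustration of this normalisation, the sketch below computes the ratio for a few zones. The zone names and counts are hypothetical, and returning `None` for a zone with no recorded population is an assumption about how such cases might be handled, not a rule taken from the text.

```python
def employment_ratio(persons_in_employment, total_population):
    """Return persons in employment as a share of total population,
    or None where the population is zero and the ratio is undefined."""
    if total_population <= 0:
        return None
    return persons_in_employment / total_population

# Hypothetical zones: (persons in employment, total population).
zones = {"A": (450, 1000), "B": (120, 400), "C": (0, 0)}
ratios = {name: employment_ratio(emp, pop) for name, (emp, pop) in zones.items()}
print(ratios)
```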
Separate maps may be created for each variable or ratio of interest, but these are typically independent entities that may be difficult to compare and interpret. It would be relatively straightforward to create a series of maps of a particular variable of interest, for example the reported rate of lung cancer by health district where each map showed the rate for areas where the proportion of smokers was high, medium or low. This would be a form of control or “conditioning” on the information shown. This simple approach can be implemented within any GIS. The approach can be extended further to two (or more) variables by crosstabulating the source data. If the crosstabulation is carried out on categorical data (e.g. sex, racial grouping) then once again a series of maps may be generated for each cell in the crosstabulation. However, with unclassified continuously varying data it is useful to examine the effect of specific levels of such data on the spatial distribution. Specialised visualisation tools have been developed recently to support operations of this type, including CCMaps and the most recent extensions to GeoDa (which have been derived from the ideas developed in the CCMaps project). They are known as interactive Conditional Choropleth mapping tools and may be dynamically linked to other visualisations such as boxplots, histograms and scatterplots. The aim of such software, in the words of the original authors, is to:
∙ Stimulate analytical reasoning
∙ Detect the unexpected
∙ Discover the unexpected, and
∙ Stimulate hypothesis generation
Figure 5-9 illustrates this procedure using data on lung cancer mortality rates, by county, for the USA (see Carr et al. (2000, 2002) for a brief description of the method and this particular dataset). There are 9 maps in total. The coloured bar at the top shows how the counties have been classified, with for example 34% in the blue (low) category, corresponding to 63.7-375 deaths per 100,000. The breakpoints on this scale may be dragged to provide alternative classification levels, the effects of which are dynamically updated in the map windows. Each map row represents one level of the percentage of the population below the USA designated poverty level (right hand slider scale) and each map column represents one level of the recorded annual precipitation level. The region in the South-East of the USA (top right in the set of map windows) appears to have a high incidence of lung cancer mortality and a high score for both conditioning variables.
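The conditioning step behind such a grid of maps can be sketched as follows, assuming NumPy. Zones are assigned to a row and a column class using two conditioning variables, a weighted mean rate is computed for each panel, and an R-squared-type figure is derived as the share of weighted variability accounted for by the panel means. This is one plausible formulation chosen for illustration and is not necessarily the calculation used in CCMaps or GeoDa; the function name, break values and data are hypothetical.

```python
import numpy as np

def conditioned_panels(rate, weight, cond_row, cond_col, row_breaks, col_breaks):
    """Group zones into panels by two conditioning variables and return the
    weighted mean rate per panel plus the share of variability explained."""
    rate = np.asarray(rate, dtype=float)
    weight = np.asarray(weight, dtype=float)
    # Class 0 = low, 1 = medium, 2 = high for two break values.
    row_class = np.digitize(cond_row, row_breaks)
    col_class = np.digitize(cond_col, col_breaks)

    overall = np.average(rate, weights=weight)
    total_ss = np.sum(weight * (rate - overall) ** 2)

    panel_means = {}
    within_ss = 0.0
    for r in range(len(row_breaks) + 1):
        for c in range(len(col_breaks) + 1):
            members = (row_class == r) & (col_class == c)
            if not members.any():
                continue  # empty panel: no zones meet both conditions
            mean = np.average(rate[members], weights=weight[members])
            panel_means[(r, c)] = mean
            within_ss += np.sum(weight[members] * (rate[members] - mean) ** 2)

    r_squared = 1.0 - within_ss / total_ss
    return panel_means, r_squared

# Hypothetical data for six zones (illustrative only).
rate = [40, 42, 60, 62, 80, 82]        # e.g. deaths per 100,000
population = [1, 3, 2, 2, 1, 1]        # weights
poverty = [5, 6, 15, 16, 25, 26]       # row conditioning variable
precipitation = [20, 22, 50, 52, 20, 90]  # column conditioning variable

panel_means, r_squared = conditioned_panels(
    rate, population, poverty, precipitation,
    row_breaks=[10, 20], col_breaks=[30, 60])
print(panel_means)
print(round(r_squared, 4))
```

Moving the break values plays the role of dragging the sliders: the panel membership, the panel means and the R-squared figure all change with them.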
The figures in the top right of each mapped window show the weighted mean mortality rate, and the R-squared value in the lower right corner shows the percentage of the overall variability accounted for by these weighted means. By adjusting the two conditioning variable sliders (which are actually a form of boxplot) or by using a built-in search facility (described as cognostics) a combination of slider values that maximise R-squared can be obtained, in this example to a value of just below 43%. For more details regarding the application of CC mapping and associated linked visualisations (e.g. conditional box-plots and conditional scatterplots)