Cluster Analysis: Foreign Literature and Translation.docx
- Document ID: 24787747
- Upload date: 2023-06-01
- Format: DOCX
- Pages: 15
- Size: 44.68KB
Cluster Analysis: Foreign Literature and Translation
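Undergraduate Graduation Thesis
Foreign Literature and Translation
Title of literature/material:
Cluster Analysis: Basic Concepts and Algorithms
Source of literature/material:
Publication date of literature/material:
School (Department):
School of Civil Engineering
Major:
Civil Engineering
Class:
Name:
Student ID:
Supervisor:
Translation date:
Foreign literature: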
Cluster Analysis: Basic Concepts and Algorithms
Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should capture the natural structure of the data. In some cases, however, cluster analysis is only a useful starting point for other purposes, such as data summarization. Whether for understanding or utility, cluster analysis has long played an important role in a wide variety of fields: psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining.
There have been many applications of cluster analysis to practical problems. We provide some specific examples, organized by whether the purpose of the clustering is understanding or utility.
Clustering for Understanding:
Classes, or conceptually meaningful groups of objects that share common characteristics, play an important role in how people analyze and describe the world. Indeed, human beings are skilled at dividing objects into groups (clustering) and assigning particular objects to these groups (classification). For example, even relatively young children can quickly label the objects in a photograph as buildings, vehicles, people, animals, plants, etc. In the context of understanding data, clusters are potential classes and cluster analysis is the study of techniques for automatically finding classes. The following are some examples:
• Biology. Biologists have spent many years creating a taxonomy (hierarchical classification) of all living things: kingdom, phylum, class, order, family, genus, and species. Thus, it is perhaps not surprising that much of the early work in cluster analysis sought to create a discipline of mathematical taxonomy that could automatically find such classification structures. More recently, biologists have applied clustering to analyze the large amounts of genetic information that are now available. For example, clustering has been used to find groups of genes that have similar functions.
• Information Retrieval. The World Wide Web consists of billions of Web pages, and the results of a query to a search engine can return thousands of pages. Clustering can be used to group these search results into a small number of clusters, each of which captures a particular aspect of the query. For instance, a query of “movie” might return Web pages grouped into categories such as reviews, trailers, stars, and theaters. Each category (cluster) can be broken into subcategories (sub-clusters), producing a hierarchical structure that further assists a user’s exploration of the query results.
• Climate. Understanding the Earth’s climate requires finding patterns in the atmosphere and ocean. To that end, cluster analysis has been applied to find patterns in the atmospheric pressure of polar regions and areas of the ocean that have a significant impact on land climate.
• Psychology and Medicine. An illness or condition frequently has a number of variations, and cluster analysis can be used to identify these different subcategories. For example, clustering has been used to identify different types of depression. Cluster analysis can also be used to detect patterns in the spatial or temporal distribution of a disease.
• Business. Businesses collect large amounts of information on current and potential customers. Clustering can be used to segment customers into a small number of groups for additional analysis and marketing activities.
Clustering for Utility:
Cluster analysis provides an abstraction from individual data objects to the clusters in which those data objects reside. Additionally, some clustering techniques characterize each cluster in terms of a cluster prototype; i.e., a data object that is representative of the other objects in the cluster. These cluster prototypes can be used as the basis for a number of data analysis or data processing techniques. Therefore, in the context of utility, cluster analysis is the study of techniques for finding the most representative cluster prototypes.
• Summarization. Many data analysis techniques, such as regression or PCA, have a time or space complexity of O(m^2) or higher (where m is the number of objects), and thus, are not practical for large data sets. However, instead of applying the algorithm to the entire data set, it can be applied to a reduced data set consisting only of cluster prototypes. Depending on the type of analysis, the number of prototypes, and the accuracy with which the prototypes represent the data, the results can be comparable to those that would have been obtained if all the data could have been used.
• Compression. Cluster prototypes can also be used for data compression. In particular, a table is created that consists of the prototypes for each cluster; i.e., each prototype is assigned an integer value that is its position (index) in the table. Each object is represented by the index of the prototype associated with its cluster. This type of compression is known as vector quantization and is often applied to image, sound, and video data, where (1) many of the data objects are highly similar to one another, (2) some loss of information is acceptable, and (3) a substantial reduction in the data size is desired.
• Efficiently Finding Nearest Neighbors. Finding nearest neighbors can require computing the pairwise distance between all points. Often clusters and their cluster prototypes can be found much more efficiently. If objects are relatively close to the prototype of their cluster, then we can use the prototypes to reduce the number of distance computations that are necessary to find the nearest neighbors of an object. Intuitively, if two cluster prototypes are far apart, then the objects in the corresponding clusters cannot be nearest neighbors of each other. Consequently, to find an object’s nearest neighbors it is only necessary to compute the distance to objects in nearby clusters, where the nearness of two clusters is measured by the distance between their prototypes.
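The compression and nearest-neighbor uses of prototypes described in the list above can be illustrated with a minimal sketch. It assumes Euclidean distance, points stored as tuples, and prototypes that are already given; the function names (`encode`, `decode`, `nearest_neighbor`) and the `n_clusters` parameter are hypothetical and chosen only for illustration. The neighbor search is a heuristic: it is exact only when objects really do lie close to their own prototype.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_index(point, prototypes):
    """Position in the prototype table of the prototype closest to `point`."""
    return min(range(len(prototypes)), key=lambda i: dist(point, prototypes[i]))

def encode(points, prototypes):
    """Vector quantization: represent each object by its prototype's table index."""
    return [nearest_index(p, prototypes) for p in points]

def decode(codes, prototypes):
    """Lossy reconstruction: look each stored index up in the prototype table."""
    return [prototypes[c] for c in codes]

def nearest_neighbor(query, points, codes, prototypes, n_clusters=1):
    """Search only the `n_clusters` clusters whose prototypes are nearest to `query`.

    Returns None when those clusters hold no object other than `query` itself.
    """
    order = sorted(range(len(prototypes)), key=lambda i: dist(query, prototypes[i]))
    nearby = set(order[:n_clusters])
    candidates = [p for p, c in zip(points, codes) if c in nearby and p != query]
    return min(candidates, key=lambda p: dist(query, p), default=None)

# Illustrative data (made up): two well-separated groups and their prototypes.
prototypes = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 11.0)]

codes = encode(points, prototypes)           # one small integer per object
restored = decode(codes, prototypes)         # each object replaced by its prototype
neighbor = nearest_neighbor(points[0], points, codes, prototypes)
```

Here each object is stored as a single table index, and the neighbor search for the first object computes distances only to objects in the cluster with the nearest prototype rather than to every point.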
This chapter provides an introduction to cluster analysis. We begin with a high-level overview of clustering, including a discussion of the various approaches to dividing objects into sets of clusters and the different types of clusters. We then describe three specific clustering techniques that represent broad categories of algorithms and illustrate a variety of concepts: K-means, agglomerative hierarchical clustering, and DBSCAN. The final section of this chapter is devoted to cluster validity, that is, methods for evaluating the goodness of the clusters produced by a clustering algorithm. More advanced clustering concepts and algorithms will be discussed in Chapter 9. Whenever possible, we discuss the strengths and weaknesses of different schemes. In addition, the bibliographic notes provide references to relevant books and papers that explore cluster analysis in greater depth.
1.1 Overview
Before discussing specific clustering techniques, we provide some necessary background. First, we further define cluster analysis, illustrating why it is difficult and explaining its relationship to other techniques that group data. Then we explore two important topics: (1) different ways to group a set of objects into a set of clusters, and (2) types of clusters.
1.1.1 What Is Cluster Analysis?
Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. The goal is that the objects within a group be similar (or related) to one another and different from (or unrelated to) the objects in other groups. The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better or more distinct the clustering.
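One way to make "similarity within a group" and "difference between groups" concrete is to compare average distances. The sketch below is an illustration only: the two measures (`mean_within`, `mean_between`), the use of Euclidean distance, and the sample points are assumptions introduced here, not definitions taken from the text.

```python
import math
from itertools import combinations, product

def dist(a, b):
    """Euclidean distance between two equal-length points (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_within(clusters):
    """Average distance over all pairs of objects that share a cluster.

    Assumes at least one cluster contains two or more objects.
    """
    d = [dist(a, b) for c in clusters for a, b in combinations(c, 2)]
    return sum(d) / len(d)

def mean_between(clusters):
    """Average distance over all pairs of objects in different clusters.

    Assumes at least two non-empty clusters.
    """
    d = [dist(a, b)
         for c1, c2 in combinations(clusters, 2)
         for a, b in product(c1, c2)]
    return sum(d) / len(d)

# Two ways of grouping the same four made-up points.
natural = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]   # groups follow the visible gap
forced = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]    # groups cut across the gap

for name, clustering in (("natural", natural), ("forced", forced)):
    print(name, mean_within(clustering), mean_between(clustering))
```

Under these assumed measures, the grouping that follows the gap in the data has the smaller within-group distance and the larger between-group distance, which is the sense in which it is the more distinct clustering.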
Cluster analysis is related to other techniques that are used to divide data objects into groups. For instance, clustering can be regarded as a form of classification in that it creates a labeling of objects with class (cluster) labels. However, it derives these labels only from the data. In contrast, classification in the sense of Chapter 4 is supervised classification; i.e., new, unlabeled objects are assigned a class label using a model developed from objects with known class labels. For this reason, cluster analysis is sometimes referred to as unsupervised classification. When the term classification is used without any qualification within data mining, it typically refers to supervised classification.
Also, while the terms segmentation and partitioning are sometimes used as synonyms for clustering, these terms are frequently used for approaches outside the traditional bounds of cluster analysis. For example, the term partitioning is often used in connection with techniques that divide graphs into subgraphs and that are not strongly connected to clustering. Segmentation often refers to the division of data into groups using simple techniques; e.g., an image can be split into segments based only on pixel intensity and color, or people can be divided into groups based on their income. Nonetheless, some work in graph partitioning and in image and market segmentation is related to cluster analysis.
1.1.2 Different Types of Clusterings
An entire collection of clusters is commonly referred to as a clustering, and in this section, we distinguish various types of clusterings: hierarchical (nested) versus partitional (unnested), exclusive versus overlapping versus fuzzy, and complete versus partial.
Hierarchical versus Partitional:
The most commonly discussed distinction among different types of clusterings is whether the set of clusters is nested or unnested, or in more traditional terminology, hierarchical or partitional. A partitional clustering is simply
