Cluster Analysis Literature, English Translation (Word format).doc
- Document ID: 14186251
- Uploaded: 2022-10-19
- Format: DOC
- Pages: 14
- Size: 223.50 KB
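Translation title: Data Mining: Cluster Analysis
Major: Automation
Name: ****
Class and student ID: ****
Advisor: ******
Source of the translated text: Data Mining, by Ian H. Witten and Eibe Frank
April 26, 2010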
Clustering
5.1 INTRODUCTION
Clustering is similar to classification in that data are grouped. However, unlike classification, the groups are not predefined. Instead, the grouping is accomplished by finding similarities between data according to characteristics found in the actual data. The groups are called clusters. Some authors view clustering as a special type of classification. In this text, however, we follow a more conventional view in that the two are different. Many definitions for clusters have been proposed:
- Set of like elements. Elements from different clusters are not alike.
- The distance between points in a cluster is less than the distance between a point in the cluster and any point outside it.
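The second definition can be checked mechanically. The following is a minimal sketch, assuming 2-D points and Euclidean distance (the text does not fix a distance measure); the function name and the sample points are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def satisfies_distance_definition(clusters):
    """Return True if, for every cluster, each within-cluster distance is
    smaller than each distance from a member to a point outside the cluster."""
    for i, cluster in enumerate(clusters):
        outside = [p for j, other in enumerate(clusters) if j != i for p in other]
        # Largest distance between two members (0.0 for a single-point cluster).
        max_inside = max(
            (dist(p, q) for p, q in combinations(cluster, 2)), default=0.0
        )
        # Smallest distance from any member to any point outside the cluster.
        min_outside = min(
            (dist(p, q) for p in cluster for q in outside), default=float("inf")
        )
        if max_inside >= min_outside:
            return False
    return True


# Hypothetical point sets, invented for this example.
tight = [[(0, 0), (1, 0), (0, 1)], [(10, 10), (11, 10)]]
loose = [[(0, 0), (5, 0)], [(6, 0), (20, 0)]]

print(satisfies_distance_definition(tight))  # well-separated groups
print(satisfies_distance_definition(loose))  # (5, 0) is closer to (6, 0) than to (0, 0)
```

The `tight` grouping meets the definition, while `loose` does not, because one of its points lies nearer to a point of the other cluster than to its own cluster mate.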
A term similar to clustering is database segmentation, where like tuples (records) in a database are grouped together. This is done to partition or segment the database into components that then give the user a more general view of the data. In this text, we do not differentiate between segmentation and clustering. A simple example of clustering is found in Example 5.1. This example illustrates the fact that determining how to do the clustering is not straightforward.
As illustrated in Figure 5.1, a given set of data may be clustered on different attributes. Here a group of homes in a geographic area is shown. The first type of clustering is based on the location of the home. Homes that are geographically close to each other are clustered together. In the second clustering, homes are grouped based on the size of the house.
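The idea that the same data can be clustered on different attributes can be sketched in a few lines. This is only an illustration: the homes, their coordinates and sizes, and the two thresholds are all invented, and a fixed threshold stands in for a real clustering algorithm.

```python
from collections import defaultdict

# Hypothetical homes: (id, x_km, y_km, size_m2). All values are invented.
homes = [
    ("A", 1.0, 1.0, 90),
    ("B", 1.2, 0.8, 240),
    ("C", 9.0, 9.5, 95),
    ("D", 8.7, 9.1, 250),
]


def cluster_by(items, key):
    """Group item ids by the label that `key` computes for each item."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item[0])
    return dict(groups)


# Clustering 1: geographic location (assumed split at x = 5 km).
by_location = cluster_by(homes, lambda h: "west" if h[1] < 5 else "east")
# Clustering 2: size of the house (assumed split at 150 square metres).
by_size = cluster_by(homes, lambda h: "small" if h[3] < 150 else "large")

print(by_location)
print(by_size)
```

Homes A and B share a cluster when grouped by location but fall into different clusters when grouped by size, which is the point the figure makes.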
Clustering has been used in many application domains, including biology, medicine, anthropology, marketing, and economics. Clustering applications include plant and animal classification, disease classification, image processing, pattern recognition, and document retrieval. One of the first domains in which clustering was used was biological taxonomy. Recent uses include examining Web log data to detect usage patterns.
When clustering is applied to a real-world database, many interesting problems occur:
- Outlier handling is difficult. Here the elements do not naturally fall into any cluster. They can be viewed as solitary clusters. However, if a clustering algorithm attempts to find larger clusters, these outliers will be forced to be placed in some cluster. This process may result in the creation of poor clusters by combining two existing clusters and leaving the outlier in its own cluster.
- Dynamic data in the database implies that cluster membership may change over time.
- Interpreting the semantic meaning of each cluster may be difficult. With classification, the labeling of the classes is known ahead of time. However, with clustering, this may not be the case. Thus, when the clustering process finishes creating a set of clusters, the exact meaning of each cluster may not be obvious. Here is where a domain expert is needed to assign a label or interpretation for each cluster.
- There is no one correct answer to a clustering problem. In fact, many answers may be found. The exact number of clusters required is not easy to determine. Again, a domain expert may be required. For example, suppose we have a set of data about plants that have been collected during a field trip. Without any prior knowledge of plant classification, if we attempt to divide this set of data into similar groupings, it would not be clear how many groups should be created.
- Another related issue is what data should be used for clustering. Unlike learning during a classification process, where there is some a priori knowledge concerning what the attributes of each classification should be, in clustering we have no supervised learning to aid the process. Indeed, clustering can be viewed as similar to unsupervised learning.
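Two of the problems listed above, outlier handling and the lack of a single correct number of clusters, can be seen on a tiny data set. The sketch below uses a simple gap-based split on sorted one-dimensional values; this technique is not taken from the text and is used here only because it is short. The function name, the `max_gap` parameter, and the data are hypothetical.

```python
def gap_clusters(values, max_gap):
    """Split sorted 1-D values into clusters wherever the gap between
    neighbouring values exceeds max_gap."""
    if not values:
        return []
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            clusters.append([cur])    # gap too large: start a new cluster
        else:
            clusters[-1].append(cur)  # close enough: extend the current cluster
    return clusters


# Hypothetical measurements; 40 plays the role of an outlier.
data = [1, 2, 3, 8, 9, 10, 40]

print(gap_clusters(data, max_gap=2))   # outlier left as a solitary cluster
print(gap_clusters(data, max_gap=6))   # two natural groups merged, outlier still alone
print(gap_clusters(data, max_gap=30))  # outlier forced into the single cluster
```

Each setting yields a defensible answer with a different number of clusters. The middle setting shows the effect described in the first bullet: two existing clusters are combined while the outlier keeps a cluster of its own.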
We can then summarize some basic features of clustering (as opposed to classification):
- The (best) number of clusters is not known.
- There may not be any a priori knowledge concerning the clusters.
- Cluster results are dynamic.
The clustering problem is stated as shown in Definition 5.1. Here we assume that the number of clusters to be created is an input value, $k$. The actual content (and interpretation) of each cluster, $K_j$, $1 \le j \le k$, is determined as a result of the function definition. Without loss of generality, we will view the result of solving a clustering problem as the creation of a set of clusters:
$K = \{K_1, K_2, \ldots, K_k\}$.
DEFINITION 5.1. Given a database $D = \{t_1, t_2, \ldots, t_n\}$ of tuples and an integer value $k$, the clustering problem is to define a mapping $f: D \to \{1, \ldots, k\}$
where each $t_i$ is assigned to one cluster $K_j$, $1 \le j \le k$. A cluster, $K_j$, contains precisely those tuples mapped to it;
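Definition 5.1 treats clustering as a mapping from tuples to clusters. A minimal sketch of that view follows, assuming that the mapping returns a cluster index between 1 and k; the helper `build_clusters`, the sample database, and the hand-written mapping `f` are hypothetical and are not a clustering algorithm in themselves.

```python
def build_clusters(database, f, k):
    """Return a dict {j: tuples mapped to j} for j = 1..k, so that each
    cluster holds exactly the tuples that f assigns to it."""
    clusters = {j: [] for j in range(1, k + 1)}
    for t in database:
        j = f(t)
        if j not in clusters:
            raise ValueError(f"f mapped {t!r} to {j}, outside 1..{k}")
        clusters[j].append(t)
    return clusters


# Hypothetical database of (name, age) tuples, invented for illustration.
D = [("ann", 23), ("bob", 61), ("cy", 35), ("dee", 70)]
k = 2


def f(t):
    """Hand-written mapping: cluster 1 for ages under 50, cluster 2 otherwise."""
    return 1 if t[1] < 50 else 2


K = build_clusters(D, f, k)
print(K)
```

Because every tuple is mapped to exactly one index, the resulting clusters are disjoint and together cover the whole database.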