大数据外文翻译文献Word格式.docx
- 文档编号:13815045
- 上传时间:2022-10-13
- 格式:DOCX
- 页数:6
- 大小:20.56KB
大数据外文翻译文献Word格式.docx
《大数据外文翻译文献Word格式.docx》由会员分享,可在线阅读,更多相关《大数据外文翻译文献Word格式.docx(6页珍藏版)》请在冰豆网上搜索。
原文:
WhatisDataMining?
Manypeopletreatdataminingasasynonymforanotherpopularlyusedterm,“KnowledgeDiscoveryinDatabases”,orKDD.Alternatively,othersviewdataminingassimplyanessentialstepintheprocessofknowledgediscoveryindatabases.Knowledgediscoveryconsistsofaniterativesequenceofthefollowingsteps:
·
datacleaning:
toremovenoiseorirrelevantdata,
dataintegration:
wheremultipledatasourcesmaybecombined,
dataselection:
wheredatarelevanttotheanalysistaskareretrievedfromthedatabase,
datatransformation:
wheredataaretransformedorconsolidatedintoformsappropriateforminingbyperformingsummaryoraggregationoperations,forinstance,
datamining:
anessentialprocesswhereintelligentmethodsareappliedinordertoextractdatapatterns,
patternevaluation:
toidentifythetrulyinterestingpatternsrepresentingknowledgebasedonsomeinterestingnessmeasures,and
knowledgepresentation:
wherevisualizationandknowledgerepresentationtechniquesareusedtopresenttheminedknowledgetotheuser.
Thedataminingstepmayinteractwiththeuseroraknowledgebase.Theinterestingpatternsarepresentedtotheuser,andmaybestoredasnewknowledgeintheknowledgebase.Notethataccordingtothisview,dataminingisonlyonestepintheentireprocess,albeitanessentialonesinceituncovershiddenpatternsforevaluation.
Weagreethatdataminingisaknowledgediscoveryprocess.However,inindustry,inmedia,andinthedatabaseresearchmilieu,theterm“datamining”isbecomingmorepopularthanthelongertermof“knowledgediscoveryindatabases”.Therefore,inthisbook,wechoosetousetheterm“datamining”.Weadoptabroadviewofdataminingfunctionality:
dataminingistheprocessofdiscoveringinterestingknowledgefromlargeamountsofdatastoredeitherindatabases,datawarehouses,orotherinformationrepositories.
Basedonthisview,thearchitectureofatypicaldataminingsystemmayhavethefollowingmajorcomponents:
1.Database,datawarehouse,orotherinformationrepository.Thisisoneorasetofdatabases,datawarehouses,spreadsheets,orotherkindsofinformationrepositories.Datacleaninganddataintegrationtechniquesmaybeperformedonthedata.
2.Databaseordatawarehouseserver.Thedatabaseordatawarehouseserverisresponsibleforfetchingtherelevantdata,basedontheuser’sdataminingrequest.
3.Knowledgebase.Thisisthedomainknowledgethatisusedtoguidethesearch,orevaluatetheinterestingnessofresultingpatterns.Suchknowledgecanincludeconcepthierarchies,usedtoorganizeattributesorattributevaluesintodifferentlevelsofabstraction.Knowledgesuchasuserbeliefs,whichcanbeusedtoassessapattern’sinterestingnessbasedonitsunexpectedness,mayalsobeincluded.Otherexamplesofdomainknowledgeareadditionalinterestingnessconstraintsorthresholds,andmetadata(e.g.,describingdatafrommultipleheterogeneoussources).
4.Dataminingengine.Thisisessentialtothedataminingsystemandideallyconsistsofasetoffunctionalmodulesfortaskssuchascharacterization,associationanalysis,classification,evolutionanddeviationanalysis.
5.Patternevaluationmodule.Thiscomponenttypicallyemploysinterestingnessmeasuresandinteractswiththedataminingmodulessoastofocusthesearchtowardsinterestingpatterns.Itmayaccessinterestingnessthresholdsstoredintheknowledgebase.Alternatively,thepatternevaluationmodulemaybeintegratedwiththeminingmodule,dependingontheimplementationofthedataminingmethodused.Forefficientdatamining,itishighlyrecommendedtopushtheevaluationofpatterninterestingnessasdeepaspossibleintotheminingprocesssoastoconfinethesearchtoonlytheinterestingpatterns.
6.Graphicaluserinterface.Thismodulecommunicatesbetweenusersandthedataminingsystem,allowingtheusertointeractwiththesystembyspecifyingadataminingqueryortask,providinginformationtohelpfocusthesearch,andperformingexploratorydataminingbasedontheintermediatedataminingresults.Inaddition,thiscomponentallowstheusertobrowsedatabaseanddatawarehouseschemasordatastructures,evaluateminedpatterns,andvisualizethepatternsindifferentforms.
Fromadatawarehouseperspective,dataminingcanbeviewedasanadvancedstageofon-1ineanalyticalprocessing(OLAP).However,datamininggoesfarbeyondthenarrowscopeofsummarization-styleanalyticalprocessingofdatawarehousesystemsbyincorporatingmoreadvancedtechniquesfordataunderstanding.
Whiletheremaybemany“dataminingsystems”onthemarket,notallofthemcanperformtruedatamining.Adataanalysissystemthatdoesnothandlelargeamountsofdatacanatmostbecategor
- 配套讲稿:
如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
- 特殊限制:
部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。
- 关 键 词:
- 数据 外文 翻译 文献