Big Data and Data Mining Training Lecture Notes: Deviation Detection (PPT slides)
- Document ID: 15275270
- Uploaded: 2022-10-29
- Format: PPT
- Pages: 38
- Size: 1.13 MB
Problem: Healthcare Costs

- Healthcare costs in the US: 1 out of every 7 GDP dollars, and rising
- Potential problems: fraud, misuse, ...
- Understanding where the problems are is the first step to fixing them
- GTE is self-insured for medical costs
- GTE healthcare costs: $X00,000,000
- Task: analyze employee healthcare data and generate a report that describes the major problems

GTE Key Findings Reporter: KEFIR

KEFIR approach:
- Analyze all possible deviations
- Select interesting findings
- Augment key findings with:
  - Explanations of plausible causes
  - Recommendations of appropriate actions
- Convert findings to a user-friendly report with text and graphics

KEFIR Search Space

Drill-Down Example

What Change Is Important?

Deviation Detection

- Drill down through the search space
- Generate a finding for each measure:
  - deviation from the previous period
  - deviation from the norm
  - deviation projected for the next period, if no action is taken

Interestingness of Deviations

- Impact: how much the deviation affects the bottom line
- Savings percentage: how much of the deviation from the norm can be expected to be saved by the action

Recommendations

Hierarchical recommendation rules define appropriate intervention strategies for important measures and study areas.

Example:
If measure = admission rate per 1000 & study_area = Inpatient admissions & percent_change > 0.10
Then utilization review is needed in the area of admission certification.
Expected savings: 20%

Explanation

- A measure is explained by finding the path of related measures with the highest impact
- Example: "The large increase in m1 in group s1 was caused by an increase in m3, which was caused by a rise in m5, primarily in sector s13."

Report Generation

- Automatic generation of business-user-oriented reports
- Natural language generation with template matching
- Graphics delivered via browser

Sample KEFIR Pages

- Overview
- Inpatient admissions

Status

- Prototype implemented in GTE in 1995
- KEFIR received GTE's highest award for technical achievement in 1995
- The key business user left GTE in 1996 and the system was no longer used

Publication:
"Selecting and Reporting What is Interesting: The KEFIR Application to Healthcare Data", C. Matheus, G. Piatetsky-Shapiro, and D. McNeill, in Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996

What's Strange About Recent Events (WSARE)

Weng-Keen Wong (Carnegie Mellon University)
Andrew Moore (Carnegie Mellon University)
Gregory Cooper (University of Pittsburgh)
Michael Wagner (University of Pittsburgh)
http://www.autonlab.org/wsare

Designed to be easily applicable to any date/time-indexed biosurveillance-relevant data stream

Motivation

| Primary Key | Date | Time | Hospital | ICD9 | Prodrome | Gender | Age | Home Location | Work Location | Many more |
| 100 | 6/1/03 | 9:12 | 1 | 781 | Fever | M | 20s | NE | ? | |
| 101 | 6/1/03 | 10:45 | 1 | 787 | Diarrhea | F | 40s | NE | NE | |
| 102 | 6/1/03 | 11:03 | 1 | 786 | Respiratory | F | 60s | NE | N | |
| 103 | 6/1/03 | 11:07 | 2 | 787 | Diarrhea | M | 60s | E | ? | |
| 104 | 6/1/03 | 12:15 | 1 | 717 | Respiratory | M | 60s | E | NE | |
| 105 | 6/1/03 | 13:01 | 3 | 780 | Viral | F | 50s | ? | NW | |
| 106 | 6/1/03 | 13:05 | 3 | 487 | Respiratory | F | 40s | SW | SW | |
| 107 | 6/1/03 | 13:57 | 2 | 786 | Unmapped | M | 50s | SE | SW | |
| 108 | 6/1/03 | 14:22 | 1 | 780 | Viral | M | 40s | ? | | |
| ... | | | | | | | | | | |

Suppose we have access to Emergency Department data from hospitals around a city (with patient confidentiality preserved).

Traditional Approaches

We need to build a univariate detector to monitor each interesting combination of attributes:
- Diarrhea cases among children
- Respiratory syndrome cases among females
- Viral syndrome cases involving senior citizens from the eastern part of the city
- Number of children from the downtown hospital
- Number of cases involving people working in the southern part of the city
- Number of cases involving teenage girls living in the western part of the city
- Botulinic syndrome cases
- And so on

You'll need hundreds of univariate detectors!
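
To get a feel for why the number of detectors grows so quickly, here is a minimal sketch that enumerates every combination of one or two attribute=value conditions. The attribute names and value lists in `DOMAINS` are assumptions made up for illustration (loosely modeled on the columns of the Emergency Department table); they are not the real attribute domains.

```python
from itertools import combinations, product

# Hypothetical attribute domains, loosely modeled on the ED table columns.
DOMAINS = {
    "prodrome": ["Fever", "Diarrhea", "Respiratory", "Viral", "Unmapped"],
    "gender": ["M", "F"],
    "age_decile": ["20s", "40s", "50s", "60s"],
    "home_location": ["NE", "NW", "SE", "SW", "N", "E"],
}


def attribute_value_combinations(domains, max_attrs=2):
    """Yield every combination of up to max_attrs attribute=value conditions."""
    for k in range(1, max_attrs + 1):
        # Pick k distinct attributes, then every assignment of values to them.
        for attrs in combinations(sorted(domains), k):
            for values in product(*(domains[a] for a in attrs)):
                yield tuple(zip(attrs, values))


combos = list(attribute_value_combinations(DOMAINS))
print(f"{len(combos)} groups would each need their own univariate detector")
```

Even with four small attributes and at most two conditions per group, the count already exceeds one hundred; adding attributes or a third condition makes it grow much faster.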
We would like to identify the groups with the strangest behavior in recent events.

WSARE Approach

- Rule-based anomaly pattern detection
- Association rules are used to characterize anomalous patterns. For example, a two-component rule would be: Gender = Male AND 40 <= Age < 50

WSARE v2.0 Overview

1. Obtain Recent and Baseline datasets from All Data
2. Search for the rule with the best score
3. Determine the p-value of the best-scoring rule through a randomization test
4. If the p-value is less than the threshold, signal an alert

Step 1: Obtain Recent and Baseline Data

- Recent data: data from the last 24 hours
- Baseline data: assumed to capture non-outbreak behavior. We use data from 35, 42, 49 and 56 days prior to the current day

Example

Sat 12-23-2001
- 35.8% (48/134) of today's cases have 30 <= age < 40
- 17.0% (45/265) of other (baseline) cases have 30 <= age < 40

Step 2: Search for Best Rule

- For each rule, form a 2x2 contingency table
- Perform Fisher's Exact Test to get a p-value (score) for each rule (for this data, 0.00005)
- Find the rule R_BEST with the lowest score
- Caution: this score is not the true p-value of R_BEST because of multiple tests

Example contingency table (truncated in the source): Count Recent, Count Baseline, Age Deci
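
Steps 2 and 3 can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the WSARE implementation: the function names are hypothetical, the records are synthetic and built only to reproduce the counts 48/134 and 45/265 from the example, the `gender` field and second rule are invented, and the randomization test is assumed to work by shuffling which records are labelled recent and re-running the rule search.

```python
import random
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def score_rule(rule, recent, baseline):
    """Score a rule by the Fisher p-value of its recent-vs-baseline table."""
    a = sum(1 for r in recent if rule(r))      # recent cases matching the rule
    b = sum(1 for r in baseline if rule(r))    # baseline cases matching the rule
    return fisher_exact_two_sided(a, b, len(recent) - a, len(baseline) - b)


def best_rule_with_randomization(rules, recent, baseline, n_iter=100, seed=0):
    """Return (best score, compensated p-value from a label-shuffling test)."""
    rng = random.Random(seed)
    best = min(score_rule(rule, recent, baseline) for rule in rules)
    pooled = list(recent) + list(baseline)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # break any real link between records and recency
        fake_recent, fake_baseline = pooled[:len(recent)], pooled[len(recent):]
        fake_best = min(score_rule(rule, fake_recent, fake_baseline) for rule in rules)
        if fake_best <= best:
            hits += 1
    return best, hits / n_iter


def make_cases(n_thirties, n_other):
    # Synthetic records: only the age counts come from the example.
    ages = [35] * n_thirties + [55] * n_other
    return [{"age": age, "gender": "MF"[i % 2]} for i, age in enumerate(ages)]


recent = make_cases(48, 86)      # 48 of 134 recent cases have 30 <= age < 40
baseline = make_cases(45, 220)   # 45 of 265 baseline cases have 30 <= age < 40
rules = [
    lambda r: 30 <= r["age"] < 40,
    lambda r: r["gender"] == "M",
]

best_score, compensated_p = best_rule_with_randomization(rules, recent, baseline)
print(f"best score: {best_score:.2e}, compensated p-value: {compensated_p:.2f}")
```

The compensated p-value is the fraction of shuffled datasets whose best rule scores at least as well as the real best rule, which is what accounts for the many rules tried during the search. With only 100 shuffles its resolution is 0.01, so a real run would use more iterations.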