File: 数据挖掘期末考试在线测试答案.docx

Document ID: 28939442
Uploaded: 2023-07-20
Format: DOCX
Pages: 13
Size: 40.67 KB

Data Mining Final Exam: Online Test Answers

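1. The weekly transaction records of a food chain store are shown in the table below, where each transaction lists the items sold in one cash-register sale. Assume supmin = 20% and confmin = 40%. Use the Apriori algorithm to compute the association rules, and show the candidate sets and the large (frequent) itemsets for each scan of the database.

Transaction  Items
T1           bread, jelly, peanut butter
T2           bread, peanut butter
T3           bread, milk, peanut butter
T4           beer, bread
T5           beer, milk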

Solution:

1) Scan the database and compute the support of each candidate.

C1
Itemset            Support
{bread}            4/5
{peanut butter}    3/5
{milk}             2/5
{beer}             2/5
{jelly}            1/5

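2) Compare each candidate's support with the minimum support to obtain the frequent itemsets L1. Every candidate has a support of at least 20%, so all five are kept.

L1
Itemset            Support
{bread}            4/5
{peanut butter}    3/5
{milk}             2/5
{beer}             2/5
{jelly}            1/5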

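3) Generate the candidates C2 from L1.

C2
Itemset
{bread, peanut butter}
{bread, milk}
{bread, beer}
{bread, jelly}
{peanut butter, milk}
{peanut butter, beer}
{peanut butter, jelly}
{milk, beer}
{milk, jelly}
{beer, jelly}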

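4) Scan the database and compute the support of each candidate.

C2
Itemset                   Support
{bread, peanut butter}    3/5
{bread, milk}             1/5
{bread, beer}             1/5
{bread, jelly}            1/5
{peanut butter, milk}     1/5
{peanut butter, beer}     0
{peanut butter, jelly}    1/5
{milk, beer}              1/5
{milk, jelly}             0
{beer, jelly}             0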

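5) Compare each candidate's support with the minimum support to obtain the frequent itemsets L2.

L2
Itemset                   Support
{bread, peanut butter}    3/5
{bread, milk}             1/5
{bread, beer}             1/5
{bread, jelly}            1/5
{peanut butter, milk}     1/5
{peanut butter, jelly}    1/5
{milk, beer}              1/5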

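6) Generate the candidates C3 from L2.

C3
Itemset
{bread, peanut butter, milk}
{bread, peanut butter, beer}
{bread, peanut butter, jelly}
{bread, milk, beer}
{bread, milk, jelly}
{bread, beer, jelly}
{peanut butter, milk, jelly}
{peanut butter, milk, beer}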

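7) Scan the database and compute the support of each candidate.

C3
Itemset                          Support
{bread, peanut butter, milk}     1/5
{bread, peanut butter, beer}     0
{bread, peanut butter, jelly}    1/5
{bread, milk, beer}              0
{bread, milk, jelly}             0
{bread, beer, jelly}             0
{peanut butter, milk, jelly}     0
{peanut butter, milk, beer}      0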

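8) Compare each candidate's support with the minimum support to obtain the frequent itemsets L3.

L3
Itemset                          Support
{bread, peanut butter, milk}     1/5
{bread, peanut butter, jelly}    1/5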
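The level-wise scan above can be checked with a short script. This is a minimal sketch, not a full Apriori implementation: the English item names are translations of the items in the table, the function names `support` and `apriori` are illustrative, and the join step simply unions pairs of frequent (k-1)-itemsets without the subset-pruning step, which reproduces the candidate lists shown above.

```python
from fractions import Fraction

# Transactions T1..T5 from the table above (item names translated).
transactions = [
    {"bread", "jelly", "peanut butter"},   # T1
    {"bread", "peanut butter"},            # T2
    {"bread", "milk", "peanut butter"},    # T3
    {"beer", "bread"},                     # T4
    {"beer", "milk"},                      # T5
]
MIN_SUPPORT = Fraction(20, 100)  # supmin; frequent means support >= supmin


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def apriori():
    """Return [L1, L2, ...], each a dict mapping frozenset -> support."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset([item]) for item in items]
    levels = []
    while candidates:
        counted = {c: support(c) for c in candidates}  # one database scan
        frequent = {c: s for c, s in counted.items() if s >= MIN_SUPPORT}
        if not frequent:
            break
        levels.append(frequent)
        k = len(levels) + 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = sorted(joined, key=sorted)
    return levels


levels = apriori()
for k, level in enumerate(levels, start=1):
    for itemset, s in level.items():
        print(f"L{k}: {sorted(itemset)} support = {s}")
```

Exact fractions are used so that a support of 1/5 compares cleanly against the 20% threshold.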

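The association rules are computed next. For a rule A ⇒ B derived from a frequent itemset, $\text{confidence}(A \Rightarrow B) = \frac{\text{support}(A \cup B)}{\text{support}(A)}$.

(1) The nonempty proper subsets of {bread, peanut butter, milk} are {bread, peanut butter}, {bread, milk}, {peanut butter, milk}, {bread}, {peanut butter}, {milk}.

{bread, peanut butter} ⇒ {milk}: confidence $= \frac{1/5}{3/5} = 33.3\%$

{bread, milk} ⇒ {peanut butter}: confidence $= \frac{1/5}{1/5} = 100\%$

{peanut butter, milk} ⇒ {bread}: confidence $= \frac{1/5}{1/5} = 100\%$

{bread} ⇒ {peanut butter, milk}: confidence $= \frac{1/5}{4/5} = 25\%$

{peanut butter} ⇒ {bread, milk}: confidence $= \frac{1/5}{3/5} = 33.3\%$

{milk} ⇒ {bread, peanut butter}: confidence $= \frac{1/5}{2/5} = 50\%$

The strong association rules (confidence of at least 40%) are therefore {bread, milk} ⇒ {peanut butter}, {peanut butter, milk} ⇒ {bread}, and {milk} ⇒ {bread, peanut butter}.

(2) The nonempty proper subsets of {bread, peanut butter, jelly} are {bread, peanut butter}, {bread, jelly}, {peanut butter, jelly}, {bread}, {peanut butter}, {jelly}.

{bread, peanut butter} ⇒ {jelly}: confidence $= \frac{1/5}{3/5} = 33.3\%$

{bread, jelly} ⇒ {peanut butter}: confidence $= \frac{1/5}{1/5} = 100\%$

{peanut butter, jelly} ⇒ {bread}: confidence $= \frac{1/5}{1/5} = 100\%$

{bread} ⇒ {peanut butter, jelly}: confidence $= \frac{1/5}{4/5} = 25\%$

{peanut butter} ⇒ {bread, jelly}: confidence $= \frac{1/5}{3/5} = 33.3\%$

{jelly} ⇒ {bread, peanut butter}: confidence $= \frac{1/5}{1/5} = 100\%$

The strong association rules are therefore {bread, jelly} ⇒ {peanut butter}, {peanut butter, jelly} ⇒ {bread}, and {jelly} ⇒ {bread, peanut butter}.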
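The confidence values can be reproduced by dividing the support of the whole itemset by the support of each antecedent. The sketch below assumes the same five transactions with translated item names; `rules` and `strong_rules` are illustrative helper names, not part of any library.

```python
from fractions import Fraction
from itertools import combinations

# Transactions T1..T5 from the exercise (item names translated).
transactions = [
    {"bread", "jelly", "peanut butter"},
    {"bread", "peanut butter"},
    {"bread", "milk", "peanut butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]
MIN_CONF = Fraction(40, 100)  # confmin


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rules(itemset):
    """Map each nonempty proper subset (antecedent) to its rule confidence."""
    itemset = frozenset(itemset)
    result = {}
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            result[lhs] = support(itemset) / support(lhs)
    return result


def strong_rules(itemset):
    """Keep only the rules whose confidence reaches confmin."""
    return {lhs: c for lhs, c in rules(itemset).items() if c >= MIN_CONF}


for target in ({"bread", "peanut butter", "milk"},
               {"bread", "peanut butter", "jelly"}):
    for lhs, conf in strong_rules(target).items():
        rhs = set(target) - lhs
        print(f"{sorted(lhs)} => {sorted(rhs)}: confidence = {float(conf):.1%}")
```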

2. The following shows a history of customers with their incomes, ages and an attribute called "Have_iPhone" indicating whether they have an iPhone. We also indicate whether they will buy an iPad or not in the last column.

No.  Income  Age    Have_iPhone  Buy_iPad
1    high    young  yes          yes
2    high    old    yes          yes
3    medium  young  no           yes
4    high    old    no           yes
5    medium  young  no           no
6    medium  young  no           no
7    medium  old    no           no
8    medium  old    no           no

(a) We want to train a CART decision tree classifier to predict whether a new customer will buy an iPad or not. We define the value of attribute Buy_iPad to be the label of a record.

(i) Please find a CART decision tree according to the above example. In the decision tree, whenever we process a node containing at most 3 records, we stop splitting this node.

(ii) Consider a new young customer whose income is medium and who has an iPhone. Please predict whether this new customer will buy an iPad or not.

(b) What is the difference between the C4.5 decision tree and the ID3 decision tree? Why is there a difference?

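Solution:

(a)(i) The sample contains 4 "yes" and 4 "no" records, so its expected information is

$$\mathrm{Info}(D) = -\frac{4}{8}\log_2\frac{4}{8} - \frac{4}{8}\log_2\frac{4}{8} = 1$$

For attribute Income (3 records with high, all "yes"; 5 records with medium, 1 "yes" and 4 "no"), taking $0\log_2 0 = 0$:

$$\mathrm{Info}(\text{high}) = -\frac{3}{3}\log_2\frac{3}{3} - \frac{0}{3}\log_2\frac{0}{3} = 0$$

$$\mathrm{Info}(\text{medium}) = -\frac{1}{5}\log_2\frac{1}{5} - \frac{4}{5}\log_2\frac{4}{5} = 0.72193$$

The expected information after splitting on Income is

$$E(\text{Income}) = \frac{3}{8}\times 0 + \frac{5}{8}\times 0.72193 = 0.45121$$

The information gain is

$$\mathrm{Gain}(\text{Income}) = 1 - E(\text{Income}) = 0.54879$$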

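Computing the other attributes in the same way:

Gain(Age) = 0 (the young group and the old group each contain 2 "yes" and 2 "no" records, so each has entropy 1)

Gain(Have_iPhone) = 0.311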

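Of the three attributes, Income has the largest gain, so Income is chosen as the best feature and the root node produces two child nodes. The Income = high node is a leaf (3 records, all "yes"). For the other node (Income = medium, 5 records), the same method is applied to choose the best feature and split point among the remaining attributes Age and Have_iPhone, and the result is Age.

Following this computation, the CART tree is:

Income
  high: yes (records 1, 2, 4)
  medium: split on Age
    young: no (records 3, 5, 6; the node has 3 records, so it is not split further and takes the majority label)
    old: no (records 7, 8)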
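The gains can be recomputed directly from the eight records. The sketch below is an illustration under stated assumptions: the names `entropy`, `gini` and `gain` are hypothetical helpers, and the Gini variant is included only because CART is conventionally described with Gini impurity, whereas the worked solution uses entropy. On this data both measures rank the attributes in the same order.

```python
from collections import Counter
from math import log2

# Rows: (Income, Age, Have_iPhone, Buy_iPad), copied from the table.
records = [
    ("high", "young", "yes", "yes"),
    ("high", "old", "yes", "yes"),
    ("medium", "young", "no", "yes"),
    ("high", "old", "no", "yes"),
    ("medium", "young", "no", "no"),
    ("medium", "young", "no", "no"),
    ("medium", "old", "no", "no"),
    ("medium", "old", "no", "no"),
]
ATTRS = {"Income": 0, "Age": 1, "Have_iPhone": 2}
LABEL = 3


def entropy(rows):
    """Shannon entropy of the Buy_iPad labels in `rows`."""
    counts = Counter(r[LABEL] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def gini(rows):
    """Gini impurity of the Buy_iPad labels in `rows`."""
    counts = Counter(r[LABEL] for r in rows)
    n = len(rows)
    return 1 - sum((c / n) ** 2 for c in counts.values())


def gain(rows, attr, impurity=entropy):
    """Impurity of `rows` minus the weighted impurity after splitting on `attr`."""
    col = ATTRS[attr]
    n = len(rows)
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == value]
        remainder += len(subset) / n * impurity(subset)
    return impurity(rows) - remainder


for attr in ATTRS:
    print(f"{attr}: entropy gain = {gain(records, attr):.5f}, "
          f"Gini gain = {gain(records, attr, gini):.5f}")

# Second level: only the Income = medium records are split again.
medium = [r for r in records if r[0] == "medium"]
for attr in ("Age", "Have_iPhone"):
    print(f"medium / {attr}: entropy gain = {gain(medium, attr):.5f}")
```

Within the medium-income node every record has Have_iPhone = no, so that attribute cannot separate the records and Age is the only useful split.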

(ii) The new customer, who is young, has a medium income and has an iPhone, will not buy an iPad.

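(b) The C4.5 decision tree algorithm is similar to ID3 but improves on it. While building the tree, ID3 uses information gain for feature selection and picks the feature with the largest information gain. C4.5 uses the information gain ratio instead and picks the feature with the largest gain ratio.

The reason is that the size of the information gain is relative to the training data set and has no absolute meaning. When classification is hard, that is, when the empirical entropy of the training set is large, the information gain tends to be large; otherwise it tends to be small.

Using the information gain ratio corrects for this.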
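A small numerical illustration of the difference, as a sketch only: the gain ratio is taken here as information gain divided by the entropy of the attribute's own value distribution (the usual C4.5 split information). Treating the record number "No." as a candidate attribute is an assumption made purely for illustration; it shows the commonly cited effect that plain information gain favours attributes with many distinct values, while the gain ratio penalises them. The helper names are hypothetical.

```python
from collections import Counter
from math import log2

# Rows: (No., Income, Age, Have_iPhone, Buy_iPad) from the customer table.
records = [
    (1, "high", "young", "yes", "yes"),
    (2, "high", "old", "yes", "yes"),
    (3, "medium", "young", "no", "yes"),
    (4, "high", "old", "no", "yes"),
    (5, "medium", "young", "no", "no"),
    (6, "medium", "young", "no", "no"),
    (7, "medium", "old", "no", "no"),
    (8, "medium", "old", "no", "no"),
]
COLUMNS = {"No.": 0, "Income": 1, "Age": 2, "Have_iPhone": 3}
LABEL = 4


def entropy_of(values):
    """Shannon entropy of a list of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def info_gain(attr):
    """ID3 criterion: label entropy minus weighted entropy after the split."""
    col = COLUMNS[attr]
    n = len(records)
    remainder = 0.0
    for value in {r[col] for r in records}:
        labels = [r[LABEL] for r in records if r[col] == value]
        remainder += len(labels) / n * entropy_of(labels)
    return entropy_of([r[LABEL] for r in records]) - remainder


def gain_ratio(attr):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy_of([r[COLUMNS[attr]] for r in records])
    return info_gain(attr) / split_info


for attr in COLUMNS:
    print(f"{attr}: gain = {info_gain(attr):.4f}, gain ratio = {gain_ratio(attr):.4f}")

best_by_gain = max(COLUMNS, key=info_gain)
best_by_ratio = max(COLUMNS, key=gain_ratio)
print("best by gain:", best_by_gain, "| best by gain ratio:", best_by_ratio)
```

On these eight records the unique-per-record attribute wins under information gain but not under the gain ratio, where Income comes out ahead.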

3. Consider the following eight two-dimensional data points:

x1: (23, 12), x2: (6, 6), x3: (15, 0), x4: (15, 28), x5: (20, 9), x6: (8, 9), x7: (20, 11), x8: (8, 13)

Consider algorithm k-means.

Please answer the following questions. You are required to show the information about each final cluster (including the mean of the cluster and all data points in this cluster). You can consider writing a program for this part but you are not required to submit the program.

(a) If k = 2 and the initial means are (20, 9) and (8, 9), what is the output of the algorithm?

(b) If k = 2 and the initial means are (15, 0) and (15, 29), what is the output of the algorithm?

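Solution:

(a) With k = 2 and initial means (20, 9) and (8, 9), the iterations are as follows (M1, M2 are the current means; K1, K2 are the points assigned to them):

Iteration 1
M1 = (20, 9), M2 = (8, 9)
K1 = {(20, 9), (23, 12), (15, 0), (15, 28), (20, 11)}
K2 = {(8, 9), (6, 6), (8, 13)}

Iteration 2
M1 = (18.6, 12), M2 = (7.3, 9.3)
K1 = {(23, 12), (15, 28), (20, 9), (20, 11)}
K2 = {(15, 0), (6, 6), (8, 9), (8, 13)}

Iteration 3
M1 = (19.5, 15), M2 = (9.25, 7)
K1 = {(23, 12), (15, 28), (20, 9), (20, 11)}
K2 = {(15, 0), (6, 6), (8, 9), (8, 13)}

The assignment no longer changes in iteration 3, so the algorithm outputs two clusters:

K1 = {x1, x4, x5, x7}, mean (19.5, 15)

K2 = {x2, x3, x6, x8}, mean (9.25, 7)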

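(b) With k = 2 and initial means (15, 0) and (15, 29), the iterations are as follows:

Iteration 1
M1 = (15, 0), M2 = (15, 29)
K1 = {(23, 12), (6, 6), (15, 0), (20, 9), (8, 9), (20, 11), (8, 13)}
K2 = {(15, 28)}

Iteration 2
M1 = (14.3, 8.6), M2 = (15, 28)
K1 = {(23, 12), (6, 6), (15, 0), (20, 9), (8, 9), (20, 11), (8, 13)}
K2 = {(15, 28)}

The assignment no longer changes, so the algorithm outputs two clusters:

K1 = {x1, x2, x3, x5, x6, x7, x8}, mean (14.3, 8.6)

K2 = {x4}, mean (15, 28)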
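Both parts can be checked with a few lines of code, as the question itself suggests. This is a minimal sketch of the standard assign-then-update loop using squared Euclidean distance; the function names are illustrative, and it assumes no cluster ever becomes empty, which holds for both initialisations here.

```python
points = {
    "x1": (23, 12), "x2": (6, 6), "x3": (15, 0), "x4": (15, 28),
    "x5": (20, 9), "x6": (8, 9), "x7": (20, 11), "x8": (8, 13),
}


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean(names):
    """Centroid of the named points."""
    n = len(names)
    return (sum(points[m][0] for m in names) / n,
            sum(points[m][1] for m in names) / n)


def kmeans(initial_means):
    """Iterate assignment and mean update until the assignment is stable."""
    means = list(initial_means)
    assignment = None
    while True:
        new_assignment = {
            name: min(range(len(means)), key=lambda j: sq_dist(p, means[j]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break
        assignment = new_assignment
        clusters = [[n for n, j in assignment.items() if j == c]
                    for c in range(len(means))]
        # Assumes every cluster is non-empty (true for this data).
        means = [mean(c) for c in clusters]
    return clusters, means


clusters_a, means_a = kmeans([(20, 9), (8, 9)])
clusters_b, means_b = kmeans([(15, 0), (15, 29)])
print("(a)", clusters_a, means_a)
print("(b)", clusters_b, means_b)
```

The two runs converge to different partitions, which illustrates how sensitive k-means is to the choice of initial means.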

4. Consider eight data points.

The following matrix shows the pairwise distances between any two points.

      1    2    3    4    5    6    7    8
1     0
2    11    0
3     5   13    0
4    12    2   14    0
5     7   17    1   18    0
6    13    4   15    5   20    0
7     9   15   12   16   15   19    0
8    11   20   12   21   17   22   30    0

Please use the agglomerative approach to cluster these eight points into two groups/clusters by using the complete-linkage distance.

Please write down all data points for each cluster and write down the distance between the two clusters.

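Solution:

Under complete linkage, the distance between two clusters is the largest distance between a point of one cluster and a point of the other. At each step the two closest clusters are merged and the distances to the new cluster are updated with this maximum.

Step 1: the smallest distance is d(3, 5) = 1, so points 3 and 5 are merged into cluster (3,5).

       1    2   3,5    4    6    7    8
1      0
2     11    0
3,5    7   17    0
4     12    2   18    0
6     13    4   20    5    0
7      9   15   15   16   19    0
8     11   20   17   21   22   30    0

Step 2: the smallest distance is d(2, 4) = 2, so points 2 and 4 are merged into cluster (2,4).

       1   2,4  3,5    6    7    8
1      0
2,4   12    0
3,5    7   18    0
6     13    5   20    0
7      9   16   15   19    0
8     11   21   17   22   30    0

Step 3: the smallest distance is 5, between (2,4) and 6, so they are merged into cluster (2,4,6).

         1  2,4,6  3,5    7    8
1        0
2,4,6   13    0
3,5      7   20    0
7        9   19   15    0
8       11   22   17   30    0

Step 4: the smallest distance is 7, between 1 and (3,5), so they are merged into cluster (1,3,5).

        1,3,5  2,4,6    7    8
1,3,5     0
2,4,6    20      0
7        15     19    0
8        17     22   30    0

Step 5: the smallest distance is 15, between (1,3,5) and 7, so they are merged into cluster (1,3,5,7).

          1,3,5,7  2,4,6    8
1,3,5,7      0
2,4,6       20      0
8           30     22     0

Step 6: the smallest distance is 20, between (1,3,5,7) and (2,4,6), so they are merged into cluster (1,2,3,4,5,6,7).

                 1,2,3,4,5,6,7    8
1,2,3,4,5,6,7         0
8                    30         0

Two clusters remain, so the procedure stops. The output is:

Cluster 1 = {1, 2, 3, 4, 5, 6, 7}

Cluster 2 = {8}

The complete-linkage distance between the two clusters is 30.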
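The merge order under complete linkage can be checked mechanically. The following is a minimal sketch, assuming the lower-triangular distance matrix from the question and defining the distance between two clusters as the maximum pairwise distance; `complete_link` and `agglomerate` are illustrative names rather than a library API.

```python
# Lower-triangular rows of the distance matrix: rows[i][j-1] = d(i, j), j < i.
rows = {
    2: [11],
    3: [5, 13],
    4: [12, 2, 14],
    5: [7, 17, 1, 18],
    6: [13, 4, 15, 5, 20],
    7: [9, 15, 12, 16, 15, 19],
    8: [11, 20, 12, 21, 17, 22, 30],
}
dist = {}
for i, row in rows.items():
    for j, d in enumerate(row, start=1):
        dist[frozenset((i, j))] = d


def complete_link(a, b):
    """Complete-linkage distance: largest pairwise distance between clusters."""
    return max(dist[frozenset((i, j))] for i in a for j in b)


def agglomerate(n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    clusters = [frozenset([i]) for i in range(1, 9)]
    merges = []
    while len(clusters) > n_clusters:
        a, b = min(
            ((x, y) for idx, x in enumerate(clusters) for y in clusters[idx + 1:]),
            key=lambda pair: complete_link(*pair),
        )
        merges.append((sorted(a), sorted(b), complete_link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters, merges


clusters, merges = agglomerate(2)
for a, b, d in merges:
    print(f"merge {a} and {b} at distance {d}")
print("clusters:", [sorted(c) for c in clusters])
print("distance between them:", complete_link(*clusters))
```

Replacing `max` with `min` in `complete_link` turns the same loop into single linkage, which generally produces a different merge sequence.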
