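Data Mining Lab Report
"Data Mining" Lab Report
Experiment No.:
Experiment title: C4.5 algorithm
Student ID:
Name:
Major and class: 12 Mathematical Finance
Location: Laboratory Building, Room 5-510
Instructor: Pan Weiwei (潘巍巍)
Date: 2014.12.24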
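I. Objectives and Requirements
1: Select a standard data mining dataset, classify it with the C4.5 algorithm, report the classification accuracy, draw the tree induced by C4.5, and write out the resulting rule set.
2: On a standard data mining dataset, compare the classification performance of the pruned and unpruned trees.
3: Summarize the strengths and weaknesses of the C4.5 algorithm.
II. Equipment (Environment) and Requirements
A computer with WEKA 3.6.1
III. Procedure and Steps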
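(3) Data classification (C4.5 implementation)
1. Import the data
(2) Select the C4.5 classifier (J48 in WEKA) and run the classification
The result is:
The classification accuracy is 50%.
The generated decision tree is:
Classification rules: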
J48 pruned tree
------------------
outlook = sunny
|   humidity = high: no (3.0)
|   humidity = normal: yes (2.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)
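The J48 tree above can be read off as five if-then rules, one per leaf. A minimal sketch in Python, assuming attribute values are passed as the strings that appear in the tree; the names `RULES` and `classify` are illustrative and are not part of WEKA:

```python
# One entry per leaf of the tree: (outlook, tested attribute, tested value) -> class.
# For outlook = overcast no further attribute is tested.
RULES = {
    ("sunny", "humidity", "high"): "no",
    ("sunny", "humidity", "normal"): "yes",
    ("overcast", None, None): "yes",
    ("rainy", "windy", "TRUE"): "no",
    ("rainy", "windy", "FALSE"): "yes",
}


def classify(outlook, humidity, windy):
    """Return the class predicted by the five leaf rules."""
    if outlook == "sunny":
        key = ("sunny", "humidity", humidity)
    elif outlook == "overcast":
        key = ("overcast", None, None)
    elif outlook == "rainy":
        key = ("rainy", "windy", windy)
    else:
        raise ValueError(f"unknown outlook value: {outlook!r}")
    if key not in RULES:
        raise ValueError(f"no rule covers {key!r}")
    return RULES[key]


print(classify("sunny", "normal", "FALSE"))
print(classify("rainy", "high", "TRUE"))
```

Keeping the rules in a table rather than nested conditionals makes it easy to check that there is exactly one rule per leaf of the tree.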
After pruning, the result is:
The classification accuracy rises from 50% to 57.1%, so performance improves.
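(1) Strengths and weaknesses of the C4.5 algorithm
Strengths:
Classification accuracy is high, and the generated classification rules are relatively simple and easy to understand.
Weaknesses:
The dataset must be scanned multiple times, which makes the algorithm relatively inefficient.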
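C4.5 selects each split by gain ratio, so class counts have to be gathered for every candidate attribute at every node; this is one reason the data is scanned repeatedly. The sketch below evaluates only the root split on outlook, using class counts read from the leaves of the tree reported earlier (sunny: 2 yes / 3 no, overcast: 4 yes / 0 no, rainy: 3 yes / 2 no). The function names are illustrative, and this is not WEKA's implementation:

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gain_and_gain_ratio(partitions):
    """partitions: one [yes, no] class-count pair per attribute value."""
    sizes = [sum(p) for p in partitions]
    total = sum(sizes)
    class_totals = [sum(col) for col in zip(*partitions)]
    info_before = entropy(class_totals)
    info_after = sum(s / total * entropy(p) for s, p in zip(sizes, partitions))
    gain = info_before - info_after
    split_info = entropy(sizes)  # penalizes attributes with many values
    return gain, gain / split_info


# [yes, no] counts for outlook = sunny, overcast, rainy
outlook = [[2, 3], [4, 0], [3, 2]]
gain, ratio = gain_and_gain_ratio(outlook)
print(f"information gain = {gain:.3f}, gain ratio = {ratio:.3f}")
```

A full tree builder would repeat this computation for each remaining attribute on each subset of the data, which is where the cost accumulates.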
V. Analysis and Discussion
VI. Instructor's Comments
Signature:
Date:
Grade:
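"Data Mining" Lab Report
Experiment No.:
Experiment title: KNN algorithm
Student ID:
Name:
Major and class: 12 Mathematical Finance
Location: Laboratory Building, Room 5-510
Instructor: Pan Weiwei (潘巍巍)
Date: 2014.12.24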
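I. Objectives and Requirements
1: Understand the basic idea and steps of the KNN algorithm.
2: Select 5 standard datasets from UCI and compute the confusion matrix of the KNN algorithm on each.
3: Select 2 datasets and compare the KNN results for different values of k (k = 1, 3, 5, 7, 9).
II. Equipment (Environment) and Requirements
A computer with WEKA 3.6.1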
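The basic KNN procedure named in objective 1 can be sketched as: compute the distance from the query to every training instance, keep the k nearest, and return the majority class among them. The sketch below assumes numeric attributes and Euclidean distance; the toy points are made up for illustration and are not taken from any dataset used in this report, and WEKA's own KNN classifier (IBk) is not reproduced here:

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train, query, k):
    """train: list of (feature_tuple, label) pairs.

    Returns the majority label among the k training points nearest to query.
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three points of class "a", two of class "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((3.0, 3.0), "b"), ((3.2, 2.9), "b"),
]
for k in (1, 3, 5):
    print(k, knn_predict(train, (2.9, 3.1), k))
```

With k = 5 every training point votes, so the larger class wins even though the query sits next to the two "b" points; this is the kind of sensitivity to k that objective 3 asks to compare.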
IV. Procedure and Steps
1. Datasets: contact-lenses.arff and Glass.arff
Their confusion matrices are, respectively:
(2) The results for the two datasets at K = 1, 3, 5, 7, 9 are:
Glass:
K=1:
=== Summary ===
Correctly Classified Instances         151               70.5607 %
Incorrectly Classified Instances        63               29.4393 %
Kappa statistic                          0.6005
Mean absolute error                      0.0897
Root mean squared error                  0.2852
Relative absolute error                 42.3747 %
Root relative squared error             87.8627 %
Total Number of Instances              214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.786    0.167    0.696      0.786   0.738      0.806     build wind float
               0.671    0.13     0.739      0.671   0.703      0.765     build wind non-float
               0.294    0.051    0.333      0.294   0.313      0.59      vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.769    0.03     0.625      0.769   0.69       0.895     containers
               0.778    0.015    0.7        0.778   0.737      0.838     tableware
               0.793    0.011    0.92       0.793   0.852      0.884     headlamps
Weighted Avg.  0.706    0.109    0.709      0.706   0.704      0.792
=== Confusion Matrix ===
  a  b  c  d  e  f  g   <-- classified as
 55  9  6  0  0  0  0 |  a = build wind float
 15 51  4  0  3  2  1 |  b = build wind non-float
  9  3  5  0  0  0  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  2  0  0 10  0  1 |  e = containers
  0  1  0  0  1  7  0 |  f = tableware
  0  3  0  0  2  1 23 |  g = headlamps
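The summary figures can be recomputed from the confusion matrix alone. The sketch below derives the accuracy and Cohen's kappa for the Glass K=1 run; the matrix entries are the digits of the output above separated into seven columns, which is consistent with the reported per-class TP rates. The function name is illustrative:

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return observed, (observed - expected) / (1 - expected)


glass_k1 = [
    [55, 9, 6, 0, 0, 0, 0],
    [15, 51, 4, 0, 3, 2, 1],
    [9, 3, 5, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, 10, 0, 1],
    [0, 1, 0, 0, 1, 7, 0],
    [0, 3, 0, 0, 2, 1, 23],
]
acc, kappa = accuracy_and_kappa(glass_k1)
print(f"accuracy = {acc:.4%}, kappa = {kappa:.4f}")
```

The diagonal sums to 151 of 214 instances, which matches the "Correctly Classified Instances" line, and the kappa value agrees with the reported 0.6005 to four decimal places.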
K=3:
=== Summary ===
Correctly Classified Instances         154               71.9626 %
Incorrectly Classified Instances        60               28.0374 %
Kappa statistic                          0.6097
Mean absolute error                      0.0983
Root mean squared error                  0.2524
Relative absolute error                 46.4438 %
Root relative squared error             77.7792 %
Total Number of Instances              214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.843    0.215    0.656      0.843   0.738      0.865     build wind float
               0.711    0.138    0.74       0.711   0.725      0.835     build wind non-float
               0.176    0.015    0.5        0.176   0.261      0.672     vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.615    0.015    0.727      0.615   0.667      0.913     containers
               0.778    0.01     0.778      0.778   0.778      0.914     tableware
               0.793    0.011    0.92       0.793   0.852      0.885     headlamps
Weighted Avg.  0.72     0.123    0.718      0.72    0.708      0.847
=== Confusion Matrix ===
  a  b  c  d  e  f  g   <-- classified as
 59 10  1  0  0  0  0 |  a = build wind float
 19 54  2  0  1  0  0 |  b = build wind non-float
 10  4  3  0  0  0  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  3  0  0  8  0  2 |  e = containers
  0  1  0  0  1  7  0 |  f = tableware
  2  1  0  0  1  2 23 |  g = headlamps
K=5:
=== Summary ===
Correctly Classified Instances         145               67.757  %
Incorrectly Classified Instances        69               32.243  %
Kappa statistic                          0.5469
Mean absolute error                      0.1085
Root mean squared error                  0.2563
Relative absolute error                 51.243  %
Root relative squared error             78.9576 %
Total Number of Instances              214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.843    0.229    0.641      0.843   0.728      0.867     build wind float
               0.684    0.174    0.684      0.684   0.684      0.848     build wind non-float
               0        0.01     0          0       0          0.642     vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.385    0.025    0.5        0.385   0.435      0.952     containers
               0.667    0.01     0.75       0.667   0.706      0.909     tableware
               0.793    0.016    0.885      0.793   0.836      0.89      headlamps
Weighted Avg.  0.678    0.142    0.635      0.678   0.651      0.853
=== Confusion Matrix ===
  a  b  c  d  e  f  g   <-- classified as
 59 10  1  0  0  0  0 |  a = build wind float
 20 52  1  0  3  0  0 |  b = build wind non-float
 12  5  0  0  0  0  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  5  0  0  5  0  3 |  e = containers
  0  2  0  0  1  6  0 |  f = tableware
  1  2  0  0  1  2 23 |  g = headlamps
K=7:
=== Summary ===
Correctly Classified Instances         137               64.0187 %
Incorrectly Classified Instances        77               35.9813 %
Kappa statistic                          0.4948
Mean absolute error                      0.1147
Root mean squared error                  0.2557
Relative absolute error                 54.1689 %
Root relative squared error             78.7876 %
Total Number of Instances              214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.829    0.271    0.598      0.829   0.695      0.876     build wind float
               0.605    0.181    0.648      0.605   0.626      0.852     build wind non-float
               0.059    0.005    0.5        0.059   0.105      0.71      vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.308    0.03     0.4        0.308   0.348      0.939     containers
               0.556    0.015    0.625      0.556   0.588      0.976     tableware
               0.793    0.016    0.885      0.793   0.836      0.89      headlamps
Weighted Avg.  0.64     0.158    0.636      0.64    0.617      0.864
=== Confusion Matrix ===
  a  b  c  d  e  f  g   <-- classified as
 58 11  1  0  0  0  0 |  a = build wind float
 26 46  0  0  4  0  0 |  b = build wind non-float
 11  5  1  0  0  0  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  5  0  0  4  1  3 |  e = containers
  1  2  0  0  1  5  0 |  f = tableware
  1  2  0  0  1  2 23 |  g = headlamps
K=9:
=== Summary ===
Correctly Classified Instances         135               63.0841 %
Incorrectly Classified Instances        79               36.9159 %
Kappa statistic                          0.4782
Mean absolute error                      0.1196
Root mean squared error                  0.2581
Relative absolute error                 56.4924 %
Root relative squared error             79.5178 %
Total Number of Instances              214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.829    0.278    0.592      0.829   0.69       0.881     build wind float
               0.645    0.174    0.671      0.645   0.658      0.853     build wind non-float
               0        0.005    0          0       0          0.694     vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.231    0.03     0.333      0.231   0.273      0.933     containers
               0.222    0.015    0.4        0.222   0.286      0.964     tableware
               0.793    0.027    0.821      0.793   0.807      0.888     headlamps
Weighted Avg.  0.631    0.159    0.58       0.631   0.597      0.864
=== Confusion Matrix ===
  a  b  c  d  e  f  g   <-- classified as
 58 11  1  0  0  0  0 |  a = build wind float
 23 49  0  0  3  1  0 |  b = build wind non-float
 13  4  0  0  0  0  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  6  0  0  3  0  4 |  e = containers
  3  1  0  0  2  2  1 |  f = tableware
  1  2  0  0  1  2 23 |  g = headlamps
contact-lenses:
K=1:
=== Summary ===
Correctly Classified Instances          19               79.1667 %
Incorrectly Classified Instances         5               20.8333 %
Kappa statistic                          0.6262
Mean absolute error                      0.2262
Root mean squared error                  0.3165
Relative absolute error                 59.8856 %
Root relative squared error             72.4707 %
Total Number of Instances               24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.8      0.053    0.8        0.8     0.8        0.958     soft
               0.75     0.1      0.6        0.75    0.667      0.925     hard
               0.8      0.222    0.857      0.8     0.828      0.896     none
Weighted Avg.  0.792    0.167    0.802      0.792   0.795      0.914
=== Confusion Matrix ===
  a  b  c   <-- classified as
  4  0  1 |  a = soft
  0  3  1 |  b = hard
  1  2 12 |  c = none
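The per-class columns (TP rate, FP rate, precision, F-measure) follow from the same confusion matrix. A minimal sketch for the contact-lenses K=1 run, with the matrix digits separated into three columns; the function name is illustrative and the ROC area column is not reproduced, because it needs the classifier's class probabilities rather than the matrix:

```python
def per_class_metrics(matrix, labels):
    """Return {label: (tp_rate, fp_rate, precision, f_measure)}.

    Rows of matrix are actual classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual = sum(matrix[i])                   # instances truly in class i
        predicted = sum(row[i] for row in matrix)  # instances assigned to class i
        fp = predicted - tp
        tp_rate = tp / actual if actual else 0.0   # identical to recall
        fp_rate = fp / (total - actual) if total > actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + tp_rate
        f_measure = 2 * precision * tp_rate / denom if denom else 0.0
        metrics[label] = (tp_rate, fp_rate, precision, f_measure)
    return metrics


lenses_k1 = [
    [4, 0, 1],
    [0, 3, 1],
    [1, 2, 12],
]
for label, values in per_class_metrics(lenses_k1, ["soft", "hard", "none"]).items():
    print(label, " ".join(f"{v:.3f}" for v in values))
```

For example, "hard" has 3 true positives out of 4 actual instances (TP rate 0.75) and 5 predictions in total (precision 0.6), in line with the table above.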
K=3:
=== Summary ===
Correctly Classified Instances          19               79.1667 %
Incorrectly Classified Instances         5               20.8333 %
Kappa statistic                          0.6262
Mean absolute error                      0.2262
Root mean squared error                  0.3165
Relative absolute error                 59.8856 %
Root relative squared error             72.4707 %
Total Number of Instances               24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.8      0.053    0.8        0.8     0.8        0.958     soft
               0.75     0.1      0.6        0.75    0.667      0.925     hard
               0.8      0.222    0.857      0.8     0.828      0.896     none
Weighted Avg.  0.792    0.167    0.802      0.792   0.795      0.914
=== Confusion Matrix ===
  a  b  c   <-- classified as
  4  0  1 |  a = soft
  0  3  1 |  b = hard
  1  2 12 |  c = none
K=5:
=== Summary ===
Correctly Classified Instances          16               66.6667 %
Incorrectly Classified Instances         8               33.3333 %
Kappa statistic                          0.3356
Mean absolute error                      0.2793
Root mean squared error                  0.3624
Relative absolute error                 73.9227 %
Root relative squared error             82.9705 %
Total Number of Instances               24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.6      0.053    0.75       0.6     0.667      0.947     soft
               0.25     0.1      0.333      0.25    0.286      0.856     hard
               0.8      0.556    0.706      0.8     0.75       0.859     none
Weighted Avg.  0.667    0.375    0.653      0.667   0.655      0.877
=== Confusion Matrix ===
  a  b  c   <-- classified as
  3  0  2 |  a = soft
  0  1  3 |  b = hard
K=7:
=== Summary ===
Correctly Classified Instances          14               58.3333 %
Incorrectly Classified Instances        10               41.6667 %
Kappa statistic                         -0.0619
Mean absolute error                      0.3188
Root mean squared error                  0.387
Relative absolute error                 84.3959 %
Root relative squared error             88.61   %
Total Number of Instances               24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0        0.053    0          0       0          0.947     soft
               0        0        0          0       0          0.831     hard
               0.933    1        0.609      0.933   0.737      0.807     none
Weighted Avg.  0.583    0.636    0.38       0.583   0.461      0.841
=== Confusion Matrix ===
  a  b  c   <-- classified as
  0  0  5 |  a = soft
  0  0  4 |  b = hard
  1  0 14 |  c = none
K=9:
=== Summary ===
Correctly Classified Instances          14               58.3333 %
Incorrectly Classified Instances        10               41.6667 %
Kappa statistic                         -0.0619
Mean absolute error                      0.3188
Root mean squared error                  0.387
Relative absolute error                 84.3959 %
Root relative squared error             88.61   %
Total Number of Instances               24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0        0.053    0          0       0          0.947     soft
               0        0        0          0       0          0.831     hard
               0.933    1        0.609      0.933   0.737      0.807     none
Weighted Avg.  0.583    0.636    0.38       0.583   0.461      0.841
=== Confusion Matrix ===
  a  b  c   <-- classified as
  0  0  5 |  a = soft
  0  0  4 |  b = hard
  1  0 14 |  c = none
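It can be seen that the first dataset reported above (Glass) reaches its highest classification accuracy at K=3 (71.9626%). On contact-lenses, K=1 and K=3 tie for the highest accuracy (79.1667%), and accuracy falls as K grows beyond 3.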
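The comparison across K can be tabulated directly from the "Correctly Classified Instances" lines of the summaries above. A small sketch; the dictionary layout and the helper name `best_k` are illustrative:

```python
# Correctly classified instance counts copied from the WEKA summaries above.
correct = {
    "Glass": {1: 151, 3: 154, 5: 145, 7: 137, 9: 135},
    "contact-lenses": {1: 19, 3: 19, 5: 16, 7: 14, 9: 14},
}
totals = {"Glass": 214, "contact-lenses": 24}


def best_k(counts):
    """Return every k that attains the maximum count, so ties stay visible."""
    top = max(counts.values())
    return [k for k, c in counts.items() if c == top]


for name, counts in correct.items():
    for k, c in counts.items():
        print(f"{name:15s} K={k}: {c / totals[name]:.4%}")
    print(f"{name:15s} best K: {best_k(counts)}")
```

Returning all maximizing values of k, rather than only the first, keeps the tie between K=1 and K=3 on contact-lenses from being hidden.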