Using J48 to analyze a dataset
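In this section we open the file glass.arff, check the available classifiers, choose the J48 decision tree learner, and look at the correctly classified instances and the confusion matrix.
Open glass.arff and check the available classifiers. Running J48 on the data produces the following tree: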
    |   |   K <= 0.03
    |   |   |   Na <= 13.75: build wind non-float (3.0)
    |   |   |   Na > 13.75: tableware (9.0)
    |   |   K > 0.03
    |   |   |   Na <= 13.49
    |   |   |   |   RI <= 1.5241: containers (13.0/1.0)
    |   |   |   |   RI > 1.5241: build wind non-float (3.0)
    |   |   |   Na > 13.49: build wind non-float (7.0/1.0)
    |   Mg > 2.41
    |   |   Al <= 1.41
    |   |   |   RI <= 1.51707
    |   |   |   |   RI <= 1.51596: build wind float (3.0)
    |   |   |   |   RI > 1.51596
    |   |   |   |   |   Fe <= 0.12
    |   |   |   |   |   |   Mg <= 3.54: vehic wind float (5.0)
    |   |   |   |   |   |   Mg > 3.54
    |   |   |   |   |   |   |   RI <= 1.51667: build wind non-float (2.0)
    |   |   |   |   |   |   |   RI > 1.51667: vehic wind float (2.0)
    |   |   |   |   |   Fe > 0.12: build wind non-float (2.0)
    |   |   |   RI > 1.51707
    |   |   |   |   K <= 0.23
    |   |   |   |   |   Mg <= 3.34: build wind non-float (2.0)
    |   |   |   |   |   Mg > 3.34
    |   |   |   |   |   |   Si <= 72.64
    |   |   |   |   |   |   |   Na <= 14.01: build wind float (14.0)
    |   |   |   |   |   |   |   Na > 14.01
    |   |   |   |   |   |   |   |   RI <= 1.52211
    |   |   |   |   |   |   |   |   |   Na <= 14.32: vehic wind float (3.0)
    |   |   |   |   |   |   |   |   |   Na > 14.32: build wind float (2.0)
    |   |   |   |   |   |   |   |   RI > 1.52211: build wind float (3.0)
    |   |   |   |   |   |   Si > 72.64: vehic wind float (3.0)
    |   |   |   |   K > 0.23
    |   |   |   |   |   Mg <= 3.75
    |   |   |   |   |   |   Fe <= 0.14
    |   |   |   |   |   |   |   RI <= 1.52043: build wind float (36.0)
    |   |   |   |   |   |   |   RI > 1.52043: build wind non-float (2.0/1.0)
    |   |   |   |   |   |   Fe > 0.14
    |   |   |   |   |   |   |   Al <= 1.17: build wind non-float (5.0)
    |   |   |   |   |   |   |   Al > 1.17: build wind float (6.0/1.0)
    |   |   |   |   |   Mg > 3.75: build wind non-float (10.0)
    |   |   Al > 1.41
    |   |   |   Si <= 72.49
    |   |   |   |   Ca <= 8.28: build wind non-float (6.0)
    |   |   |   |   Ca > 8.28: vehic wind float (5.0/1.0)
    |   |   |   Si > 72.49
    |   |   |   |   RI <= 1.51732
    |   |   |   |   |   Fe <= 0.22: build wind non-float (30.0/1.0)
    |   |   |   |   |   Fe > 0.22
    |   |   |   |   |   |   RI <= 1.51629: build wind float (2.0)
    |   |   |   |   |   |   RI > 1.51629: build wind non-float (2.0)
    |   |   |   |   RI > 1.51732
    |   |   |   |   |   RI <= 1.51789: build wind float (3.0)
    |   |   |   |   |   RI > 1.51789: build wind non-float (2.0)
    Ba > 0.27
    |   Si <= 70.16: build wind non-float (2.0/1.0)
    |   Si > 70.16: headlamps (27.0/1.0)

    Number of Leaves : 30
    Size of the tree : 59

    Time taken to build model: 0.01 seconds

The tree itself is explained later. For now, note from the last lines of the output that it has 30 leaves and 59 nodes in total.
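Two details of this output can be checked with a few lines of Python. Every test in the tree has exactly two branches (a `<=` branch and a `>` branch), so a tree with n leaves has n - 1 internal nodes and 2n - 1 nodes in total. In addition, the parentheses after each leaf give the number of training instances that reach the leaf and, after the slash, how many of them the leaf misclassifies. The sketch below is only an illustration: the names `tree_size`, `LEAF_RE` and `parse_leaf` are made up for this example and are not part of Weka.

```python
import re


def tree_size(num_leaves: int) -> int:
    """Total node count of a tree whose internal nodes all have two children."""
    return 2 * num_leaves - 1


# Matches the end of a J48 leaf line such as
# "Na > 13.75: tableware (9.0)" or "RI <= 1.5241: containers (13.0/1.0)".
LEAF_RE = re.compile(
    r":\s*(?P<label>.+?)\s*\((?P<covered>[\d.]+)(?:/(?P<errors>[\d.]+))?\)\s*$"
)


def parse_leaf(line: str):
    """Return (class label, instances covered, instances misclassified).

    Returns None for lines that are internal nodes (no ": label (...)" part).
    """
    match = LEAF_RE.search(line)
    if match is None:
        return None
    errors = float(match.group("errors") or 0.0)
    return match.group("label"), float(match.group("covered")), errors


# A few lines copied from the tree above.
sample_lines = [
    "|   |   K <= 0.03",
    "|   |   |   Na <= 13.75: build wind non-float (3.0)",
    "|   |   |   Na > 13.75: tableware (9.0)",
    "|   |   |   |   RI <= 1.5241: containers (13.0/1.0)",
]

leaves = [leaf for leaf in map(parse_leaf, sample_lines) if leaf is not None]

print(tree_size(30))  # 59, matching "Size of the tree : 59"
for label, covered, errors in leaves:
    print(f"{label}: {covered:g} instances, {errors:g} misclassified")
```

Applying `parse_leaf` to every line of the tree and summing the covered counts is a quick way to confirm that the leaves together account for all training instances.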
Next comes the summary:

    === Summary ===

    Correctly Classified Instances         143               66.8224 %
    Incorrectly Classified Instances        71               33.1776 %
    Kappa statistic                          0.55
    Mean absolute error                      0.1026
    Root mean squared error                  0.2897
    Relative absolute error                 48.4507 %
    Root relative squared error             89.2727 %
    Total Number of Instances              214

The accuracy is therefore 66.8%: 143 of the 214 instances were classified correctly.
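The two percentages in the summary follow directly from the instance counts. A minimal Python check, using only the numbers reported above (the variable names are arbitrary):

```python
correct = 143     # "Correctly Classified Instances"
incorrect = 71    # "Incorrectly Classified Instances"
total = correct + incorrect  # should equal "Total Number of Instances"

accuracy = 100.0 * correct / total
error_rate = 100.0 * incorrect / total

print(total)                  # 214
print(f"{accuracy:.4f} %")    # 66.8224 %
print(f"{error_rate:.4f} %")  # 33.1776 %
```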
At the very bottom of the window is the confusion matrix:

    === Confusion Matrix ===

      a  b  c  d  e  f  g   <-- classified as
     50 15  3  0  0  1  1 |  a = build wind float
     16 47  6  0  2  3  2 |  b = build wind non-float
      5  5  6  0  0  1  0 |  c = vehic wind float
      0  0  0  0  0  0  0 |  d = vehic wind non-float
      0  2  0  0 10  0  1 |  e = containers
      1  1  0  0  0  7  0 |  f = tableware
      3  2  0  0  0  1 23 |  g = headlamps
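Each row corresponds to the actual class of the instances, and each column to the class the classifier assigned to them.
For example, there are 7 different types of glass.
The first row of the matrix covers the instances whose actual type is a:

      a  b  c  d  e  f  g   <-- classified as
     50 15  3  0  0  1  1 |  a = build wind float

The classifier assigned 50 of these instances to class a, which is correct, and 15 of them to class b (build wind non-float), which is wrong. The remaining 5 instances in this row (3 as c, 1 as f, 1 as g) are misclassified as well.
That is the confusion matrix. Most instances lie on the diagonal, which is what we hope to see, because it means most of them were classified correctly. Every instance off the diagonal is a misclassification.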
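The summary figures can be recomputed from the confusion matrix itself. The Python sketch below sums the diagonal to get the accuracy, computes the kappa statistic with the standard definition of Cohen's kappa (which reproduces the reported 0.55), and looks at the off-diagonal cells to find the most frequent confusion. The variable names are chosen for this example only.

```python
labels = [
    "build wind float", "build wind non-float", "vehic wind float",
    "vehic wind non-float", "containers", "tableware", "headlamps",
]

# Rows are the actual classes, columns are the classes assigned by J48.
matrix = [
    [50, 15, 3, 0,  0, 1,  1],
    [16, 47, 6, 0,  2, 3,  2],
    [ 5,  5, 6, 0,  0, 1,  0],
    [ 0,  0, 0, 0,  0, 0,  0],
    [ 0,  2, 0, 0, 10, 0,  1],
    [ 1,  1, 0, 0,  0, 7,  0],
    [ 3,  2, 0, 0,  0, 1, 23],
]

n = len(matrix)
total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(n))  # diagonal = correct predictions
row_sums = [sum(row) for row in matrix]        # instances per actual class
col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # per predicted class

accuracy = correct / total
# Agreement expected by chance, given the row and column totals.
expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

# Fraction of each actual class that was classified correctly
# (classes without any instances are skipped).
recall = {labels[i]: matrix[i][i] / row_sums[i] for i in range(n) if row_sums[i] > 0}

# Largest off-diagonal cell: (count, actual class, predicted class).
worst = max(
    (matrix[i][j], labels[i], labels[j])
    for i in range(n) for j in range(n) if i != j
)

print(f"{correct} of {total} correct, accuracy = {accuracy:.4%}")  # 66.8224%
print(f"kappa = {kappa:.2f}")                                      # 0.55
print(f"most frequent confusion: {worst[0]} x '{worst[1]}' classified as '{worst[2]}'")
for name, value in recall.items():
    print(f"{name}: {value:.1%} classified correctly")
```

The two building window classes account for most of the errors: the largest off-diagonal cells are a classified as b and b classified as a.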
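Configuring the classifier
We will not go through every parameter here. We take unpruned as an example (pruning means cutting back parts of the tree).
The default value of unpruned is False, which means that the tree we just built was pruned. We can change the parameter to True and run the classifier again.
This time the reported accuracy is: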
Correctly Classified Instances 144 67.2897 %
Now look at the output of a further run:

    |   |   Al <= 1.41
    |   |   |   RI <= 1.51727: vehic wind float (16.0/9.0)
    |   |   |   RI > 1.51727
    |   |   |   |   K <= 0.23: build wind float (27.0/8.0)
    |   |   |   |   K > 0.23
    |   |   |   |   |   Mg <= 3.66: build wind float (41.0/5.0)
    |   |   |   |   |   Mg > 3.66: build wind non-float (16.0/3.0)
    |   |   Al > 1.41: build wind non-float (50.0/10.0)
    Ba > 0.27: headlamps (29.0/3.0)

    Number of Leaves : 8
    Size of the tree : 15

    Time taken to build model: 0 seconds

    === Stratified cross-validation ===
    === Summary ===

    Correctly Classified Instances         131               61.215 %

This time we get an accuracy of about 61% and a much smaller decision tree, with 8 leaves and 15 nodes.
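The three runs reported so far can be put side by side. The Python sketch below uses only the counts quoted in the text; the run descriptions are informal labels chosen for this example, and `None` marks a tree size that the text does not report.

```python
TOTAL = 214  # instances in glass.arff

# (description, correctly classified instances, leaves, tree size)
runs = [
    ("default settings (pruned)", 143, 30, 59),
    ("unpruned = True", 144, None, None),  # tree size not reported in the text
    ("run with the smaller tree", 131, 8, 15),
]

accuracies = []
for name, correct, leaves, size in runs:
    accuracy = 100.0 * correct / TOTAL
    accuracies.append(accuracy)
    shape = f"{leaves} leaves, {size} nodes" if leaves is not None else "size not reported"
    print(f"{name:28s} {correct:3d}/{TOTAL}  {accuracy:7.4f} %  ({shape})")

# Accuracy given up by the smallest tree relative to the default run,
# in percentage points.
drop = accuracies[0] - accuracies[2]
print(f"accuracy difference: {drop:.2f} percentage points")
```

The smallest tree has roughly a quarter of the nodes of the default tree (15 instead of 59) while classifying 12 fewer instances correctly, which is the trade-off between tree size and accuracy that this section illustrates.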
Visualizing the decision tree
In the Weka Explorer, right-clicking the run in the result list and choosing "Visualize tree" opens a graphical view of the small tree shown above (8 leaves, 15 nodes).
The More button in the classifier configuration panel gives you more information about the classifier, which is very helpful when you use it.