Data Mining with Weka 04: Building a classifier

Using J48 to analyze a dataset

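  • Open the file glass.arff
  • Check the available classifiers
  • Choose the J48 decision tree learner
  • Look at the correctly classified instances and the confusion matrix
  • Open glass.arff and check the available classifiers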

    |   |   K <= 0.03
    |   |   |   Na <= 13.75: build wind non-float (3.0)
    |   |   |   Na > 13.75: tableware (9.0)
    |   |   K > 0.03
    |   |   |   Na <= 13.49
    |   |   |   |   RI <= 1.5241: containers (13.0/1.0)
    |   |   |   |   RI > 1.5241: build wind non-float (3.0)
    |   |   |   Na > 13.49: build wind non-float (7.0/1.0)
    |   Mg > 2.41
    |   |   Al <= 1.41
    |   |   |   RI <= 1.51707
    |   |   |   |   RI <= 1.51596: build wind float (3.0)
    |   |   |   |   RI > 1.51596
    |   |   |   |   |   Fe <= 0.12
    |   |   |   |   |   |   Mg <= 3.54: vehic wind float (5.0)
    |   |   |   |   |   |   Mg > 3.54
    |   |   |   |   |   |   |   RI <= 1.51667: build wind non-float (2.0)
    |   |   |   |   |   |   |   RI > 1.51667: vehic wind float (2.0)
    |   |   |   |   |   Fe > 0.12: build wind non-float (2.0)
    |   |   |   RI > 1.51707
    |   |   |   |   K <= 0.23
    |   |   |   |   |   Mg <= 3.34: build wind non-float (2.0)
    |   |   |   |   |   Mg > 3.34
    |   |   |   |   |   |   Si <= 72.64
    |   |   |   |   |   |   |   Na <= 14.01: build wind float (14.0)
    |   |   |   |   |   |   |   Na > 14.01
    |   |   |   |   |   |   |   |   RI <= 1.52211
    |   |   |   |   |   |   |   |   |   Na <= 14.32: vehic wind float (3.0)
    |   |   |   |   |   |   |   |   |   Na > 14.32: build wind float (2.0)
    |   |   |   |   |   |   |   |   RI > 1.52211: build wind float (3.0)
    |   |   |   |   |   |   Si > 72.64: vehic wind float (3.0)
    |   |   |   |   K > 0.23
    |   |   |   |   |   Mg <= 3.75
    |   |   |   |   |   |   Fe <= 0.14
    |   |   |   |   |   |   |   RI <= 1.52043: build wind float (36.0)
    |   |   |   |   |   |   |   RI > 1.52043: build wind non-float (2.0/1.0)
    |   |   |   |   |   |   Fe > 0.14
    |   |   |   |   |   |   |   Al <= 1.17: build wind non-float (5.0)
    |   |   |   |   |   |   |   Al > 1.17: build wind float (6.0/1.0)
    |   |   |   |   |   Mg > 3.75: build wind non-float (10.0)
    |   |   Al > 1.41
    |   |   |   Si <= 72.49
    |   |   |   |   Ca <= 8.28: build wind non-float (6.0)
    |   |   |   |   Ca > 8.28: vehic wind float (5.0/1.0)
    |   |   |   Si > 72.49
    |   |   |   |   RI <= 1.51732
    |   |   |   |   |   Fe <= 0.22: build wind non-float (30.0/1.0)
    |   |   |   |   |   Fe > 0.22
    |   |   |   |   |   |   RI <= 1.51629: build wind float (2.0)
    |   |   |   |   |   |   RI > 1.51629: build wind non-float (2.0)
    |   |   |   |   RI > 1.51732
    |   |   |   |   |   RI <= 1.51789: build wind float (3.0)
    |   |   |   |   |   RI > 1.51789: build wind non-float (2.0)
    Ba > 0.27
    |   Si <= 70.16: build wind non-float (2.0/1.0)
    |   Si > 70.16: headlamps (27.0/1.0)

    Number of Leaves  :     30
    Size of the tree :  59

    Time taken to build model: 0.01 seconds

    The tree itself is explained later. For now, note that it has 30 leaf nodes and 59 nodes in total:

    Number of Leaves  :     30
    Size of the tree :  59
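
These two numbers are consistent with each other. The sketch below assumes that every split in the tree is binary, which holds for the output shown here because every test has the form `attribute <= value` / `attribute > value`. Under that assumption a tree with n leaves has n - 1 internal nodes, so 2n - 1 nodes in total. The function name is made up for this illustration and is not part of Weka.

```python
def binary_tree_size(num_leaves: int) -> int:
    """Total node count of a tree whose internal nodes all have exactly two children."""
    internal_nodes = num_leaves - 1  # each binary split adds one leaf
    return num_leaves + internal_nodes


print(binary_tree_size(30))  # 59, the "Size of the tree" reported above
```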
    

    Next comes the summary:

    === Summary ===
    Correctly Classified Instances         143               66.8224 %
    Incorrectly Classified Instances        71               33.1776 %
    Kappa statistic                          0.55  
    Mean absolute error                      0.1026
    Root mean squared error                  0.2897
    Relative absolute error                 48.4507 %
    Root relative squared error             89.2727 %
    Total Number of Instances              214     
    

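    The accuracy is 66.8%: 143 of the 214 instances are classified correctly.

    At the bottom of the output window is the confusion matrix: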

    === Confusion Matrix ===
      a  b  c  d  e  f  g   <-- classified as
     50 15  3  0  0  1  1 |  a = build wind float
     16 47  6  0  2  3  2 |  b = build wind non-float
      5  5  6  0  0  1  0 |  c = vehic wind float
      0  0  0  0  0  0  0 |  d = vehic wind non-float
      0  2  0  0 10  0  1 |  e = containers
      1  1  0  0  0  7  0 |  f = tableware
      3  2  0  0  0  1 23 |  g = headlamps
    

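    Each row corresponds to the actual class of the instances, and each column to the class the classifier assigned.
    For example, there are 7 different types of glass.
    Look at the first row of the matrix, that is, the glass whose actual type is a: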

      a  b  c  d  e  f  g   <-- classified as
     50 15  3  0  0  1  1 |  a = build wind float
    

    The classifier assigned 50 of these instances to class a, which is correct, and 15 of them to class b (build wind non-float), which is wrong.
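
Reading one row this way can be sketched in a few lines of Python. The numbers are copied from the first row of the matrix; the variable names are only for illustration, and "recall" is the usual name for the fraction of a class that is classified correctly.

```python
# First row of the confusion matrix: instances whose actual class is
# a (build wind float), broken down by the class J48 predicted.
labels = ["a", "b", "c", "d", "e", "f", "g"]
row_a = [50, 15, 3, 0, 0, 1, 1]

total_a = sum(row_a)                  # all instances that really are class a
correct_a = row_a[labels.index("a")]  # predicted a, actually a
wrong_a = total_a - correct_a         # predicted as some other class

print(f"class a: {correct_a} of {total_a} correct, {wrong_a} misclassified")
print(f"recall for class a: {correct_a / total_a:.3f}")
```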

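    That is the confusion matrix. Most instances lie on the diagonal, which is what we hope to see (most are classified correctly). Every instance off the diagonal is a misclassification.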
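
As a cross-check, the accuracy and the Kappa statistic from the summary can be recomputed from the matrix alone. This is a minimal sketch using the standard definition of Cohen's kappa (observed agreement compared with the agreement expected by chance from the row and column totals); it is not a description of how Weka computes it internally.

```python
# Confusion matrix from the J48 run: rows = actual class, columns = predicted class.
matrix = [
    [50, 15, 3, 0,  0, 1,  1],  # a = build wind float
    [16, 47, 6, 0,  2, 3,  2],  # b = build wind non-float
    [ 5,  5, 6, 0,  0, 1,  0],  # c = vehic wind float
    [ 0,  0, 0, 0,  0, 0,  0],  # d = vehic wind non-float
    [ 0,  2, 0, 0, 10, 0,  1],  # e = containers
    [ 1,  1, 0, 0,  0, 7,  0],  # f = tableware
    [ 3,  2, 0, 0,  0, 1, 23],  # g = headlamps
]

n = sum(sum(row) for row in matrix)                      # total instances
correct = sum(matrix[i][i] for i in range(len(matrix)))  # the diagonal
accuracy = correct / n

# Agreement expected by chance, from the row and column totals.
row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy = {correct}/{n} = {accuracy:.4%}")
print(f"kappa    = {kappa:.2f}")
```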

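    Configuring the classifier

    We will not go through every parameter here; we take unpruned as an example.
    unpruned defaults to False, which means the tree we just built was pruned. We can set the parameter to True and run the classifier again.
    This time the accuracy is: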

    Correctly Classified Instances         144               67.2897 %
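
The setting in the configuration panel corresponds to J48's command-line options: `-U` requests an unpruned tree, `-C` sets the pruning confidence and `-M` the minimum number of instances per leaf. The helper below is a hypothetical illustration of that mapping, not part of Weka; the defaults of 0.25 and 2 are assumed to match J48's defaults.

```python
def j48_options(unpruned: bool = False, confidence: float = 0.25,
                min_num_obj: int = 2) -> list:
    """Hypothetical helper: build a J48 option list for the given settings."""
    options = []
    if unpruned:
        options.append("-U")                # no pruning, so no pruning confidence
    else:
        options += ["-C", str(confidence)]  # confidence factor used for pruning
    options += ["-M", str(min_num_obj)]     # minimum number of instances per leaf
    return options


print(" ".join(j48_options()))               # -C 0.25 -M 2
print(" ".join(j48_options(unpruned=True)))  # -U -M 2
```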
    

    Look at the output of an earlier run:

    |   |   Al <= 1.41
    |   |   |   RI <= 1.51727: vehic wind float (16.0/9.0)
    |   |   |   RI > 1.51727
    |   |   |   |   K <= 0.23: build wind float (27.0/8.0)
    |   |   |   |   K > 0.23
    |   |   |   |   |   Mg <= 3.66: build wind float (41.0/5.0)
    |   |   |   |   |   Mg > 3.66: build wind non-float (16.0/3.0)
    |   |   Al > 1.41: build wind non-float (50.0/10.0)
    Ba > 0.27: headlamps (29.0/3.0)

    Number of Leaves  :     8
    Size of the tree :  15

    Time taken to build model: 0 seconds

    === Stratified cross-validation ===
    === Summary ===
    Correctly Classified Instances         131               61.215 %

    Now we get an accuracy of 61% and a much smaller decision tree.

    Visualizing the decision tree

    |   |   Al <= 1.41
    |   |   |   RI <= 1.51727: vehic wind float (16.0/9.0)
    |   |   |   RI > 1.51727
    |   |   |   |   K <= 0.23: build wind float (27.0/8.0)
    |   |   |   |   K > 0.23
    |   |   |   |   |   Mg <= 3.66: build wind float (41.0/5.0)
    |   |   |   |   |   Mg > 3.66: build wind non-float (16.0/3.0)
    |   |   Al > 1.41: build wind non-float (50.0/10.0)
    Ba > 0.27: headlamps (29.0/3.0)

    Number of Leaves  :     8
    Size of the tree :  15
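
In J48's text output, the parentheses after a leaf give the number of training instances that reach the leaf and, after the slash, how many of those the leaf misclassifies; a leaf without a slash has no errors. The sketch below pulls these numbers out of a few lines copied from the tree above. The regular expression and the function name are assumptions made for this example, not a Weka API.

```python
import re

# A few lines copied from the printed tree.
tree_lines = [
    "|   |   Al <= 1.41",
    "|   |   |   RI <= 1.51727: vehic wind float (16.0/9.0)",
    "|   |   |   |   |   Mg <= 3.66: build wind float (41.0/5.0)",
    "|   |   Al > 1.41: build wind non-float (50.0/10.0)",
    "Ba > 0.27: headlamps (29.0/3.0)",
]

# "<test>: <class label> (<instances>[/<errors>])"
LEAF = re.compile(
    r"^(?P<test>.+?): (?P<label>.+?) \((?P<n>[\d.]+)(?:/(?P<errors>[\d.]+))?\)$"
)


def parse_leaf(line):
    """Return (test, label, instances, errors) for a leaf line, else None."""
    match = LEAF.match(line.strip().lstrip("| "))
    if match is None:
        return None  # an internal node: it has a test but no class label
    errors = float(match.group("errors")) if match.group("errors") else 0.0
    return match.group("test"), match.group("label"), float(match.group("n")), errors


for line in tree_lines:
    leaf = parse_leaf(line)
    if leaf is not None:
        test, label, n, errors = leaf
        print(f"{test:14s} -> {label:22s} {n:5.1f} instances, {errors:4.1f} wrong")
```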

    The More button in the classifier configuration panel tells you more about the classifier, which is very helpful when using it: