Saturday, October 10, 2015

weka.classifiers.rules.OneR

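weka.classifiers.rules.OneR is a single-node decision tree (single-attribute rule) learner.
It predicts by majority vote on one attribute (per value for nominal attributes, per interval for numeric ones) and gives a baseline performance figure for a dataset, to be used as a benchmark.
Any other learner should beat the OneR baseline to be worth using.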

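When learning a classifier, OneR builds one single-node decision tree per attribute and keeps the one with the lowest error rate.
When predicting, it uses only that retained tree: the predicted class is the majority class for the instance's value (or interval) of that single attribute.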
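
The two steps above can be sketched in a few lines of Python. This is a minimal illustration for nominal attributes only, not Weka's implementation: the helper names `one_rule` and `one_r` are made up for this example, the rows are the outlook, windy and play columns of weather.numeric.arff as listed in this post, and ties between classes are resolved by whichever class appears first in the data (an assumption).

```python
from collections import Counter, defaultdict

# (outlook, windy, play) taken from the weather.numeric.arff rows in this post.
ROWS = [
    ("overcast", "FALSE", "yes"), ("overcast", "TRUE", "yes"),
    ("overcast", "TRUE", "yes"), ("overcast", "FALSE", "yes"),
    ("rainy", "TRUE", "no"), ("rainy", "TRUE", "no"),
    ("rainy", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("rainy", "FALSE", "yes"), ("sunny", "FALSE", "no"),
    ("sunny", "TRUE", "no"), ("sunny", "FALSE", "no"),
    ("sunny", "FALSE", "yes"), ("sunny", "TRUE", "yes"),
]
ATTRIBUTES = {"outlook": 0, "windy": 1}  # column index of each nominal attribute
CLASS_COL = 2


def one_rule(rows, col, class_col):
    """Map each attribute value to its majority class; return (rule, n_correct)."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[col]][row[class_col]] += 1
    # Hypothetical tie-break: the class seen first among the tied ones wins.
    rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
    correct = sum(counts[rule[v]] for v, counts in by_value.items())
    return rule, correct


def one_r(rows, attributes, class_col):
    """Build one rule per attribute and keep the one with the most correct cases."""
    best = None
    for name, col in attributes.items():
        rule, correct = one_rule(rows, col, class_col)
        if best is None or correct > best[2]:
            best = (name, rule, correct)
    return best


name, rule, correct = one_r(ROWS, ATTRIBUTES, CLASS_COL)
print(name, rule, f"({correct}/{len(ROWS)} instances correct)")
```

On these rows the sketch keeps outlook with 10 of 14 instances correct, while the windy rule gets only 9.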

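Parameters:
-B Bucket-splitting parameter for numeric attributes, default 6:
   the minimum number of instances the majority class must have for a bucket to stand.
   The lower this limit, the more easily small buckets form and the more closely the rule fits the training cases.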

> java -cp weka.jar;. weka.classifiers.rules.OneR  -t data\weather.numeric.arff

outlook:
        sunny   -> no
        overcast        -> yes
        rainy   -> yes
(10/14 instances correct)


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          10               71.4286 %
Incorrectly Classified Instances         4               28.5714 %
Kappa statistic                          0.3778
Mean absolute error                      0.2857
Root mean squared error                  0.5345
Relative absolute error                 61.5385 %
Root relative squared error            111.4773 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 2 3 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           6               42.8571 %
Incorrectly Classified Instances         8               57.1429 %
Kappa statistic                         -0.2444
Mean absolute error                      0.5714
Root mean squared error                  0.7559
Relative absolute error                120      %
Root relative squared error            153.2194 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 5 4 | a = yes
 4 1 | b = no

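Among the 4 attributes of the weather.numeric.arff dataset below, the single-node decision tree built on outlook has the lowest error rate (4 of 14 instances misclassified):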
outlook temperature humidity windy play
overcast 83 86 FALSE yes
overcast 64 65 TRUE yes
overcast 72 90 TRUE yes
overcast 81 75 FALSE yes
rainy 65 70 TRUE no
rainy 71 91 TRUE no
rainy 70 96 FALSE yes
rainy 68 80 FALSE yes
rainy 75 80 FALSE yes
sunny 85 85 FALSE no
sunny 80 90 TRUE no
sunny 72 95 FALSE no
sunny 69 70 FALSE yes
sunny 75 70 TRUE yes

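For numeric attributes OneR provides the bucket-splitting parameter -B, default 6,
meaning that for any bucket to stand, its majority class must have at least 6 instances. The lower this limit, the more easily small buckets form and the more closely the rule fits the training cases.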
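
The effect of the bucket limit can be illustrated with a simplified Python sketch. It is an assumption-laden approximation of this kind of bucket merging, not Weka's source code: the function `numeric_rule` and its `min_bucket` argument are hypothetical names, break points are placed midway between neighbouring distinct values, and ties between classes go to the first class listed.

```python
import math

# temperature and play columns of weather.numeric.arff, as listed in this post.
TEMPS = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
PLAY = ["yes", "no", "yes", "yes", "yes", "no", "yes", "no",
        "yes", "yes", "no", "yes", "yes", "no"]


def numeric_rule(values, labels, classes, min_bucket=6):
    """Return (buckets, n_correct); each bucket is (upper_bound, predicted_class)."""
    def majority(counts):
        return counts.index(max(counts))  # ties go to the first class listed

    def merge_into(target, upper, counts):
        target[0] = upper
        target[1] = [a + b for a, b in zip(target[1], counts)]

    # 1. One trivial bucket per distinct value, closed at the midpoint to the next value.
    pairs = sorted(zip(values, labels))
    trivial = []  # each item: [upper_bound, class_counts]
    for i, (value, label) in enumerate(pairs):
        if i == 0 or value > pairs[i - 1][0]:
            if trivial:
                trivial[-1][0] = (pairs[i - 1][0] + value) / 2.0
            trivial.append([math.inf, [0] * len(classes)])
        trivial[-1][1][classes.index(label)] += 1

    # 2. Left to right: absorb the next bucket when it predicts the same class, or
    #    when the current bucket's majority class has fewer than min_bucket cases.
    merged = []
    for upper, counts in trivial:
        if merged and (majority(merged[-1][1]) == majority(counts)
                       or max(merged[-1][1]) < min_bucket):
            merge_into(merged[-1], upper, counts)
        else:
            merged.append([upper, list(counts)])

    # 3. Final pass: join neighbours that ended up predicting the same class.
    final = []
    for upper, counts in merged:
        if final and majority(final[-1][1]) == majority(counts):
            merge_into(final[-1], upper, counts)
        else:
            final.append([upper, list(counts)])

    buckets = [(upper, classes[majority(counts)]) for upper, counts in final]
    correct = sum(max(counts) for _, counts in final)
    return buckets, correct


for limit in (6, 1):
    buckets, correct = numeric_rule(TEMPS, PLAY, ["yes", "no"], min_bucket=limit)
    print(f"min_bucket={limit}: {len(buckets)} bucket(s), {correct}/14 correct")
```

With `min_bucket=1` the sketch produces eight temperature buckets with break points 64.5, 66.5, 70.5, 71.5, 77.5, 80.5 and 84.0 and 13 of 14 training instances correct, the same rule Weka prints for -B 1 in this post. With the limit at 6 the sketch collapses everything into a single bucket; Weka's own handling of the default case may differ in detail.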

> java -cp weka.jar;. weka.classifiers.rules.OneR  -t data\weather.numeric.arff -B 1

Options: -B 1

temperature:
        < 64.5  -> yes
        < 66.5  -> no
        < 70.5  -> yes
        < 71.5  -> no
        < 77.5  -> yes
        < 80.5  -> no
        < 84.0  -> yes
        >= 84.0 -> no
(13/14 instances correct)


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          13               92.8571 %
Incorrectly Classified Instances         1                7.1429 %
Kappa statistic                          0.8372
Mean absolute error                      0.0714
Root mean squared error                  0.2673
Relative absolute error                 15.3846 %
Root relative squared error             55.7386 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 1 4 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           5               35.7143 %
Incorrectly Classified Instances         9               64.2857 %
Kappa statistic                         -0.3404
Mean absolute error                      0.6429
Root mean squared error                  0.8018
Relative absolute error                135      %
Root relative squared error            162.5137 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 4 5 | a = yes
 4 1 | b = no


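Among the 4 attributes of the weather.numeric.arff dataset below (sorted by temperature), the single-node decision tree built on temperature has a low error rate on the training data (1 of 14 instances misclassified),
but that only shows it fits the data it has already seen. It predicts unseen data poorly: 35.7% correct under cross-validation, against 42.9% for the default -B 6.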

outlook temperature humidity windy play
overcast 64 65 TRUE yes
rainy 65 70 TRUE no
rainy 68 80 FALSE yes
sunny 69 70 FALSE yes
rainy 70 96 FALSE yes
rainy 71 91 TRUE no
overcast 72 90 TRUE yes
sunny 72 95 FALSE no
rainy 75 80 FALSE yes
sunny 75 70 TRUE yes
sunny 80 90 TRUE no
overcast 81 75 FALSE yes
overcast 83 86 FALSE yes
sunny 85 85 FALSE no
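
The gap between training and cross-validation results can be checked directly from the four confusion matrices printed in this post. The sketch below recomputes accuracy and the Kappa statistic using the standard Cohen's kappa formula; the function name `accuracy_and_kappa` and the run labels are made up for this example.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Confusion matrices copied from the Weka output above (rows: actual yes, no).
RUNS = {
    "default -B 6, training data": [[7, 2], [2, 3]],
    "default -B 6, cross-validation": [[5, 4], [4, 1]],
    "-B 1, training data": [[9, 0], [1, 4]],
    "-B 1, cross-validation": [[4, 5], [4, 1]],
}

for name, matrix in RUNS.items():
    acc, kappa = accuracy_and_kappa(matrix)
    print(f"{name}: accuracy {acc:.4f}, kappa {kappa:.4f}")
```

Lowering -B raises training accuracy from 0.7143 to 0.9286 but lowers cross-validation accuracy from 0.4286 to 0.3571, and both cross-validation Kappa values are negative, which is worse than chance agreement.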
References: weka.classifiers.rules.OneR 1. source code 2. documentation
