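weka.classifiers.rules.OneR is a learner that builds a single-node decision tree, that is, a rule over a single attribute.
It predicts the majority class for each value of that attribute (or, for a numeric attribute, for each interval of values), which gives a baseline performance figure for a dataset that other learners can be benchmarked against.
Any learner should outperform the OneR baseline to justify its existence.
During training, OneR builds one single-node decision tree per attribute and keeps the one with the lowest error rate.
At prediction time it uses only the retained tree: the predicted class is the majority class for the instance's value (or interval) of that single attribute.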
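The training step described above can be sketched in a few lines of Python. This is a minimal illustration and not Weka's implementation: the helper names `build_rule` and `one_r` are hypothetical, the rows are the 14 instances of weather.numeric.arff as listed in this post, and only the two nominal attributes are considered because numeric attributes have to be split into buckets first.

```python
from collections import Counter, defaultdict

# The 14 instances of weather.numeric.arff as listed in this post:
# (outlook, temperature, humidity, windy, play); play is the class.
WEATHER = [
    ("overcast", 83, 86, "FALSE", "yes"), ("overcast", 64, 65, "TRUE", "yes"),
    ("overcast", 72, 90, "TRUE", "yes"), ("overcast", 81, 75, "FALSE", "yes"),
    ("rainy", 65, 70, "TRUE", "no"), ("rainy", 71, 91, "TRUE", "no"),
    ("rainy", 70, 96, "FALSE", "yes"), ("rainy", 68, 80, "FALSE", "yes"),
    ("rainy", 75, 80, "FALSE", "yes"), ("sunny", 85, 85, "FALSE", "no"),
    ("sunny", 80, 90, "TRUE", "no"), ("sunny", 72, 95, "FALSE", "no"),
    ("sunny", 69, 70, "FALSE", "yes"), ("sunny", 75, 70, "TRUE", "yes"),
]
# Column index of each nominal attribute (hypothetical helper table).
NOMINAL = {"outlook": 0, "windy": 3}


def build_rule(rows, col):
    """Map every value of one attribute to its majority class.

    Returns the rule and the number of training rows it classifies correctly.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[col]][row[-1]] += 1
    rule = {value: dist.most_common(1)[0][0] for value, dist in counts.items()}
    correct = sum(dist.most_common(1)[0][1] for dist in counts.values())
    return rule, correct


def one_r(rows, attributes):
    """Build one rule per attribute and keep the one with the fewest errors."""
    best_name, best_rule, best_correct = None, None, -1
    for name, col in attributes.items():
        rule, correct = build_rule(rows, col)
        if correct > best_correct:
            best_name, best_rule, best_correct = name, rule, correct
    return best_name, best_rule, best_correct


name, rule, correct = one_r(WEATHER, NOMINAL)
print(name, rule, f"({correct}/{len(WEATHER)} instances correct)")
```

On this data the sketch keeps outlook with 10 of 14 training instances correct, while windy reaches only 9. Predicting a new instance is then a single lookup, `rule[instance[0]]`.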
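Parameters:
-B  Minimum bucket size used when splitting a numeric attribute into intervals (buckets); the default is 6.
It is the minimum number of instances that a bucket's majority class must have for the bucket to stand.
The lower this minimum, the more readily small buckets appear and the more closely the rule fits the training instances.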
> java -cp weka.jar;. weka.classifiers.rules.OneR -t data\weather.numeric.arff
outlook:
sunny -> no
overcast -> yes
rainy -> yes
(10/14 instances correct)
Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0 seconds
=== Error on training data ===
Correctly Classified Instances 10 71.4286 %
Incorrectly Classified Instances 4 28.5714 %
Kappa statistic 0.3778
Mean absolute error 0.2857
Root mean squared error 0.5345
Relative absolute error 61.5385 %
Root relative squared error 111.4773 %
Total Number of Instances 14
=== Confusion Matrix ===
a b <-- classified as
7 2 | a = yes
2 3 | b = no
=== Stratified cross-validation ===
Correctly Classified Instances 6 42.8571 %
Incorrectly Classified Instances 8 57.1429 %
Kappa statistic -0.2444
Mean absolute error 0.5714
Root mean squared error 0.7559
Relative absolute error 120 %
Root relative squared error 153.2194 %
Total Number of Instances 14
=== Confusion Matrix ===
a b <-- classified as
5 4 | a = yes
4 1 | b = no
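In the weather.numeric.arff dataset below (sorted by outlook), the single-node decision tree built on outlook has the lowest error rate among the 4 attributes.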
outlook  | temperature | humidity | windy | play
overcast | 83          | 86       | FALSE | yes
overcast | 64          | 65       | TRUE  | yes
overcast | 72          | 90       | TRUE  | yes
overcast | 81          | 75       | FALSE | yes
rainy    | 65          | 70       | TRUE  | no
rainy    | 71          | 91       | TRUE  | no
rainy    | 70          | 96       | FALSE | yes
rainy    | 68          | 80       | FALSE | yes
rainy    | 75          | 80       | FALSE | yes
sunny    | 85          | 85       | FALSE | no
sunny    | 80          | 90       | TRUE  | no
sunny    | 72          | 95       | FALSE | no
sunny    | 69          | 70       | FALSE | yes
sunny    | 75          | 70       | TRUE  | yes
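For numeric attributes, OneR provides the bucket-splitting parameter -B (default 6):
for a bucket to stand, its majority class must have at least 6 instances. The lower this minimum, the more readily small buckets appear and the more closely the rule fits the training instances.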
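To make the effect of the minimum bucket size concrete, here is a simplified Python sketch of one way to form buckets. It is an illustration under stated assumptions, not Weka's source code: the function names `majority` and `bucketize` are hypothetical, ties between classes are assumed to go to the class listed first (yes), and a value is absorbed into the previous bucket when it predicts the same class or when that bucket's majority class has not yet reached the minimum.

```python
from collections import Counter

# (temperature, play) pairs taken from weather.numeric.arff as listed in this post.
TEMPERATURE = [
    (64, "yes"), (65, "no"), (68, "yes"), (69, "yes"), (70, "yes"),
    (71, "no"), (72, "yes"), (72, "no"), (75, "yes"), (75, "yes"),
    (80, "no"), (81, "yes"), (83, "yes"), (85, "no"),
]
CLASSES = ["yes", "no"]  # assumption: ties go to the class listed first


def majority(dist):
    """Return (class, count) for the most frequent class in a Counter."""
    best = max(CLASSES, key=lambda c: dist[c])
    return best, dist[best]


def bucketize(pairs, min_bucket):
    """Return a list of (upper_bound, predicted_class) rules, lowest bucket first."""
    # One class distribution per distinct value, in ascending order.
    per_value = {}
    for value, label in sorted(pairs):
        per_value.setdefault(value, Counter())[label] += 1

    buckets = []  # each entry: [lowest value, highest value, class counts]
    for value, dist in per_value.items():
        if buckets:
            prev_class, prev_count = majority(buckets[-1][2])
            # Absorb the value if it agrees with the previous bucket, or if
            # that bucket's majority class is still below the minimum size.
            if majority(dist)[0] == prev_class or prev_count < min_bucket:
                buckets[-1][1] = value
                buckets[-1][2].update(dist)
                continue
        buckets.append([value, value, Counter(dist)])

    rules = []
    for i, (low, high, dist) in enumerate(buckets):
        label = majority(dist)[0]
        # Breakpoint halfway between this bucket and the next one.
        upper = (high + buckets[i + 1][0]) / 2 if i + 1 < len(buckets) else float("inf")
        if rules and rules[-1][1] == label:
            rules[-1] = (upper, label)  # fuse neighbours that predict the same class
        else:
            rules.append((upper, label))
    return rules


for upper, label in bucketize(TEMPERATURE, 1):
    print(f"< {upper} -> {label}")
```

With `min_bucket=1` the sketch produces breakpoints at 64.5, 66.5, 70.5, 71.5, 77.5, 80.5 and 84.0, the same ones that appear in the -B 1 run in this post. With `min_bucket=6` the same sketch collapses to far fewer buckets, which is the behavior the parameter description suggests.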
> java -cp weka.jar;. weka.classifiers.rules.OneR -t data\weather.numeric.arff -B 1
Options: -B 1
temperature:
< 64.5 -> yes
< 66.5 -> no
< 70.5 -> yes
< 71.5 -> no
< 77.5 -> yes
< 80.5 -> no
< 84.0 -> yes
>= 84.0 -> no
(13/14 instances correct)
Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0 seconds
=== Error on training data ===
Correctly Classified Instances 13 92.8571 %
Incorrectly Classified Instances 1 7.1429 %
Kappa statistic 0.8372
Mean absolute error 0.0714
Root mean squared error 0.2673
Relative absolute error 15.3846 %
Root relative squared error 55.7386 %
Total Number of Instances 14
=== Confusion Matrix ===
a b <-- classified as
9 0 | a = yes
1 4 | b = no
=== Stratified cross-validation ===
Correctly Classified Instances 5 35.7143 %
Incorrectly Classified Instances 9 64.2857 %
Kappa statistic -0.3404
Mean absolute error 0.6429
Root mean squared error 0.8018
Relative absolute error 135 %
Root relative squared error 162.5137 %
Total Number of Instances 14
=== Confusion Matrix ===
a b <-- classified as
4 5 | a = yes
4 1 | b = no
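In the weather.numeric.arff dataset below (sorted by temperature), the single-node decision tree built on temperature has a low error rate on the training data,
but that only reflects how closely it fits the data it has seen; its ability to predict unseen data is weak.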
outlook  | temperature | humidity | windy | play
overcast | 64          | 65       | TRUE  | yes
rainy    | 65          | 70       | TRUE  | no
rainy    | 68          | 80       | FALSE | yes
sunny    | 69          | 70       | FALSE | yes
rainy    | 70          | 96       | FALSE | yes
rainy    | 71          | 91       | TRUE  | no
overcast | 72          | 90       | TRUE  | yes
sunny    | 72          | 95       | FALSE | no
rainy    | 75          | 80       | FALSE | yes
sunny    | 75          | 70       | TRUE  | yes
sunny    | 80          | 90       | TRUE  | no
overcast | 81          | 75       | FALSE | yes
overcast | 83          | 86       | FALSE | yes
sunny    | 85          | 85       | FALSE | no
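The gap between training and cross-validation results can be checked directly from the confusion matrices printed by the two runs. The following sketch recomputes accuracy and the Kappa statistic with the standard Cohen's kappa formula; the matrices are copied from the output in this post, and the function names are hypothetical.

```python
# Confusion matrices copied from the Weka output in this post
# (rows: actual yes/no, columns: predicted yes/no).
RUNS = {
    "default, training": [[7, 2], [2, 3]],
    "default, cross-validation": [[5, 4], [4, 1]],
    "-B 1, training": [[9, 0], [1, 4]],
    "-B 1, cross-validation": [[4, 5], [4, 1]],
}


def accuracy(m):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(map(sum, m))
    return sum(m[i][i] for i in range(len(m))) / total


def kappa(m):
    """Cohen's kappa: agreement beyond what row/column totals give by chance."""
    total = sum(map(sum, m))
    observed = accuracy(m)
    expected = sum(
        sum(m[i]) * sum(row[i] for row in m) for i in range(len(m))
    ) / total ** 2
    return (observed - expected) / (1 - expected)


for name, m in RUNS.items():
    print(f"{name}: accuracy {accuracy(m):.4f}, kappa {kappa(m):.4f}")
```

The -B 1 rule goes from 13/14 correct on the training data to 5/14 under cross-validation, and its kappa turns negative, which is consistent with a rule that fits the training instances rather than generalizing.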
Reference: weka.classifiers.rules.OneR
1. source code
2. documentation