Wednesday, October 14, 2015


weka.classifiers.trees.Id3 is a simple decision-tree learner.

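When Id3 learns a decision tree, it works from the root toward the leaves. At each level it picks a suitable attribute as the node's test, so that the class distributions in the resulting branches become purer.
To measure how well an attribute purifies the classes, Id3 computes the attribute's information gain: gain = information of the class distribution before the split - combined information of the class distributions after the split, where the combined value is the average of each branch's information weighted by the fraction of instances falling into that branch.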
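As a minimal sketch (plain Python, not Weka's code), the gain of each attribute on the 14 weather.nominal instances listed later in this post can be computed as follows. Information is measured in bits (log base 2), and the helper names `entropy` and `info_gain` are illustrative only.

```python
from collections import Counter
from math import log2

# The 14 weather.nominal instances: outlook, temperature, humidity, windy, play
ROWS = """sunny hot high FALSE no
sunny hot high TRUE no
overcast hot high FALSE yes
rainy mild high FALSE yes
rainy cool normal FALSE yes
rainy cool normal TRUE no
overcast cool normal TRUE yes
sunny mild high FALSE no
sunny cool normal FALSE yes
rainy mild normal FALSE yes
sunny mild normal TRUE yes
overcast mild high TRUE yes
overcast hot normal FALSE yes
rainy mild high TRUE no"""
DATA = [line.split() for line in ROWS.splitlines()]
ATTRS = ["outlook", "temperature", "humidity", "windy"]


def entropy(labels):
    """Information (in bits) of a class distribution given as a list of labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, a):
    """Gain = info before the split - size-weighted info of the branches."""
    before = entropy([r[-1] for r in rows])
    after = 0.0
    for value in set(r[a] for r in rows):
        branch = [r[-1] for r in rows if r[a] == value]
        after += len(branch) / len(rows) * entropy(branch)
    return before - after


for i, name in enumerate(ATTRS):
    print(f"{name:12s} gain = {info_gain(DATA, i):.3f}")
```

Run on the full data set, outlook has the largest gain (about 0.247 bits, versus about 0.152 for humidity, 0.048 for windy and 0.029 for temperature), which is consistent with outlook being the root of the tree Weka prints below.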
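Information gain, however, favors attributes that split into many branches, which invites overfitting. The gain ratio = gain / intrinsic information of the split corrects for this bias. Note that the gain ratio is the selection criterion of C4.5 (Weka's J48); Weka's Id3 class itself selects the attribute with the largest information gain.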
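A similar sketch for the gain ratio, again using the same 14 instances. Here the intrinsic (split) information is taken to be the entropy of the branch sizes, and `split_info` and `gain_ratio` are hypothetical helper names, not Weka methods.

```python
from collections import Counter
from math import log2

ROWS = """sunny hot high FALSE no
sunny hot high TRUE no
overcast hot high FALSE yes
rainy mild high FALSE yes
rainy cool normal FALSE yes
rainy cool normal TRUE no
overcast cool normal TRUE yes
sunny mild high FALSE no
sunny cool normal FALSE yes
rainy mild normal FALSE yes
sunny mild normal TRUE yes
overcast mild high TRUE yes
overcast hot normal FALSE yes
rainy mild high TRUE no"""
DATA = [line.split() for line in ROWS.splitlines()]
ATTRS = ["outlook", "temperature", "humidity", "windy"]


def entropy(values):
    """Entropy (in bits) of the empirical distribution of a list of values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def info_gain(rows, a):
    """Class info before the split minus size-weighted class info of the branches."""
    after = 0.0
    for value in set(r[a] for r in rows):
        branch = [r[-1] for r in rows if r[a] == value]
        after += len(branch) / len(rows) * entropy(branch)
    return entropy([r[-1] for r in rows]) - after


def split_info(rows, a):
    """Intrinsic information of the split: entropy of the branch sizes,
    ignoring the class labels."""
    return entropy([r[a] for r in rows])


def gain_ratio(rows, a):
    si = split_info(rows, a)
    return 0.0 if si == 0 else info_gain(rows, a) / si  # guard single-valued splits


for i, name in enumerate(ATTRS):
    print(f"{name:12s} split info = {split_info(DATA, i):.3f}  "
          f"gain ratio = {gain_ratio(DATA, i):.3f}")
```

On this data outlook still wins under the gain ratio (about 0.156 versus 0.152 for humidity), but by a much narrower margin than under plain gain. Outlook's three-way split carries about 1.58 bits of split information, while humidity's two equal branches carry exactly 1 bit, so outlook is penalized more.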

> java -cp weka.jar;. weka.classifiers.trees.Id3  -t data\weather.nominal.arff


outlook = sunny
|  humidity = high: no
|  humidity = normal: yes
outlook = overcast: yes
outlook = rainy
|  windy = TRUE: no
|  windy = FALSE: yes

Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          14              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances               14

=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 0 5 | b = no

=== Stratified cross-validation ===

Correctly Classified Instances          12               85.7143 %
Incorrectly Classified Instances         2               14.2857 %
Kappa statistic                          0.6889
Mean absolute error                      0.1429
Root mean squared error                  0.378
Relative absolute error                 30      %
Root relative squared error             76.6097 %
Total Number of Instances               14

=== Confusion Matrix ===

 a b   <-- classified as
 8 1 | a = yes
 1 4 | b = no
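Several of the summary statistics above can be reproduced from the two confusion matrices. This is a minimal sketch. It assumes, as the reported error values suggest, that every prediction is a hard 0/1 class distribution. It also assumes that Weka averages per-class errors over the number of classes. Kappa is Cohen's kappa computed from the matrix margins. The relative errors are omitted because they depend on the prior class distributions of the training folds.

```python
from math import sqrt


def kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = actual, cols = predicted)."""
    n = sum(map(sum, cm))
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(col) for col in zip(*cm)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (observed - expected) / (1 - expected)


def hard_errors(cm):
    """MAE and RMSE assuming each prediction puts probability 1 on a single class.
    A wrong prediction is then off by 1 in two class entries; averaging over the
    k classes (assumed convention) gives 2/k per wrong instance, for both the
    absolute and the squared error."""
    k = len(cm)
    n = sum(map(sum, cm))
    wrong = n - sum(cm[i][i] for i in range(k))
    per_wrong = 2 / k
    return wrong * per_wrong / n, sqrt(wrong * per_wrong / n)


train_cm = [[9, 0], [0, 5]]  # "Error on training data" matrix
cv_cm = [[8, 1], [1, 4]]     # stratified cross-validation matrix
print(f"training kappa = {kappa(train_cm):.4f}")
print(f"cv kappa       = {kappa(cv_cm):.4f}")
mae, rmse = hard_errors(cv_cm)
print(f"cv MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```

For the cross-validation matrix, the observed agreement is 12/14 and the chance agreement is (9·9 + 5·5)/14² = 106/196, so kappa = 62/90 ≈ 0.6889. The two misclassifications give MAE = 2/14 ≈ 0.1429 and RMSE = √(2/14) ≈ 0.378, matching the Weka output.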

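The 14 instances of the weather.nominal.arff data set, listed below, use 4 nominal attributes (outlook, temperature, humidity, windy) to predict the nominal class attribute play.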
outlook   temperature  humidity  windy  play
sunny     hot          high      FALSE  no
sunny     hot          high      TRUE   no
overcast  hot          high      FALSE  yes
rainy     mild         high      FALSE  yes
rainy     cool         normal    FALSE  yes
rainy     cool         normal    TRUE   no
overcast  cool         normal    TRUE   yes
sunny     mild         high      FALSE  no
sunny     cool         normal    FALSE  yes
rainy     mild         normal    FALSE  yes
sunny     mild         normal    TRUE   yes
overcast  mild         high      TRUE   yes
overcast  hot          normal    FALSE  yes
rainy     mild         high      TRUE   no
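The following minimal recursive ID3 sketch in Python shows how the tree above follows from the gain criterion. It is not Weka's source code. It selects attributes by information gain and drops an attribute once it has been used. It stops when a node is pure or no attributes remain, and it labels a branch with no training instances as `None` (a choice made for this sketch). The value order in `DOMAINS` is assumed to follow the attribute declarations in the standard weather.nominal.arff header, so that the printout lines up with Weka's.

```python
from collections import Counter
from math import log2

NAMES = ["outlook", "temperature", "humidity", "windy"]
# Value order assumed to follow the weather.nominal.arff attribute declarations.
DOMAINS = [["sunny", "overcast", "rainy"], ["hot", "mild", "cool"],
           ["high", "normal"], ["TRUE", "FALSE"]]
ROWS = """sunny hot high FALSE no
sunny hot high TRUE no
overcast hot high FALSE yes
rainy mild high FALSE yes
rainy cool normal FALSE yes
rainy cool normal TRUE no
overcast cool normal TRUE yes
sunny mild high FALSE no
sunny cool normal FALSE yes
rainy mild normal FALSE yes
sunny mild normal TRUE yes
overcast mild high TRUE yes
overcast hot normal FALSE yes
rainy mild high TRUE no"""
DATA = [line.split() for line in ROWS.splitlines()]  # last column is the class


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, a):
    after = 0.0
    for v in DOMAINS[a]:
        branch = [r[-1] for r in rows if r[a] == v]
        if branch:
            after += len(branch) / len(rows) * entropy(branch)
    return entropy([r[-1] for r in rows]) - after


def id3(rows, attrs):
    """Return a class label (leaf) or a pair (attribute index, {value: subtree})."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority class at the leaf
    best = max(attrs, key=lambda a: info_gain(rows, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for v in DOMAINS[best]:
        subset = [r for r in rows if r[best] == v]
        branches[v] = id3(subset, rest) if subset else None  # empty branch
    return (best, branches)


def render(tree, depth=0):
    """Format the tree in the same indented style as the Weka output above."""
    if not isinstance(tree, tuple):
        return [f": {tree}"]
    attr, branches = tree
    lines = []
    for v, sub in branches.items():
        prefix = "|  " * depth + f"{NAMES[attr]} = {v}"
        if isinstance(sub, tuple):
            lines.append(prefix)
            lines.extend(render(sub, depth + 1))
        else:
            lines.append(f"{prefix}: {sub}")
    return lines


def classify(tree, row):
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


tree = id3(DATA, list(range(len(NAMES))))
print("\n".join(render(tree)))
correct = sum(classify(tree, r) == r[-1] for r in DATA)
print(f"training accuracy: {correct}/{len(DATA)}")
```

The printed tree should match the one Weka reports, and it classifies all 14 training instances correctly, in line with the 100% "Error on training data" result. On this small data set, selecting by gain ratio instead of gain yields the same tree.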
References: 1. weka.classifiers.trees.Id3 code | doc
