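weka.classifiers.lazy.IB1 is a simple nearest-neighbor learner.
During training it only stores the original instances; at test time it picks the single nearest stored instance and predicts that instance's class value.
It gives a reasonable performance figure for a dataset, which makes it useful as a baseline for benchmark comparisons.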
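When IB1 computes the distance between two instances, a nominal attribute contributes an attribute distance of 0 if the two values are equal and 1 if they differ;
a numeric attribute is first normalized by its range in the original training set, so that its values fall between 0 and 1,
then the two normalized values are subtracted and the difference is squared to give the attribute distance; if either instance is missing the value of an attribute, that attribute distance is taken as 1.
Finally, all attributes are treated as equally important: the attribute distances are summed and the square root of the sum is taken (Euclidean distance) to give the distance between the two instances.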
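The distance rule described above can be written compactly as follows. The symbols are assumptions introduced here for illustration: x and y are the two instances, n is the number of non-class attributes, and min_i and max_i are the smallest and largest values of numeric attribute i in the training set. The block assumes the amsmath package.

```latex
\[
d(x, y) = \sqrt{\sum_{i=1}^{n} \delta_i(x, y)},
\qquad
\delta_i(x, y) =
\begin{cases}
1 & \text{if } x_i \text{ or } y_i \text{ is missing} \\
0 & \text{if attribute } i \text{ is nominal and } x_i = y_i \\
1 & \text{if attribute } i \text{ is nominal and } x_i \neq y_i \\
\left( \dfrac{x_i - \min_i}{\max_i - \min_i} - \dfrac{y_i - \min_i}{\max_i - \min_i} \right)^{2}
  & \text{if attribute } i \text{ is numeric}
\end{cases}
\]
```

The numeric case simplifies to ((x_i - y_i) / (max_i - min_i))^2, so every attribute contributes a value between 0 and 1 to the sum.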
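The instance with the smallest distance is chosen as the reference instance, and its class value is returned as the prediction.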
> java -cp weka.jar;. weka.classifiers.lazy.IB1 -t data\weather.numeric.arff
IB1 classifier
Time taken to build model: 0.02 seconds
Time taken to test model on training data: 0 seconds
=== Error on training data ===
Correctly Classified Instances 14 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 14
=== Confusion Matrix ===
a b <-- classified as
9 0 | a = yes
0 5 | b = no
=== Stratified cross-validation ===
Correctly Classified Instances 7 50 %
Incorrectly Classified Instances 7 50 %
Kappa statistic 0.0392
Mean absolute error 0.5
Root mean squared error 0.7071
Relative absolute error 105 %
Root relative squared error 143.3236 %
Total Number of Instances 14
=== Confusion Matrix ===
a b <-- classified as
4 5 | a = yes
2 3 | b = no
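As a quick check on the cross-validation figures above, the accuracy and the Kappa statistic can be recomputed from the confusion matrix alone. This is a minimal sketch, not Weka's own code; the variable names are made up for illustration. The last two lines assume that the classifier outputs all-or-nothing class predictions, in which case the mean absolute error equals the error rate and the root mean squared error is its square root.

```python
import math

# Cross-validation confusion matrix reported above:
# rows = actual class, columns = predicted class
matrix = [[4, 5],   # actual yes
          [2, 3]]   # actual no

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(matrix)))
accuracy = correct / total

# Agreement expected by chance, from the row and column totals
row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

kappa = (accuracy - expected) / (1 - expected)

# Assumption: hard 0/1 predictions, so MAE = error rate and RMSE = sqrt(error rate)
mae = 1 - accuracy
rmse = math.sqrt(mae)

print(f"accuracy = {accuracy:.0%}")   # 50%
print(f"kappa    = {kappa:.4f}")      # 0.0392
print(f"MAE      = {mae:.1f}")        # 0.5
print(f"RMSE     = {rmse:.4f}")       # 0.7071
```

A Kappa this close to 0 means the cross-validated predictions agree with the true classes barely more often than chance would, even though the training-set result is perfect.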
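The weather.numeric.arff dataset, shown below, has 14 instances; 2 nominal attributes and 2 numeric attributes are used to predict a nominal class attribute (play).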
outlook | temperature | humidity | windy | play |
sunny | 85 | 85 | FALSE | no |
sunny | 80 | 90 | TRUE | no |
rainy | 65 | 70 | TRUE | no |
sunny | 72 | 95 | FALSE | no |
rainy | 71 | 91 | TRUE | no |
overcast | 83 | 86 | FALSE | yes |
rainy | 70 | 96 | FALSE | yes |
rainy | 68 | 80 | FALSE | yes |
overcast | 64 | 65 | TRUE | yes |
sunny | 69 | 70 | FALSE | yes |
rainy | 75 | 80 | FALSE | yes |
sunny | 75 | 70 | TRUE | yes |
overcast | 72 | 90 | TRUE | yes |
overcast | 81 | 75 | FALSE | yes |
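The following is a minimal sketch of the IB1 procedure applied to the table above, written in plain Python rather than with Weka. The names ROWS, ranges, distance, and predict are hypothetical, and two details are assumptions of this sketch rather than facts about Weka: ties are broken by taking the first instance with the smallest distance, and a numeric attribute whose range is zero is divided by 1 instead.

```python
import math

# Rows of weather.numeric.arff in the order listed above:
# outlook, temperature, humidity, windy, play (class)
ROWS = [
    ("sunny", 85, 85, "FALSE", "no"),
    ("sunny", 80, 90, "TRUE", "no"),
    ("rainy", 65, 70, "TRUE", "no"),
    ("sunny", 72, 95, "FALSE", "no"),
    ("rainy", 71, 91, "TRUE", "no"),
    ("overcast", 83, 86, "FALSE", "yes"),
    ("rainy", 70, 96, "FALSE", "yes"),
    ("rainy", 68, 80, "FALSE", "yes"),
    ("overcast", 64, 65, "TRUE", "yes"),
    ("sunny", 69, 70, "FALSE", "yes"),
    ("rainy", 75, 80, "FALSE", "yes"),
    ("sunny", 75, 70, "TRUE", "yes"),
    ("overcast", 72, 90, "TRUE", "yes"),
    ("overcast", 81, 75, "FALSE", "yes"),
]
NUMERIC = (1, 2)   # column indexes of temperature and humidity
N_ATTRS = 4        # number of non-class attributes


def ranges(rows):
    """Return {column: (min, max)} for each numeric attribute."""
    return {i: (min(r[i] for r in rows), max(r[i] for r in rows)) for i in NUMERIC}


def distance(a, b, rng):
    """Euclidean distance over per-attribute distances; None marks a missing value."""
    total = 0.0
    for i in range(N_ATTRS):
        if a[i] is None or b[i] is None:
            total += 1.0                              # missing value: distance 1
        elif i in rng:
            lo, hi = rng[i]
            span = (hi - lo) or 1.0                   # assumption: guard a zero range
            total += ((a[i] - lo) / span - (b[i] - lo) / span) ** 2
        else:
            total += 0.0 if a[i] == b[i] else 1.0     # nominal: 0 if equal, else 1
    return math.sqrt(total)


def predict(train, query, rng):
    """Return the class of the nearest training instance (first one on ties)."""
    nearest = min(train, key=lambda r: distance(r, query, rng))
    return nearest[-1]


rng = ranges(ROWS)
correct = sum(predict(ROWS, r, rng) == r[-1] for r in ROWS)
print(correct, "of", len(ROWS), "training instances classified correctly")
```

Because every training instance is at distance 0 from itself and no two rows in the table are identical, each instance is its own nearest neighbor. That is why the error on the training data is 0 % while the cross-validation result is much lower.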
Reference:
1. weka.classifiers.lazy.IB1 (code | doc)