Seke Blog: weka.classifiers.misc.VFI

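weka.classifiers.misc.VFI is a voting feature intervals learner. It records the class distribution of each attribute value or interval. To classify an instance, it sums the class distributions of the intervals the instance falls into and predicts the class with the highest resulting probability.
Its results can serve as a baseline for benchmarking other classifiers on a dataset.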

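During training, VFI tallies a class distribution for every nominal value and every numeric interval of each attribute.
For a numeric attribute, the intervals are cut as follows. VFI finds the lowest and highest value observed for each class and adds negative and positive infinity. This gives m = (number of classes x 2) + 2 endpoints. Endpoints that coincide are merged, so there are at most m-1 intervals, and exactly m-1 when all endpoints are distinct.
VFI then records the class distribution of the training instances in each of these intervals. An instance whose value lies exactly on an endpoint contributes half an instance to each of the two intervals adjacent to that endpoint.
For example, temperature ranges from 64 to 83 for yes and from 65 to 85 for no. The endpoints are therefore -Infinity, 64, 65, 83, 85 and Infinity, which give the five intervals in the temperature block of the output below. There, each endpoint is followed by the yes/no counts of the interval that starts at it.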
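The counting rules above can be sketched in Python. This is a minimal illustration, not Weka's code. It assumes the weather.numeric data from the table further down and the class order yes, no used in the Weka output. The helper names nominal_counts, interval_bounds and numeric_counts are hypothetical, and the sketch assumes every class has at least one training instance.

```python
import math

# weather.numeric rows from the table in this post:
# (outlook, temperature, humidity, windy, play)
DATA = [
    ("sunny", 85, 85, "FALSE", "no"),
    ("sunny", 80, 90, "TRUE", "no"),
    ("rainy", 65, 70, "TRUE", "no"),
    ("sunny", 72, 95, "FALSE", "no"),
    ("rainy", 71, 91, "TRUE", "no"),
    ("overcast", 83, 86, "FALSE", "yes"),
    ("rainy", 70, 96, "FALSE", "yes"),
    ("rainy", 68, 80, "FALSE", "yes"),
    ("overcast", 64, 65, "TRUE", "yes"),
    ("sunny", 69, 70, "FALSE", "yes"),
    ("rainy", 75, 80, "FALSE", "yes"),
    ("sunny", 75, 70, "TRUE", "yes"),
    ("overcast", 72, 90, "TRUE", "yes"),
    ("overcast", 81, 75, "FALSE", "yes"),
]
CLASSES = ["yes", "no"]  # same class order as the Weka output


def nominal_counts(values, labels):
    """Class distribution for each nominal value."""
    counts = {}
    for v, y in zip(values, labels):
        dist = counts.setdefault(v, [0.0] * len(CLASSES))
        dist[CLASSES.index(y)] += 1.0
    return counts


def interval_bounds(values, labels):
    """-inf, each class's min and max, +inf; sorted with duplicates merged."""
    points = {-math.inf, math.inf}
    for c in CLASSES:
        vals = [v for v, y in zip(values, labels) if y == c]
        points.update((min(vals), max(vals)))
    return sorted(points)


def numeric_counts(values, labels):
    """Class distribution per interval; counts[k] covers bounds[k]..bounds[k+1]."""
    bounds = interval_bounds(values, labels)
    counts = [[0.0] * len(CLASSES) for _ in range(len(bounds) - 1)]
    for v, y in zip(values, labels):
        c = CLASSES.index(y)
        if v in bounds:
            # value sits on an endpoint: half to the interval on each side
            k = bounds.index(v)
            counts[k - 1][c] += 0.5
            counts[k][c] += 0.5
        else:
            # value lies strictly inside one interval: find its left endpoint
            k = max(i for i, b in enumerate(bounds) if b < v)
            counts[k][c] += 1.0
    return bounds, counts


labels = [row[4] for row in DATA]
temp_bounds, temp_counts = numeric_counts([row[1] for row in DATA], labels)
print(temp_bounds)  # endpoints of the temperature intervals
print(temp_counts)  # yes/no counts of each interval
```

For temperature this yields the endpoints -inf, 64, 65, 83, 85, inf and the counts [0.5, 0], [0.5, 0.5], [7.5, 3.5], [0.5, 0.5], [0, 0.5]. These match the temperature block of the Weka output. The same functions reproduce the humidity, outlook and windy blocks.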

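At prediction time, VFI adds up the class distributions of the value or interval the new instance falls into for each attribute, and predicts the class with the highest total probability.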
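The voting step can be sketched in Python as follows, without confidence weighting. vfi_vote is a hypothetical helper. For each attribute, it takes the class counts of the value or interval the new instance falls into. One assumption goes beyond the description above, and it follows the original VFI algorithm: each attribute's counts are divided by the class totals and normalized to sum to 1 before being added, so that a large class does not win merely by being large.

```python
def vfi_vote(interval_counts, class_totals):
    """Sum one normalized vote per attribute; return the normalized total.

    interval_counts: per attribute, the class counts of the value or
    interval the instance falls into (looked up from the trained model).
    class_totals: number of training instances per class (assumed > 0).
    """
    k = len(class_totals)
    total = [0.0] * k
    for counts in interval_counts:
        # scale by class size (assumption taken from the original VFI paper)
        ratios = [c / n for c, n in zip(counts, class_totals)]
        s = sum(ratios)
        # an attribute with no training evidence votes uniformly
        vote = [r / s for r in ratios] if s > 0 else [1.0 / k] * k
        total = [t + v for t, v in zip(total, vote)]
    s = sum(total)
    if s == 0:
        return [1.0 / k] * k  # no attributes at all: uniform distribution
    return [t / s for t in total]


# outlook = overcast -> [4, 0]; windy = TRUE -> [3, 3]; 9 yes, 5 no in total
print(vfi_vote([[4, 0], [3, 3]], [9, 5]))
```

For outlook = overcast and windy = TRUE, the votes are [1, 0] and [5/14, 9/14]. Their normalized sum is [19/28, 9/28], about 0.68 for yes, so the sketch predicts yes.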

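Options:
-C Turn off confidence weighting. By default, each attribute's class distribution is weighted by confidence when the distributions are summed. This option disables that weighting.
-B <bias> With confidence weighting on, each distribution is multiplied by its entropy raised to the power -bias. The default bias is 0.6.
   The entropy of an attribute's class distribution (that is, of the interval or value the instance falls into) lies between 0 and log2(number of classes). The smaller it is, the better the attribute discriminates between the classes.
   bias is meant to lie between 0 and 1. A bias of 0 leaves every distribution's contribution unchanged (multiplier 1). A larger bias boosts the contribution of highly discriminative attributes in the sum, because the multiplier exceeds 1 whenever the entropy is below 1.
   With two classes, as in weather.numeric, the entropy never exceeds 1, so the multiplier is never below 1. With more classes, distributions whose entropy exceeds 1 get a multiplier below 1.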
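The weighting rule, weight = entropy^(-bias), can be sketched in Python. This is a minimal illustration, not Weka's code. entropy_bits, confidence_weight and weighted_vote are hypothetical helpers. The floor of 1e-6 on the entropy is an assumption of this sketch: it avoids raising 0 to a negative power when a vote is pure. Weka handles that case in its own way, so check its source if exact numbers matter.

```python
import math


def entropy_bits(dist):
    """Base-2 entropy of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def confidence_weight(dist, bias=0.6, floor=1e-6):
    """weight = entropy ** (-bias); the floor (an assumption) avoids 0 ** -bias."""
    h = max(entropy_bits(dist), floor)
    return h ** (-bias)


def weighted_vote(votes, bias=0.6):
    """Sum per-attribute votes (each summing to 1), scaled by confidence."""
    total = [0.0] * len(votes[0])
    for vote in votes:
        w = confidence_weight(vote, bias)
        total = [t + w * v for t, v in zip(total, vote)]
    s = sum(total)
    return [t / s for t in total]


votes = [[1.0, 0.0], [5 / 14, 9 / 14]]  # overcast, windy = TRUE
print(weighted_vote(votes, bias=0.0))  # plain sum, no weighting
print(weighted_vote(votes))            # default bias 0.6
```

With bias=0, weighted_vote reduces to the unweighted sum. With the default 0.6, a pure vote such as [1, 0] has entropy 0 and receives a very large weight, so it effectively decides the prediction. A uniform two-class vote [0.5, 0.5] has entropy 1 and weight exactly 1.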

> java -cp weka.jar;. weka.classifiers.misc.VFI  -t data\weather.numeric.arff

Voting feature intervals classifier

 outlook :
  sunny
    2.0    3.0
  overcast
    4.0    0.0
  rainy
    3.0    2.0

 temperature :
  -Infinity
    0.5    0.0
  64.0
    0.5    0.5
  65.0
    7.5    3.5
  83.0
    0.5    0.5
  85.0
    0.0    0.5
  Infinity


 humidity :
  -Infinity
    0.5    0.0
  65.0
    1.5    0.5
  70.0
    6.0    4.0
  95.0
    0.5    0.5
  96.0
    0.5    0.0
  Infinity


 windy :
  TRUE
    3.0    3.0
  FALSE
    6.0    2.0


Time taken to build model: 0 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          12               85.7143 %
Incorrectly Classified Instances         2               14.2857 %
Kappa statistic                          0.7143
Mean absolute error                      0.3354
Root mean squared error                  0.3996
Relative absolute error                 72.2387 %
Root relative squared error             83.3373 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 0 5 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           7               50      %
Incorrectly Classified Instances         7               50      %
Kappa statistic                         -0.0426
Mean absolute error                      0.4725
Root mean squared error                  0.5624
Relative absolute error                 99.2318 %
Root relative squared error            113.9897 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 5 4 | a = yes
 3 2 | b = no
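The Kappa statistic in the output is Cohen's kappa, and it can be recomputed from each confusion matrix. The observed agreement is the fraction of instances on the diagonal. The chance agreement comes from the row (actual) and column (predicted) totals. A short Python check follows; accuracy_and_kappa is a hypothetical helper name.

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of instances of actual class i classified as j."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


train = [[7, 2], [0, 5]]  # "Error on training data" matrix
cv = [[5, 4], [3, 2]]     # "Stratified cross-validation" matrix
for name, m in (("training", train), ("cross-validation", cv)):
    acc, kappa = accuracy_and_kappa(m)
    print(f"{name}: {acc:.4%} correct, kappa {kappa:.4f}")
```

This prints 85.7143% and 0.7143 for the training matrix, and 50.0000% and -0.0426 for the cross-validation matrix, which match the reported values. A kappa near zero, as in the cross-validation run, means VFI does no better than chance on this small dataset.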


The weather.numeric.arff dataset has 14 instances. It uses 2 nominal attributes (outlook, windy) and 2 numeric attributes (temperature, humidity) to predict the nominal class attribute play:

outlook   temperature  humidity  windy  play
sunny     85           85        FALSE  no
sunny     80           90        TRUE   no
rainy     65           70        TRUE   no
sunny     72           95        FALSE  no
rainy     71           91        TRUE   no
overcast  83           86        FALSE  yes
rainy     70           96        FALSE  yes
rainy     68           80        FALSE  yes
overcast  64           65        TRUE   yes
sunny     69           70        FALSE  yes
rainy     75           80        FALSE  yes
sunny     75           70        TRUE   yes
overcast  72           90        TRUE   yes
overcast  81           75        FALSE  yes
