2015年11月3日 星期二

weka.classifiers.functions.Winnow

weka.classifiers.functions.Winnow 屬錯誤驅動型學習器,
只處理文字屬性,將之轉成二元屬性,用來預測二元類別值。可線上累進學習。
適用在案例集屬性眾多,卻多數和預測不相關情況,可以快速鎖定相關屬性作預測。

給定案例屬性(a0, a1, ..., ak),門檻 theta,權重升級係數alpha,權重降級係數beta,
權重向量(w0, w1, ..., wk)或(w0+ - w0-, w1+ - w1-, ..., wk+ - wk-)
其中,所有符號皆為正數,擴充屬性 a0 恆為 1。則預測式有二: 

  不平衡版: 權重向量各維度只能正數
     w0 * a0 + w1 * a1 + ... + wk * ak > theta 表類別1; 否則類別2

  平衡版:  權重向量各維度允許負數
    (w0+ - w0-) * a0 + (w1+ - w1-) * a1 + ... + (wk+ - wk-) * ak > theta 表類別1; 否則類別2

學習過程若遇預測錯誤,則權重向量調整法如下:
  類別2誤為類別1:   w *= beta  或 w+ *= beta  and w- *= alpha 讓權重變小
  類別1誤為類別2:   w *= alpha 或 w+ *= alpha and w- *= beta  讓權重變大

參數說明:
 -L  使用平衡版。預設值false
 -I  套用訓練集學習權重的輪數。預設值1
 -A  權重升級係數alpha,需>1。預設值2.0
 -B  權重降級係數beta,需<1。預設值0.5
 -H  預測門檻theta。預設值-1,表示屬性個數
 -W  權重初始值,需>0。預設值2.0
 -S  亂數種子,影響訓練集的案例訓練順序。預設值1


> java  weka.classifiers.functions.Winnow  -t data\weather.nominal.arff


Winnow

Attribute weights

w0 8.0
w1 1.0
w2 2.0
w3 4.0
w4 2.0
w5 2.0
w6 1.0
w7 1.0

Cumulated mistake count: 7


Time taken to build model: 0 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          10               71.4286 %
Incorrectly Classified Instances         4               28.5714 %
Kappa statistic                          0.3778
Mean absolute error                      0.2857
Root mean squared error                  0.5345
Relative absolute error                 61.5385 %
Root relative squared error            111.4773 %
Total Number of Instances               14     


=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 2 3 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           7               50      %
Incorrectly Classified Instances         7               50      %
Kappa statistic                         -0.2564
Mean absolute error                      0.5   
Root mean squared error                  0.7071
Relative absolute error                105      %
Root relative squared error            143.3236 %
Total Number of Instances               14     


=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 5 0 | b = no

如下 weather.nominal.arff 案例集的14個案例利用4個文字屬性,預測文字屬性。
outlook temperature humidity windy play
sunny hot high FALSE no
sunny hot high TRUE no
overcast hot high FALSE yes
rainy mild high FALSE yes
rainy cool normal FALSE yes
rainy cool normal TRUE no
overcast cool normal TRUE yes
sunny mild high FALSE no
sunny cool normal FALSE yes
rainy mild normal FALSE yes
sunny mild normal TRUE yes
overcast mild high TRUE yes
overcast hot normal FALSE yes
rainy mild high TRUE no
參考: 1.weka.classifiers.functions.Winnow code | doc

沒有留言: