Seke Blog: October 2015

weka.classifiers.lazy.IB1

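weka.classifiers.lazy.IB1 is a simple nearest-neighbor learner.
During training it only stores the original instances; during testing it picks the single nearest instance and predicts that instance's class value.
It provides a reasonably good reference performance on a dataset for benchmark comparisons.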

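When IB1 computes the distance between two instances, a nominal attribute contributes 0 if the two values are equal and 1 if they differ.
A numeric attribute is first normalized by its range (minimum to maximum) in the training set so that its values lie between 0 and 1;
the two normalized values are then subtracted and the difference squared to give the attribute distance. If either instance has a missing value for an attribute, that attribute's distance is taken as 1.
All attributes are treated as equally important: the attribute distances are summed and the square root is taken (Euclidean distance) to give the distance between the two instances.
The instance with the smallest distance becomes the reference instance, and its class value is returned as the prediction.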
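The distance rule above can be sketched in Java, the language Weka itself is written in. This is a minimal illustration under stated assumptions, not Weka's IB1 source: the class `IB1Sketch`, its nested `Row` type and all method names are hypothetical, the four weather.numeric attributes are hard-coded, `null` marks a missing value, and missing values are handled exactly as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical 1-nearest-neighbour sketch following the IB1 description above (not Weka's source). */
public class IB1Sketch {
    /** One weather.numeric instance; a null field marks a missing value. */
    static final class Row {
        final String outlook, windy, play;
        final Double temperature, humidity;
        Row(String outlook, Double temperature, Double humidity, String windy, String play) {
            this.outlook = outlook; this.temperature = temperature;
            this.humidity = humidity; this.windy = windy; this.play = play;
        }
    }

    /** The 14 weather.numeric.arff instances listed in this post. */
    static List<Row> weatherNumeric() {
        return Arrays.asList(
            new Row("sunny", 85.0, 85.0, "FALSE", "no"), new Row("sunny", 80.0, 90.0, "TRUE", "no"),
            new Row("rainy", 65.0, 70.0, "TRUE", "no"), new Row("sunny", 72.0, 95.0, "FALSE", "no"),
            new Row("rainy", 71.0, 91.0, "TRUE", "no"), new Row("overcast", 83.0, 86.0, "FALSE", "yes"),
            new Row("rainy", 70.0, 96.0, "FALSE", "yes"), new Row("rainy", 68.0, 80.0, "FALSE", "yes"),
            new Row("overcast", 64.0, 65.0, "TRUE", "yes"), new Row("sunny", 69.0, 70.0, "FALSE", "yes"),
            new Row("rainy", 75.0, 80.0, "FALSE", "yes"), new Row("sunny", 75.0, 70.0, "TRUE", "yes"),
            new Row("overcast", 72.0, 90.0, "TRUE", "yes"), new Row("overcast", 81.0, 75.0, "FALSE", "yes"));
    }

    private final List<Row> train = new ArrayList<>();
    private double tMin = Double.POSITIVE_INFINITY, tMax = Double.NEGATIVE_INFINITY;
    private double hMin = Double.POSITIVE_INFINITY, hMax = Double.NEGATIVE_INFINITY;

    /** "Training" only stores the instances and the numeric ranges used for normalisation. */
    void buildClassifier(List<Row> rows) {
        for (Row r : rows) {
            train.add(r);
            if (r.temperature != null) { tMin = Math.min(tMin, r.temperature); tMax = Math.max(tMax, r.temperature); }
            if (r.humidity != null) { hMin = Math.min(hMin, r.humidity); hMax = Math.max(hMax, r.humidity); }
        }
    }

    /** Nominal attribute: 0 if equal, 1 if different or missing. */
    private static double nominalDiff(String a, String b) {
        return (a == null || b == null || !a.equals(b)) ? 1.0 : 0.0;
    }

    /** Numeric attribute: squared difference of the [0,1]-normalised values; 1 if missing. */
    private static double numericDiff(Double a, Double b, double min, double max) {
        if (a == null || b == null) return 1.0;
        double range = max - min;
        if (range == 0) return 0.0;
        double d = (a - min) / range - (b - min) / range;
        return d * d;
    }

    /** Euclidean distance over the four non-class attributes, all weighted equally. */
    double distance(Row x, Row y) {
        double sum = nominalDiff(x.outlook, y.outlook)
                + numericDiff(x.temperature, y.temperature, tMin, tMax)
                + numericDiff(x.humidity, y.humidity, hMin, hMax)
                + nominalDiff(x.windy, y.windy);
        return Math.sqrt(sum);  // the square root does not change which neighbour is nearest
    }

    /** Class of the single nearest stored instance (earlier rows win ties; needs >= 1 row). */
    String classify(Row query) {
        Row best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Row r : train) {
            double d = distance(query, r);
            if (d < bestDist) { bestDist = d; best = r; }
        }
        return best.play;
    }

    public static void main(String[] args) {
        IB1Sketch ib1 = new IB1Sketch();
        ib1.buildClassifier(weatherNumeric());
        System.out.println(ib1.classify(new Row("sunny", 66.0, 90.0, "TRUE", null)));
    }
}
```

Because every stored instance is at distance 0 from itself, classifying the 14 training instances with this sketch reproduces the 100% "Error on training data" result below; only cross-validation shows how well the nearest neighbor generalizes.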

> java -cp weka.jar;. weka.classifiers.lazy.IB1  -t data\weather.numeric.arff

IB1 classifier

Time taken to build model: 0.02 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          14              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1     
Mean absolute error                      0     
Root mean squared error                  0     
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances               14     


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 0 5 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           7               50      %
Incorrectly Classified Instances         7               50      %
Kappa statistic                          0.0392
Mean absolute error                      0.5   
Root mean squared error                  0.7071
Relative absolute error                105      %
Root relative squared error            143.3236 %
Total Number of Instances               14     


=== Confusion Matrix ===

 a b   <-- classified as
 4 5 | a = yes
 2 3 | b = no

The 14 instances of the weather.numeric.arff dataset below use 2 nominal and 2 numeric attributes to predict a nominal class attribute (play).

outlook temperature humidity windy play
sunny 85 85 FALSE no
sunny 80 90 TRUE no
rainy 65 70 TRUE no
sunny 72 95 FALSE no
rainy 71 91 TRUE no
overcast 83 86 FALSE yes
rainy 70 96 FALSE yes
rainy 68 80 FALSE yes
overcast 64 65 TRUE yes
sunny 69 70 FALSE yes
rainy 75 80 FALSE yes
sunny 75 70 TRUE yes
overcast 72 90 TRUE yes
overcast 81 75 FALSE yes



weka.classifiers.rules.Prism

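weka.classifiers.rules.Prism is a simple rule-set learner.
During training it uses the covering approach, one class at a time, to build rules with high accuracy;
during testing it scans the rule set from top to bottom and predicts with the first rule whose conditions match the instance's attribute values.
It provides a reasonably good reference performance on a dataset for benchmark comparisons.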

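When Prism learns a rule set, it learns, separately for each class, a set of rules covering all instances of that class, as follows:
Before building the rule set for a class, put all instances into the set E of instances still to be learned.
As long as E still contains instances of that class, more rules are needed.
   To build a rule, start with one condition: try every attribute paired with every value and keep the combination with the highest accuracy p/t (t = instances the condition covers, p = those that belong to the class);
       then add the next condition, again trying every remaining attribute with every value and keeping the most accurate combination;
       and so on, until the attributes run out or the rule is completely correct.
       When several conditions tie on accuracy, the one with the larger coverage (denominator t) is taken.
   Once the rule is built, delete from E the instances of that class that the rule predicts correctly,
   and learn the next rule for the instances not yet covered.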

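Prism learns each class's rules with the competition in view (the instances of all other classes stay in the set),
so at prediction time the order in which the rules of the same class are checked does not matter: a rule does not stray into the instance space of another class, at least on the training data.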
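The covering loop above can be sketched in Java as follows. It is a hypothetical illustration, not Weka's Prism source: `PrismSketch` and its methods are made-up names, attribute values are tried in the order declared in weather.nominal.arff, and, as an assumption, a finished rule removes every instance it covers from E (for a perfect rule these are exactly the class instances it predicts correctly). Ties in accuracy p/t are broken by the larger p, which for equal ratios is the same as the larger coverage t.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the Prism covering algorithm described above (not Weka's source). */
public class PrismSketch {
    // Attribute names and value order as declared in weather.nominal.arff; index 4 is the class.
    static final String[] ATTRS = {"outlook", "temperature", "humidity", "windy", "play"};
    static final String[][] VALUES = {
        {"sunny", "overcast", "rainy"}, {"hot", "mild", "cool"},
        {"high", "normal"}, {"TRUE", "FALSE"}, {"yes", "no"}};
    static final int CLASS = 4;

    // The 14 weather.nominal.arff instances, in file order.
    static final String[][] DATA = {
        {"sunny", "hot", "high", "FALSE", "no"}, {"sunny", "hot", "high", "TRUE", "no"},
        {"overcast", "hot", "high", "FALSE", "yes"}, {"rainy", "mild", "high", "FALSE", "yes"},
        {"rainy", "cool", "normal", "FALSE", "yes"}, {"rainy", "cool", "normal", "TRUE", "no"},
        {"overcast", "cool", "normal", "TRUE", "yes"}, {"sunny", "mild", "high", "FALSE", "no"},
        {"sunny", "cool", "normal", "FALSE", "yes"}, {"rainy", "mild", "normal", "FALSE", "yes"},
        {"sunny", "mild", "normal", "TRUE", "yes"}, {"overcast", "mild", "high", "TRUE", "yes"},
        {"overcast", "hot", "normal", "FALSE", "yes"}, {"rainy", "mild", "high", "TRUE", "no"}};

    static boolean containsClass(List<String[]> rows, String cls) {
        for (String[] r : rows) if (r[CLASS].equals(cls)) return true;
        return false;
    }

    static boolean pure(List<String[]> rows, String cls) {
        for (String[] r : rows) if (!r[CLASS].equals(cls)) return false;
        return true;
    }

    /** Learns the rules class by class; each rule is returned as one line of text. */
    static List<String> learn(String[][] data) {
        List<String> rules = new ArrayList<>();
        for (String cls : VALUES[CLASS]) {
            List<String[]> e = new ArrayList<>(Arrays.asList(data)); // E: all instances, every class
            while (containsClass(e, cls)) {                          // cls still has uncovered instances
                List<String[]> covered = new ArrayList<>(e);          // instances the growing rule covers
                boolean[] used = new boolean[CLASS];
                StringBuilder rule = new StringBuilder("If ");
                int conditions = 0;
                while (!pure(covered, cls) && conditions < CLASS) {
                    int bestA = -1, bestV = -1, bestP = 0, bestT = 0;
                    for (int a = 0; a < CLASS; a++) {
                        if (used[a]) continue;
                        for (int v = 0; v < VALUES[a].length; v++) {
                            int p = 0, t = 0; // t = covered by "a = v", p = of those, in class cls
                            for (String[] r : covered) {
                                if (r[a].equals(VALUES[a][v])) { t++; if (r[CLASS].equals(cls)) p++; }
                            }
                            if (t == 0) continue;
                            int diff = p * bestT - bestP * t; // sign of p/t - bestP/bestT
                            // higher accuracy wins; equal accuracy: larger p (hence larger t) wins
                            if (bestA < 0 || diff > 0 || (diff == 0 && p > bestP)) {
                                bestA = a; bestV = v; bestP = p; bestT = t;
                            }
                        }
                    }
                    if (bestA < 0) break;                             // no usable condition left
                    used[bestA] = true;
                    final int attr = bestA;
                    final String value = VALUES[bestA][bestV];
                    covered.removeIf(row -> !row[attr].equals(value));
                    rule.append(conditions++ > 0 ? " and " : "").append(ATTRS[attr]).append(" = ").append(value);
                }
                e.removeAll(covered);                                 // drop what this rule covers
                rules.add(rule.append(" then ").append(cls).toString());
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        learn(DATA).forEach(System.out::println);
    }
}
```

Traced by hand on the 14 weather.nominal instances, this sketch yields the same six rules, in the same order, as the Prism output below (printed on one line each instead of Weka's two-line layout).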


> java -cp weka.jar;. weka.classifiers.rules.Prism  -t data\weather.nominal.arff


Prism rules
----------
If outlook = overcast then yes
If humidity = normal
   and windy = FALSE then yes
If temperature = mild
   and humidity = normal then yes
If outlook = rainy
   and windy = FALSE then yes
If outlook = sunny
   and humidity = high then no
If outlook = rainy
   and windy = TRUE then no


Time taken to build model: 0 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          14              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1     
Mean absolute error                      0     
Root mean squared error                  0     
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances               14     


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 0 5 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         3               21.4286 %
Kappa statistic                          0.4375
Mean absolute error                      0.25  
Root mean squared error                  0.5   
Relative absolute error                 59.2264 %
Root relative squared error            105.9121 %
UnClassified Instances                   2               14.2857 %
Total Number of Instances               14     


=== Confusion Matrix ===

 a b   <-- classified as
 7 0 | a = yes
 3 2 | b = no


The 14 instances of the weather.nominal.arff dataset below contain 9 yes and 5 no.

outlook temperature humidity windy play
sunny hot high FALSE no
sunny hot high TRUE no
rainy cool normal TRUE no
sunny mild high FALSE no
rainy mild high TRUE no
overcast hot high FALSE yes
rainy mild high FALSE yes
rainy cool normal FALSE yes
overcast cool normal TRUE yes
sunny cool normal FALSE yes
rainy mild normal FALSE yes
sunny mild normal TRUE yes
overcast mild high TRUE yes
overcast hot normal FALSE yes


Let death be what takes us, not lack of imagination.

What Really Matters at the End of Life by J.B. Miller on TED

At the end of the talk above, "What Really Matters at the End of Life", palliative-care physician Miller says:

If we love such moments ferociously, then maybe we can learn to live well
-- not in spite of death, but because of it.
Let death be what takes us, not lack of imagination.

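Roughly, he means that if we fiercely love moments such as feeling snow melt in our hand on a freezing day, we may learn how to live well at the end of life,
because our attitude is no longer to resist death but to accept it. With death ahead of us, we cherish the feeling of every second we are alive.
The speaker hopes death can be the driving force that carries us forward, makes us cherish our sensations and live vividly, rather than something lifeless that leaves no room for imagination.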

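The word "take" in the last sentence is a bit like classical Chinese: one word with many meanings, so its exact sense is hard to pin down.
Judging from the TED translation, reading it as take/carry/move/drive/lead/guide somebody somewhere seems to fit the preceding text best.
Some possible translations of the whole sentence: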
希望是死亡在帶領我們,而非貧瘠的想像力
Hope that it is death that guides us, not the scanty imagination (guiding us).

讓死亡成為可以引領我們，而非不去想像的東西. by Allen Kuo & Sharon Loh
Let death become something which can guide us, instead of something which we don't need to imagine.

讓死亡是帶引我們前進的東西,別讓死亡是沒有想像的東西.
Let death be something which takes us (forward).
Don't let death be lack of imagination (or something which allows no imagination).

weka.classifiers.trees.Id3

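weka.classifiers.trees.Id3 is a simple decision-tree learner.
During training it builds a decision tree whose nodes test attributes; during testing each instance follows the branches matching its attribute values, and on reaching a leaf it is assigned the majority class of the training instances that reached that leaf.
It provides a reasonably good reference performance on a dataset for benchmark comparisons.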

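When Id3 learns a decision tree, it works from the root toward the leaves, choosing at every level a suitable attribute to test at the node so that the class distribution after the split becomes purer.
Purity is measured by the information (entropy) of the class distribution: the smaller the entropy, the purer the distribution, i.e. the more the instances are dominated by a single class.
An attribute's ability to purify is measured by its information gain = entropy of the class distribution before the split - combined entropy of the class distributions after the split,
where the combined entropy weights each branch's entropy by the number of instances in that branch. The larger the gain, the better the attribute purifies.
Plain gain favors attributes with many branches, which invites overfitting; the usual remedy is the gain ratio = gain / intrinsic information of the split.
The more branches an attribute has, the larger its intrinsic information, so only an attribute whose gain is large while its intrinsic information is not too large gets a high gain ratio. (The gain ratio is the criterion of C4.5, Weka's J48; Weka's Id3 class itself picks the attribute with the largest information gain.)
Tree building continues until the remaining instances all belong to a single class, or until the best information gain is zero, i.e. no additional attribute node can make the branches purer.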
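The entropy, gain and gain-ratio arithmetic above can be checked with a short Java sketch. The class and method names are hypothetical; the sketch only scores the candidate attributes at the root instead of building the whole tree, and measures entropy in bits (log base 2).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the entropy, information gain and gain ratio computations described above. */
public class Id3Gain {
    static final String[] ATTRS = {"outlook", "temperature", "humidity", "windy"};
    // weather.nominal.arff rows: the four attributes, then the class (play) at index 4.
    static final String[][] DATA = {
        {"sunny", "hot", "high", "FALSE", "no"}, {"sunny", "hot", "high", "TRUE", "no"},
        {"overcast", "hot", "high", "FALSE", "yes"}, {"rainy", "mild", "high", "FALSE", "yes"},
        {"rainy", "cool", "normal", "FALSE", "yes"}, {"rainy", "cool", "normal", "TRUE", "no"},
        {"overcast", "cool", "normal", "TRUE", "yes"}, {"sunny", "mild", "high", "FALSE", "no"},
        {"sunny", "cool", "normal", "FALSE", "yes"}, {"rainy", "mild", "normal", "FALSE", "yes"},
        {"sunny", "mild", "normal", "TRUE", "yes"}, {"overcast", "mild", "high", "TRUE", "yes"},
        {"overcast", "hot", "normal", "FALSE", "yes"}, {"rainy", "mild", "high", "TRUE", "no"}};

    /** Entropy in bits of a distribution given as counts that sum to total. */
    static double entropy(Collection<Integer> counts, int total) {
        double h = 0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Entropy of the class distribution of some rows. */
    static double classEntropy(List<String[]> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] r : rows) counts.merge(r[4], 1, Integer::sum);
        return entropy(counts.values(), rows.size());
    }

    /** Groups rows by their value of attribute a (one group per branch). */
    static Map<String, List<String[]>> split(List<String[]> rows, int a) {
        Map<String, List<String[]>> branches = new LinkedHashMap<>();
        for (String[] r : rows) branches.computeIfAbsent(r[a], k -> new ArrayList<>()).add(r);
        return branches;
    }

    /** gain = entropy before the split - branch entropies weighted by branch size. */
    static double infoGain(List<String[]> rows, int a) {
        double after = 0;
        for (List<String[]> b : split(rows, a).values())
            after += (double) b.size() / rows.size() * classEntropy(b);
        return classEntropy(rows) - after;
    }

    /** gain ratio = gain / intrinsic (split) information, i.e. the entropy of the branch sizes. */
    static double gainRatio(List<String[]> rows, int a) {
        List<Integer> sizes = new ArrayList<>();
        for (List<String[]> b : split(rows, a).values()) sizes.add(b.size());
        double splitInfo = entropy(sizes, rows.size());
        return splitInfo == 0 ? 0 : infoGain(rows, a) / splitInfo;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(DATA);
        for (int a = 0; a < ATTRS.length; a++)
            System.out.printf("%-12s gain=%.3f gainRatio=%.3f%n",
                    ATTRS[a], infoGain(rows, a), gainRatio(rows, a));
    }
}
```

On the 14 weather.nominal rows (class entropy about 0.940 bits), outlook has both the largest gain (about 0.247) and the largest gain ratio (about 0.157), so either criterion puts outlook at the root, as in the tree printed below.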

> java -cp weka.jar;. weka.classifiers.trees.Id3  -t data\weather.nominal.arff

Id3


outlook = sunny
|  humidity = high: no
|  humidity = normal: yes
outlook = overcast: yes
outlook = rainy
|  windy = TRUE: no
|  windy = FALSE: yes


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          14              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 0 5 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances          12               85.7143 %
Incorrectly Classified Instances         2               14.2857 %
Kappa statistic                          0.6889
Mean absolute error                      0.1429
Root mean squared error                  0.378
Relative absolute error                 30      %
Root relative squared error             76.6097 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 8 1 | a = yes
 1 4 | b = no


The 14 instances of the weather.nominal.arff dataset below use 4 nominal attributes to predict a nominal class attribute (play).

outlook temperature humidity windy play
sunny hot high FALSE no
sunny hot high TRUE no
overcast hot high FALSE yes
rainy mild high FALSE yes
rainy cool normal FALSE yes
rainy cool normal TRUE no
overcast cool normal TRUE yes
sunny mild high FALSE no
sunny cool normal FALSE yes
rainy mild normal FALSE yes
sunny mild normal TRUE yes
overcast mild high TRUE yes
overcast hot normal FALSE yes
rainy mild high TRUE no


weka.classifiers.bayes.NaiveBayes

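weka.classifiers.bayes.NaiveBayes is a simple Bayesian probabilistic learner.
It records the prior probability of each class and the conditional probability of each attribute value given the class;
for an instance it then multiplies these together to obtain the posterior probability of each class given the attribute values, and predicts the class with the highest probability.
It provides a reasonably good reference performance on a dataset for benchmark comparisons.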

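When NaiveBayes learns to classify, it counts, for each class, the class prior probability and the conditional probability of each attribute value given the class.
For a numeric attribute it assumes the population is normally distributed and records the mean and standard deviation, from which the conditional probability is estimated.
To classify a new instance, it multiplies these values to obtain (after normalization) the posterior probability of each class given the attribute values, and picks the class with the highest probability.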
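As a worked example of the prediction step, the sketch below scores one new day (sunny, temperature 66, humidity 90, windy) with the model parameters Weka prints for weather.numeric below. Assumptions, labeled as such: the class and method names are hypothetical; the priors are taken as the Laplace-smoothed 10/16 and 6/16, which round to the printed 0.63 and 0.38; nominal probabilities divide the printed (already smoothed) counts by the printed [total]; and numeric attributes use the plain normal density, whereas Weka's NormalEstimator also takes the attribute precision into account, so its exact numbers may differ slightly.

```java
import java.util.List;

/**
 * Hypothetical sketch: scores one instance with the weather.numeric model printed below.
 * Plain Gaussian densities stand in for Weka's precision-aware NormalEstimator (an approximation).
 */
public class NaiveBayesScore {
    static final String[] CLASSES = {"yes", "no"};
    static final double[] PRIOR = {10.0 / 16, 6.0 / 16};             // printed as (0.63) and (0.38)
    static final List<String> OUTLOOKS = List.of("sunny", "overcast", "rainy");
    static final double[][] OUTLOOK_COUNTS = {{3, 5, 4}, {4, 1, 3}};  // printed counts per class
    static final double[][] WINDY_COUNTS = {{4, 7}, {4, 3}};          // TRUE, FALSE per class
    static final double[] TEMP_MEAN = {72.9697, 74.8364}, TEMP_SD = {5.2304, 7.384};
    static final double[] HUM_MEAN = {78.8395, 86.1111}, HUM_SD = {9.8023, 9.2424};

    static double outlookProb(String v, int c) {
        int i = OUTLOOKS.indexOf(v);
        if (i < 0) throw new IllegalArgumentException("unknown outlook: " + v);
        return OUTLOOK_COUNTS[c][i] / (OUTLOOK_COUNTS[c][0] + OUTLOOK_COUNTS[c][1] + OUTLOOK_COUNTS[c][2]);
    }

    static double windyProb(boolean windy, int c) {
        return WINDY_COUNTS[c][windy ? 0 : 1] / (WINDY_COUNTS[c][0] + WINDY_COUNTS[c][1]);
    }

    /** Normal density with the given mean and standard deviation. */
    static double gaussian(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
    }

    /** P(class | instance) for {yes, no}: prior times the attribute likelihoods, normalised. */
    static double[] posterior(String outlook, double temperature, double humidity, boolean windy) {
        double[] score = new double[CLASSES.length];
        double sum = 0;
        for (int c = 0; c < CLASSES.length; c++) {
            score[c] = PRIOR[c] * outlookProb(outlook, c)             // "naive": attributes independent
                    * gaussian(temperature, TEMP_MEAN[c], TEMP_SD[c])
                    * gaussian(humidity, HUM_MEAN[c], HUM_SD[c])
                    * windyProb(windy, c);
            sum += score[c];
        }
        for (int c = 0; c < score.length; c++) score[c] /= sum;
        return score;
    }

    public static void main(String[] args) {
        double[] p = posterior("sunny", 66, 90, true);
        String predicted = p[0] >= p[1] ? CLASSES[0] : CLASSES[1];
        System.out.printf("P(yes)=%.3f  P(no)=%.3f  ->  %s%n", p[0], p[1], predicted);
    }
}
```

For this day the normalized posterior comes out at roughly 0.25 for yes and 0.75 for no, so the prediction is no.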

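In the printed model, an attribute's weight sum for each class equals the number of instances when every instance has weight 1.
The attribute precision = deltaSum / distinct, where deltaSum is the sum of the gaps between adjacent distinct values and distinct is the number of those gaps (the number of distinct values minus 1);
values closer together than the precision are treated as the same value when estimating the population distribution parameters.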
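The precision rule can be sketched as follows (`NumericPrecision` is a hypothetical name). It sorts the values, sums the gaps between successive distinct values and divides by the number of those gaps.

```java
import java.util.Arrays;

/** Hypothetical sketch of the numeric precision estimate described above. */
public class NumericPrecision {
    /** Average gap between successive distinct values: deltaSum / number of gaps (needs >= 1 value). */
    static double precision(double[] values) {
        double[] v = values.clone();
        Arrays.sort(v);
        double deltaSum = 0, last = v[0];
        int gaps = 0;                      // number of gaps = distinct values - 1
        for (int i = 1; i < v.length; i++) {
            if (v[i] != last) {
                deltaSum += v[i] - last;
                last = v[i];
                gaps++;
            }
        }
        return gaps > 0 ? deltaSum / gaps : Double.NaN;  // all values equal: not handled in this sketch
    }

    public static void main(String[] args) {
        double[] temperature = {85, 80, 65, 72, 71, 83, 70, 68, 64, 69, 75, 75, 72, 81};
        double[] humidity = {85, 90, 70, 95, 91, 86, 96, 80, 65, 70, 80, 70, 90, 75};
        System.out.printf("temperature %.4f, humidity %.4f%n", precision(temperature), precision(humidity));
    }
}
```

For the temperature column (12 distinct values from 64 to 85) this gives 21/11, about 1.9091, and for humidity (10 distinct values from 65 to 96) 31/9, about 3.4444: the precision values in the model printout below.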

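Options:
-K use a kernel density estimator instead of the normal distribution to compute conditional probabilities for numeric attributes
-D discretize numeric attributes with supervised discretization, treating them as nominal values for several intervals
-O display the model in the old format, useful when there are many classes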

> java -cp weka.jar;. weka.classifiers.bayes.NaiveBayes  -t data\weather.numeric.arff

Naive Bayes Classifier

                 Class
Attribute          yes      no
                (0.63)  (0.38)
===============================
outlook
  sunny             3.0     4.0
  overcast          5.0     1.0
  rainy             4.0     3.0
  [total]          12.0     8.0

temperature
  mean          72.9697 74.8364
  std. dev.      5.2304   7.384
  weight sum          9       5
  precision      1.9091  1.9091

humidity
  mean          78.8395 86.1111
  std. dev.      9.8023  9.2424
  weight sum          9       5
  precision      3.4444  3.4444

windy
  TRUE              4.0     4.0
  FALSE             7.0     3.0
  [total]          11.0     7.0


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0.01 seconds

=== Error on training data ===

Correctly Classified Instances          13               92.8571 %
Incorrectly Classified Instances         1                7.1429 %
Kappa statistic                          0.8372
Mean absolute error                      0.2798
Root mean squared error                  0.3315
Relative absolute error                 60.2576 %
Root relative squared error             69.1352 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 1 4 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         5               35.7143 %
Kappa statistic                          0.1026
Mean absolute error                      0.4649
Root mean squared error                  0.543
Relative absolute error                 97.6254 %
Root relative squared error            110.051  %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 8 1 | a = yes
 4 1 | b = no


The 14 instances of the weather.numeric.arff dataset below use 2 nominal and 2 numeric attributes to predict a nominal class attribute (play).

outlook temperature humidity windy play
sunny 85 85 FALSE no
sunny 80 90 TRUE no
rainy 65 70 TRUE no
sunny 72 95 FALSE no
rainy 71 91 TRUE no
overcast 83 86 FALSE yes
rainy 70 96 FALSE yes
rainy 68 80 FALSE yes
overcast 64 65 TRUE yes
sunny 69 70 FALSE yes
rainy 75 80 FALSE yes
sunny 75 70 TRUE yes
overcast 72 90 TRUE yes
overcast 81 75 FALSE yes


weka.classifiers.misc.VFI

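weka.classifiers.misc.VFI is a voting feature intervals learner.
It records the class distribution of each attribute value/interval, then sums the distributions of the intervals an instance falls into and predicts the class with the highest total.
It provides a basic baseline performance on a dataset for benchmark comparisons.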

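When VFI learns to classify, it counts the class distribution for each nominal value / numeric interval of every attribute.
A numeric attribute is cut into intervals as follows: find the upper and lower bound of the attribute's values within each class, and add negative and positive infinity,
giving m = (number of classes x 2) + 2 endpoints; these delimit at most m-1 intervals (fewer when some class bounds coincide).
It then records the class distribution of the instances in each interval; an instance whose value lies exactly on an endpoint contributes half an instance to each of the two adjacent intervals.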

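To classify, it sums the class distributions of the intervals the new instance falls into, across all attributes, and predicts the class with the highest total.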
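The interval construction above can be sketched in Java for one numeric attribute. `VfiIntervals` and its methods are hypothetical names, the class indices 0 = yes and 1 = no are an assumption of the sketch, and the voting step and confidence weighting are left out.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical sketch of VFI's interval construction for one numeric attribute (not Weka's source). */
public class VfiIntervals {
    /** Endpoints: -Infinity, +Infinity and each class's minimum and maximum value (duplicates merged). */
    static double[] endpoints(double[] values, int[] cls, int numClasses) {
        TreeSet<Double> points = new TreeSet<>();
        points.add(Double.NEGATIVE_INFINITY);
        points.add(Double.POSITIVE_INFINITY);
        for (int c = 0; c < numClasses; c++) {
            double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < values.length; i++) {
                if (cls[i] == c) { lo = Math.min(lo, values[i]); hi = Math.max(hi, values[i]); }
            }
            if (lo <= hi) { points.add(lo); points.add(hi); }   // skip classes with no instances
        }
        return points.stream().mapToDouble(Double::doubleValue).toArray();
    }

    /**
     * counts[j][c] = class-c instances in interval j, i.e. between ends[j] and ends[j + 1].
     * A value exactly on an inner endpoint gives half a count to each neighbouring interval.
     */
    static double[][] intervalCounts(double[] values, int[] cls, int numClasses, double[] ends) {
        double[][] counts = new double[ends.length - 1][numClasses];
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < ends.length - 1; j++) {
                if (values[i] == ends[j + 1]) {                 // on the boundary of j and j + 1
                    counts[j][cls[i]] += 0.5;
                    counts[j + 1][cls[i]] += 0.5;
                    break;
                }
                if (values[i] > ends[j] && values[i] < ends[j + 1]) {
                    counts[j][cls[i]] += 1;
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // temperature and class (0 = yes, 1 = no) of the 14 weather.numeric.arff rows in this post
        double[] temp = {85, 80, 65, 72, 71, 83, 70, 68, 64, 69, 75, 75, 72, 81};
        int[] cls = {1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0};
        double[] ends = endpoints(temp, cls, 2);
        double[][] counts = intervalCounts(temp, cls, 2, ends);
        for (int j = 0; j < counts.length; j++)
            System.out.println("(" + ends[j] + ", " + ends[j + 1] + "): " + Arrays.toString(counts[j]));
    }
}
```

For temperature this yields the endpoints -Infinity, 64, 65, 83, 85, Infinity and the per-interval counts yes = 0.5, 0.5, 7.5, 0.5, 0.0 and no = 0.0, 0.5, 3.5, 0.5, 0.5, the same numbers as the temperature block of the VFI output below.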

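Options:
-C turn off confidence weighting. Confidence weighting is on by default when the attributes' class distributions are summed; this option disables it.
-B <bias> with confidence weighting on, each distribution is weighted by the entropy of its class distribution raised to the power -bias; the default bias is 0.6.
   The entropy of an attribute's class distribution lies between 0 and log2(number of classes); the smaller it is, the better the attribute discriminates between classes.
   bias ranges from 0 to 1: bias 0 keeps each distribution's original contribution (multiplier 1),
   and a larger bias gives more weight to highly discriminating attributes in the sum (multiplier > 1 when the entropy is below 1).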

> java -cp weka.jar;. weka.classifiers.misc.VFI  -t data\weather.numeric.arff

Voting feature intervals classifier

 outlook :
  sunny
    2.0    3.0
  overcast
    4.0    0.0
  rainy
    3.0    2.0

 temperature :
  -Infinity
    0.5    0.0
  64.0
    0.5    0.5
  65.0
    7.5    3.5
  83.0
    0.5    0.5
  85.0
    0.0    0.5
  Infinity


 humidity :
  -Infinity
    0.5    0.0
  65.0
    1.5    0.5
  70.0
    6.0    4.0
  95.0
    0.5    0.5
  96.0
    0.5    0.0
  Infinity


 windy :
  TRUE
    3.0    3.0
  FALSE
    6.0    2.0


Time taken to build model: 0 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          12               85.7143 %
Incorrectly Classified Instances         2               14.2857 %
Kappa statistic                          0.7143
Mean absolute error                      0.3354
Root mean squared error                  0.3996
Relative absolute error                 72.2387 %
Root relative squared error             83.3373 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 0 5 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           7               50      %
Incorrectly Classified Instances         7               50      %
Kappa statistic                         -0.0426
Mean absolute error                      0.4725
Root mean squared error                  0.5624
Relative absolute error                 99.2318 %
Root relative squared error            113.9897 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 5 4 | a = yes
 3 2 | b = no


The 14 instances of the weather.numeric.arff dataset below use 2 nominal and 2 numeric attributes to predict a nominal class attribute (play).

outlook temperature humidity windy play
sunny 85 85 FALSE no
sunny 80 90 TRUE no
rainy 65 70 TRUE no
sunny 72 95 FALSE no
rainy 71 91 TRUE no
overcast 83 86 FALSE yes
rainy 70 96 FALSE yes
rainy 68 80 FALSE yes
overcast 64 65 TRUE yes
sunny 69 70 FALSE yes
rainy 75 80 FALSE yes
sunny 75 70 TRUE yes
overcast 72 90 TRUE yes
overcast 81 75 FALSE yes


windows mklink vs unix link

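In a file system we often need different paths that point to the same file-system object (a file or a directory).
Windows has long offered shortcut files (.lnk), which Windows Explorer and some applications use to reach an object through another path.
Since Windows Vista, Windows has followed Unix in offering symbolic links, and the mklink command creates the following four kinds of links at the file-system level:
   SYMLINK, SYMLINKD, JUNCTION, HardLink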

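0. Shortcut: a special shortcut file (.lnk) that only applications with explicit support can follow; resolved by the client application
   DOS DIR shows the .lnk extension
   [Note] When this .lnk is shared with other machines over the network (Network Neighborhood), they may not be able to use it
   [Note] del .lnk deletes the shortcut; the linked object remains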

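1. File symbolic link: requires administrator rights under the default security policy; the link may cross partitions; resolved by the client's file system
   Windows command: mklink   file_soft_link   file
   Unix command:    ln  -s  file   file_soft_link   (target first, the reverse of mklink)
   DOS DIR shows <SYMLINK>
   [Note] When file_soft_link is shared with other machines over the network, they may not be able to use it
   [Note] del file_soft_link deletes the symbolic link; the linked file remains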

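2. Directory symbolic link: requires administrator rights under the default security policy; the link may cross partitions; resolved by the client's file system
   Windows command: mklink  /d  dir_soft_link   dir
   Unix command:    ln  -s  dir   dir_soft_link   (target first, the reverse of mklink)
   DOS DIR shows <SYMLINKD>
   [Note] When dir_soft_link is shared with other machines over the network, they may not be able to use it
   [Note] rmdir dir_soft_link deletes the symbolic link; the linked directory remains
   [Note] del dir_soft_link asks whether to delete all contents of the directory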

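3. Directory junction: no special rights needed; the target must be on the local machine but may be on any local partition; resolved by the server's file system
   Windows command: mklink  /j  dir_hard_link   dir
   Unix command:    no similar Unix command
   DOS DIR shows <JUNCTION>
   [Note] When dir_hard_link is shared with other machines over the network, they can still use it
   [Note] rmdir dir_hard_link deletes the link; the linked directory remains
   [Note] del dir_hard_link asks whether to delete all contents of the directory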

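4. File hard link: no special rights needed; the link must be on the same partition of the local machine; resolved by the server's file system
   Windows command: mklink  /h  file_hard_link  file
   Unix command:    ln  file  file_hard_link   (or: link  file  file_hard_link; existing file first)
   DOS DIR shows it as an ordinary file, with no marker
   [Note] When file_hard_link is shared with other machines over the network, they can still use it
   [Note] del file_hard_link deletes the link; if the file has no other links left, the file itself is really deleted
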
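Note: under the default security policy, the two symbolic-link commands mklink and mklink /d require administrator rights; run the command prompt as administrator to use them.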
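For comparison, Java's java.nio.file API can create the two basic link types portably: `Files.createLink` makes a hard link and `Files.createSymbolicLink` a symbolic link. The sketch below (`LinkDemo` is a hypothetical name) creates both in a temporary directory, deletes the original name, and reports what each link still sees. Symbolic-link creation may fail on Windows without the privilege mentioned above, which the sketch tolerates; the temporary files are left in place for inspection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: hard and symbolic links via java.nio.file instead of mklink / ln. */
public class LinkDemo {
    /** Returns {text read via the hard link, state of the symbolic link} after deleting the original. */
    static String[] demo() {
        try {
            Path dir = Files.createTempDirectory("links");
            Path file = dir.resolve("file.txt");
            Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));

            // Hard link: a second directory entry for the same file data (same volume only).
            Path hard = Files.createLink(dir.resolve("file_hard_link.txt"), file);

            // Symbolic link: a small object that stores the target's path.
            Path soft = dir.resolve("file_soft_link.txt");
            boolean haveSoft;
            try {
                Files.createSymbolicLink(soft, file);
                haveSoft = true;
            } catch (UnsupportedOperationException | IOException | SecurityException e) {
                haveSoft = false;   // e.g. missing privilege on Windows
            }

            Files.delete(file);     // delete the original name only

            String viaHard = new String(Files.readAllBytes(hard), StandardCharsets.UTF_8);
            // Files.exists follows the link, so a link whose target is gone reports false.
            String viaSoft = !haveSoft ? "not created" : Files.exists(soft) ? "still resolves" : "dangling";
            return new String[] {viaHard, viaSoft};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("read via hard link: " + r[0]);
        System.out.println("symbolic link:      " + r[1]);
    }
}
```

As with a hard link created by mklink /h, the data stays reachable through the hard link after the original name is deleted, while a symbolic link is left dangling.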

weka.classifiers.misc.HyperPipes

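weka.classifiers.misc.HyperPipes is a learner based on per-class attribute ranges.
It records the ranges in which the attributes occur for each class, then predicts the class whose ranges the instance matches in the highest proportion. It provides a basic baseline performance on a dataset for benchmark comparisons.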

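When HyperPipes learns to classify, it builds one hyperpipe per class, recording for each attribute the range of values seen in instances of that class (for nominal attributes, which values occurred, as the true/false lists in the output show).
To classify a new instance, it computes how well the instance fits each class's hyperpipe and predicts the class with the best fit.
The degree of fit (0 to 1) is the proportion (0 to 100%) of the instance's attributes that fall inside the corresponding ranges of the class's hyperpipe.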
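A minimal Java sketch of the hyperpipe idea for weather.numeric follows. The class, field and method names are hypothetical and the four attributes are hard-coded; as an assumption, ties in fit go to the class listed first (yes).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of HyperPipes for weather.numeric, following the description above. */
public class HyperPipesSketch {
    static final class Row {
        final String outlook, windy, play;
        final double temperature, humidity;
        Row(String outlook, double temperature, double humidity, String windy, String play) {
            this.outlook = outlook; this.temperature = temperature;
            this.humidity = humidity; this.windy = windy; this.play = play;
        }
    }

    /** One pipe per class: [min, max] for numeric attributes, the set of seen values for nominal ones. */
    static final class Pipe {
        double tMin = Double.POSITIVE_INFINITY, tMax = Double.NEGATIVE_INFINITY;
        double hMin = Double.POSITIVE_INFINITY, hMax = Double.NEGATIVE_INFINITY;
        final Set<String> outlooks = new HashSet<>();
        final Set<String> windies = new HashSet<>();

        /** Widens the pipe so that it contains r. */
        void add(Row r) {
            tMin = Math.min(tMin, r.temperature); tMax = Math.max(tMax, r.temperature);
            hMin = Math.min(hMin, r.humidity);    hMax = Math.max(hMax, r.humidity);
            outlooks.add(r.outlook);
            windies.add(r.windy);
        }

        /** Fraction (0..1) of r's four attributes that fall inside this pipe. */
        double fit(Row r) {
            int inside = 0;
            if (outlooks.contains(r.outlook)) inside++;
            if (r.temperature >= tMin && r.temperature <= tMax) inside++;
            if (r.humidity >= hMin && r.humidity <= hMax) inside++;
            if (windies.contains(r.windy)) inside++;
            return inside / 4.0;
        }
    }

    static final String[] CLASSES = {"yes", "no"};
    final Pipe[] pipes = {new Pipe(), new Pipe()};

    void buildClassifier(List<Row> rows) {
        for (Row r : rows) pipes[r.play.equals("yes") ? 0 : 1].add(r);
    }

    /** Class with the best-fitting pipe; ties go to the class listed first (an assumption). */
    String classify(Row r) {
        int best = 0;
        for (int c = 1; c < pipes.length; c++) if (pipes[c].fit(r) > pipes[best].fit(r)) best = c;
        return CLASSES[best];
    }

    /** The 14 weather.numeric.arff instances listed in this post. */
    static List<Row> weatherNumeric() {
        return Arrays.asList(
            new Row("sunny", 85, 85, "FALSE", "no"), new Row("sunny", 80, 90, "TRUE", "no"),
            new Row("rainy", 65, 70, "TRUE", "no"), new Row("sunny", 72, 95, "FALSE", "no"),
            new Row("rainy", 71, 91, "TRUE", "no"), new Row("overcast", 83, 86, "FALSE", "yes"),
            new Row("rainy", 70, 96, "FALSE", "yes"), new Row("rainy", 68, 80, "FALSE", "yes"),
            new Row("overcast", 64, 65, "TRUE", "yes"), new Row("sunny", 69, 70, "FALSE", "yes"),
            new Row("rainy", 75, 80, "FALSE", "yes"), new Row("sunny", 75, 70, "TRUE", "yes"),
            new Row("overcast", 72, 90, "TRUE", "yes"), new Row("overcast", 81, 75, "FALSE", "yes"));
    }

    public static void main(String[] args) {
        HyperPipesSketch hp = new HyperPipesSketch();
        List<Row> rows = weatherNumeric();
        hp.buildClassifier(rows);
        int correct = 0;
        for (Row r : rows) if (hp.classify(r).equals(r.play)) correct++;
        System.out.println(correct + "/" + rows.size() + " training instances correct");
    }
}
```

With that tie rule, the four no instances that fit entirely inside both pipes are labeled yes, so the sketch gets 10 of the 14 training instances right, matching the training-data confusion matrix in the HyperPipes output below.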

> java -cp weka.jar;. weka.classifiers.misc.HyperPipes  -t data\weather.numeric.arff

HyperPipes classifier
HyperPipe for class: yes
  temperature: 64.0,83.0,
  humidity: 65.0,96.0,
  outlook: true,true,true,
  windy: true,true,

HyperPipe for class: no
  temperature: 65.0,85.0,
  humidity: 70.0,95.0,
  outlook: true,false,true,
  windy: true,true,


Time taken to build model: 0 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          10               71.4286 %
Incorrectly Classified Instances         4               28.5714 %
Kappa statistic                          0.2432
Mean absolute error                      0.4531
Root mean squared error                  0.4597
Relative absolute error                 97.5824 %
Root relative squared error             95.8699 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 4 1 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         5               35.7143 %
Kappa statistic                          0
Mean absolute error                      0.483
Root mean squared error                  0.4899
Relative absolute error                101.4286 %
Root relative squared error             99.3055 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 5 0 | b = no


The 14 instances of the weather.numeric.arff dataset below contain 9 yes and 5 no.

outlook temperature humidity windy play
sunny 85 85 FALSE no
sunny 80 90 TRUE no
rainy 65 70 TRUE no
sunny 72 95 FALSE no
rainy 71 91 TRUE no
overcast 83 86 FALSE yes
rainy 70 96 FALSE yes
rainy 68 80 FALSE yes
overcast 64 65 TRUE yes
sunny 69 70 FALSE yes
rainy 75 80 FALSE yes
sunny 75 70 TRUE yes
overcast 72 90 TRUE yes
overcast 81 75 FALSE yes


weka.classifiers.rules.OneR

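weka.classifiers.rules.OneR is a one-node decision tree (single-attribute rule) learner.
It predicts by majority vote within each value (nominal) or interval (numeric) of a single attribute, giving a basic baseline performance on a dataset for benchmark comparisons.
Any learner should beat OneR's baseline to be worth using.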

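When OneR learns to classify, it builds a one-node decision tree for every attribute and keeps the one with the lowest error rate.
At prediction time it uses only that tree, predicting the majority class of the instance's value (or interval) of the single chosen attribute.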
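The nominal part of OneR can be sketched in Java as follows. This is a simplified illustration: `OneRSketch` is a hypothetical name, only the nominal attributes outlook and windy are considered, ties within a value go to the class seen first in the data, and the -B bucketing of numeric attributes is not implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of OneR restricted to nominal attributes (numeric -B bucketing not covered). */
public class OneRSketch {
    static final String[] ATTRS = {"outlook", "windy"};
    // outlook, windy and play columns of the 14 weather.numeric.arff rows in this post
    static final String[][] DATA = {
        {"sunny", "FALSE", "no"}, {"sunny", "TRUE", "no"}, {"rainy", "TRUE", "no"},
        {"sunny", "FALSE", "no"}, {"rainy", "TRUE", "no"}, {"overcast", "FALSE", "yes"},
        {"rainy", "FALSE", "yes"}, {"rainy", "FALSE", "yes"}, {"overcast", "TRUE", "yes"},
        {"sunny", "FALSE", "yes"}, {"rainy", "FALSE", "yes"}, {"sunny", "TRUE", "yes"},
        {"overcast", "TRUE", "yes"}, {"overcast", "FALSE", "yes"}};

    /** value -> majority class for attribute a (ties: the class seen first for that value). */
    static Map<String, String> ruleFor(int a) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (String[] r : DATA)
            counts.computeIfAbsent(r[a], k -> new LinkedHashMap<>()).merge(r[2], 1, Integer::sum);
        Map<String, String> rule = new LinkedHashMap<>();
        counts.forEach((value, byClass) -> {
            String best = null;
            for (Map.Entry<String, Integer> e : byClass.entrySet())
                if (best == null || e.getValue() > byClass.get(best)) best = e.getKey();
            rule.put(value, best);
        });
        return rule;
    }

    /** Number of training rows the one-attribute rule classifies correctly. */
    static int correct(int a, Map<String, String> rule) {
        int ok = 0;
        for (String[] r : DATA) if (rule.get(r[a]).equals(r[2])) ok++;
        return ok;
    }

    public static void main(String[] args) {
        int bestA = 0;
        for (int a = 1; a < ATTRS.length; a++)
            if (correct(a, ruleFor(a)) > correct(bestA, ruleFor(bestA))) bestA = a;
        System.out.println(ATTRS[bestA] + ": " + ruleFor(bestA)
                + " (" + correct(bestA, ruleFor(bestA)) + "/" + DATA.length + " instances correct)");
    }
}
```

Outlook's rule (sunny -> no, overcast -> yes, rainy -> yes) gets 10 of the 14 instances right while windy gets 9, consistent with the "(10/14 instances correct)" line in the output below.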

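Options:
-B the minimum bucket size used when splitting a numeric attribute into intervals (buckets), default 6:
   an interval is only formed if its majority class has at least this many instances.
   The lower this limit, the more easily small intervals appear and the more closely the rule bends to fit the training instances.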

> java -cp weka.jar;. weka.classifiers.rules.OneR  -t data\weather.numeric.arff

outlook:
        sunny   -> no
        overcast        -> yes
        rainy   -> yes
(10/14 instances correct)


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          10               71.4286 %
Incorrectly Classified Instances         4               28.5714 %
Kappa statistic                          0.3778
Mean absolute error                      0.2857
Root mean squared error                  0.5345
Relative absolute error                 61.5385 %
Root relative squared error            111.4773 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 2 3 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           6               42.8571 %
Incorrectly Classified Instances         8               57.1429 %
Kappa statistic                         -0.2444
Mean absolute error                      0.5714
Root mean squared error                  0.7559
Relative absolute error                120      %
Root relative squared error            153.2194 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 5 4 | a = yes
 4 1 | b = no

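Among the 4 attributes of the weather.numeric.arff dataset below (sorted by outlook), the one-node decision tree built on outlook has the lowest error rate: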

outlook	temperature	humidity	windy	play
overcast	83	86	FALSE	yes
overcast	64	65	TRUE	yes
overcast	72	90	TRUE	yes
overcast	81	75	FALSE	yes
rainy	65	70	TRUE	no
rainy	71	91	TRUE	no
rainy	70	96	FALSE	yes
rainy	68	80	FALSE	yes
rainy	75	80	FALSE	yes
sunny	85	85	FALSE	no
sunny	80	90	TRUE	no
sunny	72	95	FALSE	no
sunny	69	70	FALSE	yes
sunny	75	70	TRUE	yes

Lowering the minimum bucket size -B from its default of 6 to 1 lets an interval's majority class have as few as one instance, so much smaller intervals can appear:

> java -cp weka.jar;. weka.classifiers.rules.OneR  -t data\weather.numeric.arff -B 1

Options: -B 1

temperature:
        < 64.5  -> yes
        < 66.5  -> no
        < 70.5  -> yes
        < 71.5  -> no
        < 77.5  -> yes
        < 80.5  -> no
        < 84.0  -> yes
        >= 84.0 -> no
(13/14 instances correct)


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances          13               92.8571 %
Incorrectly Classified Instances         1                7.1429 %
Kappa statistic                          0.8372
Mean absolute error                      0.0714
Root mean squared error                  0.2673
Relative absolute error                 15.3846 %
Root relative squared error             55.7386 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 1 4 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           5               35.7143 %
Incorrectly Classified Instances         9               64.2857 %
Kappa statistic                         -0.3404
Mean absolute error                      0.6429
Root mean squared error                  0.8018
Relative absolute error                135      %
Root relative squared error            162.5137 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 4 5 | a = yes
 4 1 | b = no


With -B 1, the one-node decision tree built on temperature has a low error rate on the weather.numeric.arff data below (sorted by temperature),
but only because it bends to fit the data it has already seen; its ability to predict unseen data is weak (only 5/14 correct in cross-validation).

outlook	temperature	humidity	windy	play
overcast	64	65	TRUE	yes
rainy	65	70	TRUE	no
rainy	68	80	FALSE	yes
sunny	69	70	FALSE	yes
rainy	70	96	FALSE	yes
rainy	71	91	TRUE	no
overcast	72	90	TRUE	yes
sunny	72	95	FALSE	no
rainy	75	80	FALSE	yes
sunny	75	70	TRUE	yes
sunny	80	90	TRUE	no
overcast	81	75	FALSE	yes
overcast	83	86	FALSE	yes
sunny	85	85	FALSE	no


weka.classifiers.rules.ZeroR

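weka.classifiers.rules.ZeroR is a background-value (zero-rule) learner: it predicts the majority class (or the mean) and provides the background performance of a dataset for benchmark comparisons.
Any learner should beat ZeroR's performance (the background value) to be worth using.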

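When learning a classification task, ZeroR only records the majority class among the training instances; for regression, it only records the mean of the training targets.
At prediction time it ignores the instance's attributes entirely: every classification is predicted as the recorded majority class, and every regression as the recorded mean.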
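ZeroR is small enough to sketch directly. `ZeroRSketch` and its methods are hypothetical names; ties for the majority go to whichever label reached the top count first.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of ZeroR: majority class for classification, mean for regression. */
public class ZeroRSketch {
    /** Most frequent label (ties: the label that reached the top count first). */
    static String majority(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (String y : labels) {
            int n = counts.merge(y, 1, Integer::sum);
            if (best == null || n > counts.get(best)) best = y;
        }
        return best;
    }

    /** Mean of the numeric targets, used as every regression prediction. */
    static double mean(double[] targets) {
        double sum = 0;
        for (double t : targets) sum += t;
        return sum / targets.length;
    }

    public static void main(String[] args) {
        // play column of the 14 weather.numeric.arff rows: 5 no, 9 yes
        List<String> play = List.of("no", "no", "no", "no", "no",
                "yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes");
        String prediction = majority(play);
        long correct = play.stream().filter(prediction::equals).count();
        System.out.println("ZeroR predicts class value: " + prediction
                + " (" + correct + "/" + play.size() + " correct)");
    }
}
```

On the play column it predicts yes for every instance and is right 9 times out of 14 (64.2857 %), as in the output below.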

> java -cp weka.jar;. weka.classifiers.rules.ZeroR  -t data\weather.numeric.arff

ZeroR predicts class value: yes

Time taken to build model: 0 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         5               35.7143 %
Kappa statistic                          0
Mean absolute error                      0.4643
Root mean squared error                  0.4795
Relative absolute error                100      %
Root relative squared error            100      %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 5 0 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         5               35.7143 %
Kappa statistic                          0
Mean absolute error                      0.4762
Root mean squared error                  0.4934
Relative absolute error                100      %
Root relative squared error            100      %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 5 0 | b = no

The 14 instances of the weather.numeric.arff dataset below contain 9 yes and 5 no.

outlook temperature humidity windy play
sunny 85 85 FALSE no
sunny 80 90 TRUE no
rainy 65 70 TRUE no
sunny 72 95 FALSE no
rainy 71 91 TRUE no
overcast 83 86 FALSE yes
rainy 70 96 FALSE yes
rainy 68 80 FALSE yes
overcast 64 65 TRUE yes
sunny 69 70 FALSE yes
rainy 75 80 FALSE yes
sunny 75 70 TRUE yes
overcast 72 90 TRUE yes
overcast 81 75 FALSE yes


