Tuesday, October 13, 2015

weka.classifiers.bayes.NaiveBayes

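weka.classifiers.bayes.NaiveBayes is a simple Bayesian probabilistic learner.
It records the prior probability of each class and the conditional probability of each attribute value given the class,
then, for each instance, multiplies these together to obtain the posterior probability of each class given the instance's attribute values, and predicts the class with the highest probability.
It performs reasonably well on many data sets, which makes it a useful baseline for benchmark comparisons.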

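During training, NaiveBayes computes, for each class, its prior probability and the conditional probability of each attribute value given that class.
For a numeric attribute, it assumes by default that the population is normally distributed and estimates the mean and standard deviation, which are then used to estimate the conditional probability.
During prediction, it multiplies the class prior by these conditional probabilities for the new instance to obtain the posterior probability of each class given the attribute values, and predicts the class with the highest probability.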
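The multiplication step can be sketched in plain Java without the Weka library. This is a minimal illustration, not Weka's implementation: the class name PosteriorSketch and its methods are hypothetical, and the plain normal density is assumed as the conditional probability of a numeric value (Weka's NormalEstimator may differ in detail). The numbers in main are copied from the model printed later in this post.

```java
class PosteriorSketch {
    // Normal density at x; assumed here as the conditional probability of a numeric value.
    static double gaussian(double x, double mean, double stdDev) {
        double z = (x - mean) / stdDev;
        return Math.exp(-0.5 * z * z) / (stdDev * Math.sqrt(2.0 * Math.PI));
    }

    // posterior[c] is proportional to priors[c] times the product of conditionals[c][*].
    static double[] posterior(double[] priors, double[][] conditionals) {
        double[] scores = new double[priors.length];
        double total = 0.0;
        for (int c = 0; c < priors.length; c++) {
            scores[c] = priors[c];
            for (double p : conditionals[c]) {
                scores[c] *= p;
            }
            total += scores[c];
        }
        for (int c = 0; c < scores.length; c++) {
            scores[c] /= total; // normalize so the posteriors sum to 1
        }
        return scores;
    }

    public static void main(String[] args) {
        // Instance: outlook=sunny, temperature=85, humidity=85, windy=FALSE.
        double[] priors = {0.63, 0.38}; // yes, no (rounded values as printed)
        double[][] conditionals = {
            // yes: sunny 3/12, temperature, humidity, windy=FALSE 7/11
            {3.0 / 12.0, gaussian(85, 72.9697, 5.2304), gaussian(85, 78.8395, 9.8023), 7.0 / 11.0},
            // no: sunny 4/8, temperature, humidity, windy=FALSE 3/7
            {4.0 / 8.0, gaussian(85, 74.8364, 7.384), gaussian(85, 86.1111, 9.2424), 3.0 / 7.0}
        };
        double[] post = posterior(priors, conditionals);
        System.out.printf("P(yes)=%.3f P(no)=%.3f%n", post[0], post[1]);
    }
}
```

For this instance the class no gets the larger posterior, which agrees with its label in the data set.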

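In the printed model, the weight sum of a numeric attribute under each class equals the number of instances in that class when every instance has weight 1.
The precision of an attribute = the sum of the differences between adjacent sorted values (deltaSum) / the number of nonzero differences (distinct), that is, the number of distinct values minus one.
Values are rounded to the nearest multiple of the precision before the distribution parameters are estimated, so values that differ by less than the precision may be treated as the same value.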
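The precision calculation can be checked against the printed model with a short plain-Java sketch. The class PrecisionSketch and its methods are hypothetical names, not Weka code. The divisor that reproduces the printed value is the number of nonzero gaps between adjacent sorted values, and the 0.01 fallback for an attribute with a single distinct value is an assumption.

```java
import java.util.Arrays;

class PrecisionSketch {
    // deltaSum / distinct, where distinct counts the nonzero gaps between sorted values.
    static double precision(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double deltaSum = 0.0;
        int distinct = 0;
        for (int i = 1; i < sorted.length; i++) {
            double delta = sorted[i] - sorted[i - 1];
            if (delta > 0) {
                deltaSum += delta;
                distinct++;
            }
        }
        return distinct > 0 ? deltaSum / distinct : 0.01; // fallback value is an assumption
    }

    // Round a value to the nearest multiple of the precision.
    static double roundTo(double x, double precision) {
        return Math.rint(x / precision) * precision;
    }

    // Mean of the values after rounding each one to the precision grid.
    static double roundedMean(double[] values, double precision) {
        double sum = 0.0;
        for (double v : values) {
            sum += roundTo(v, precision);
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Temperature column of weather.numeric.arff as listed in this post.
        double[] temperature = {85, 80, 65, 72, 71, 83, 70, 68, 64, 69, 75, 75, 72, 81};
        // Temperatures of the nine instances with play=yes.
        double[] temperatureYes = {83, 70, 68, 64, 69, 75, 75, 72, 81};
        double p = precision(temperature);
        System.out.printf("precision = %.4f%n", p);
        System.out.printf("mean (yes) = %.4f%n", roundedMean(temperatureYes, p));
    }
}
```

With the temperature column from this post, the sketch gives 21 / 11 = 1.9091 for the precision and 72.9697 for the mean of the yes class, the same values that appear in the model output.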

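Options:
-K Use a kernel density estimator instead of a normal distribution to estimate conditional probabilities for numeric attributes
-D Use supervised discretization to convert numeric attributes into nominal attributes with multiple intervals
-O Display the model in the old format, which suits data sets with many classes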

> java -cp weka.jar;. weka.classifiers.bayes.NaiveBayes  -t data\weather.numeric.arff

Naive Bayes Classifier

                 Class
Attribute          yes      no
                (0.63)  (0.38)
===============================
outlook
  sunny             3.0     4.0
  overcast          5.0     1.0
  rainy             4.0     3.0
  [total]          12.0     8.0

temperature
  mean          72.9697 74.8364
  std. dev.      5.2304   7.384
  weight sum          9       5
  precision      1.9091  1.9091

humidity
  mean          78.8395 86.1111
  std. dev.      9.8023  9.2424
  weight sum          9       5
  precision      3.4444  3.4444

windy
  TRUE              4.0     4.0
  FALSE             7.0     3.0
  [total]          11.0     7.0


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0.01 seconds

=== Error on training data ===

Correctly Classified Instances          13               92.8571 %
Incorrectly Classified Instances         1                7.1429 %
Kappa statistic                          0.8372
Mean absolute error                      0.2798
Root mean squared error                  0.3315
Relative absolute error                 60.2576 %
Root relative squared error             69.1352 %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 1 4 | b = no



=== Stratified cross-validation ===

Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         5               35.7143 %
Kappa statistic                          0.1026
Mean absolute error                      0.4649
Root mean squared error                  0.543
Relative absolute error                 97.6254 %
Root relative squared error            110.051  %
Total Number of Instances               14


=== Confusion Matrix ===

 a b   <-- classified as
 8 1 | a = yes
 4 1 | b = no


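The weather.numeric.arff data set below contains 14 instances and uses 2 nominal attributes and 2 numeric attributes to predict the nominal class attribute play.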
outlook temperature humidity windy play
sunny 85 85 FALSE no
sunny 80 90 TRUE no
rainy 65 70 TRUE no
sunny 72 95 FALSE no
rainy 71 91 TRUE no
overcast 83 86 FALSE yes
rainy 70 96 FALSE yes
rainy 68 80 FALSE yes
overcast 64 65 TRUE yes
sunny 69 70 FALSE yes
rainy 75 80 FALSE yes
sunny 75 70 TRUE yes
overcast 72 90 TRUE yes
overcast 81 75 FALSE yes
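The nominal counts and class priors in the printed model are each one higher than the raw counts in this table, which is consistent with add-one (Laplace) smoothing. The plain-Java sketch below reproduces them under that assumption; the class LaplaceCountsSketch and its methods are hypothetical names, not part of Weka.

```java
class LaplaceCountsSketch {
    // Columns: outlook, windy, play (the nominal attributes of the 14 instances above).
    static final String[][] DATA = {
        {"sunny", "FALSE", "no"}, {"sunny", "TRUE", "no"}, {"rainy", "TRUE", "no"},
        {"sunny", "FALSE", "no"}, {"rainy", "TRUE", "no"}, {"overcast", "FALSE", "yes"},
        {"rainy", "FALSE", "yes"}, {"rainy", "FALSE", "yes"}, {"overcast", "TRUE", "yes"},
        {"sunny", "FALSE", "yes"}, {"rainy", "FALSE", "yes"}, {"sunny", "TRUE", "yes"},
        {"overcast", "TRUE", "yes"}, {"overcast", "FALSE", "yes"}
    };
    static final int CLASS_INDEX = 2;
    static final int NUM_CLASSES = 2;

    // Smoothed count of (attribute value, class): starts at 1 instead of 0 (assumed add-one smoothing).
    static double count(int attribute, String value, String cls) {
        double n = 1.0;
        for (String[] row : DATA) {
            if (row[attribute].equals(value) && row[CLASS_INDEX].equals(cls)) {
                n += 1.0;
            }
        }
        return n;
    }

    // Smoothed class prior: (class count + 1) / (instances + number of classes).
    static double prior(String cls) {
        double n = 1.0;
        for (String[] row : DATA) {
            if (row[CLASS_INDEX].equals(cls)) {
                n += 1.0;
            }
        }
        return n / (DATA.length + NUM_CLASSES);
    }

    public static void main(String[] args) {
        for (String value : new String[] {"sunny", "overcast", "rainy"}) {
            System.out.printf("%-10s %4.1f %4.1f%n", value, count(0, value, "yes"), count(0, value, "no"));
        }
        System.out.printf("prior yes=%.3f no=%.3f%n", prior("yes"), prior("no"));
    }
}
```

The priors come out as 10/16 = 0.625 and 6/16 = 0.375, which the model prints rounded as (0.63) and (0.38).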
References:
1. weka.classifiers.bayes.NaiveBayes code | doc
2. weka.estimators.NormalEstimator code | doc
3. weka.estimators.KernelEstimator code | doc
4. weka.filters.supervised.attribute.Discretize code | doc
