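weka.classifiers.bayes.NaiveBayes is a simple Bayesian probabilistic learner.
It records the prior probability of each class and the conditional probability of each attribute value given the class.
For a given instance, it multiplies these together to obtain the posterior probability of each class given the attribute values, and predicts the class with the highest probability.
It often performs reasonably well on a dataset, which makes it a useful baseline for benchmark comparisons.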
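When learning, NaiveBayes computes, for each class, the class prior probability and the conditional probability of each attribute value given that class.
For numeric attributes, it assumes by default that the population is normally distributed and computes the mean and standard deviation, from which the conditional probability is estimated.
When predicting, it multiplies the prior and the conditional probabilities for the new instance to obtain the posterior probability of each class, and predicts the class with the highest posterior.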
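The prediction step can be sketched in a few lines of Python. This is only an illustration, not Weka's code: the model values are copied from the printout in this post, the function names are made up for the example, and the numeric attributes use a plain Gaussian density, whereas Weka's estimator may differ in detail.

```python
import math

# Model values copied from the printed NaiveBayes model in this post.
PRIOR = {"yes": 0.63, "no": 0.38}  # printed (rounded) class priors
NOMINAL = {
    "outlook": {"yes": {"sunny": 3.0, "overcast": 5.0, "rainy": 4.0},
                "no": {"sunny": 4.0, "overcast": 1.0, "rainy": 3.0}},
    "windy": {"yes": {"TRUE": 4.0, "FALSE": 7.0},
              "no": {"TRUE": 4.0, "FALSE": 3.0}},
}
NUMERIC = {  # (mean, std. dev.) per class
    "temperature": {"yes": (72.9697, 5.2304), "no": (74.8364, 7.384)},
    "humidity": {"yes": (78.8395, 9.8023), "no": (86.1111, 9.2424)},
}


def gaussian_density(x, mean, std):
    """Plain normal density (an assumption; Weka's estimator may differ)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def posterior(instance):
    """Multiply the prior by each conditional probability, then normalize."""
    scores = {}
    for cls, prior in PRIOR.items():
        score = prior
        for attr, counts in NOMINAL.items():
            table = counts[cls]
            score *= table[instance[attr]] / sum(table.values())
        for attr, params in NUMERIC.items():
            mean, std = params[cls]
            score *= gaussian_density(instance[attr], mean, std)
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


instance = {"outlook": "sunny", "temperature": 85,
            "humidity": 85, "windy": "FALSE"}
post = posterior(instance)
predicted = max(post, key=post.get)  # class with the highest posterior
print(post, predicted)
```

Normalizing at the end means the rounded priors do not need to sum to exactly 1.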
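In the printed model, the weight sum of a numeric attribute for each class equals the number of instances in that class when every instance has weight 1.
The precision of a numeric attribute = the sum of the differences between adjacent distinct values (deltaSum) / the number of those differences (distinct, which is one less than the number of distinct values).
Values are rounded to the nearest multiple of the precision, so values closer together than the precision are treated as the same value when the parameters of the population distribution are estimated.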
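As a check on the description above, the following sketch recomputes the temperature precision, mean and standard deviation from the dataset listed in this post. It is not Weka's implementation: the function names are illustrative, and dividing by the number of gaps and using the population standard deviation are assumptions chosen because they reproduce the printed values.

```python
import math


def precision(values):
    """Sum of gaps between adjacent distinct values / number of gaps."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values")
    delta_sum = sum(b - a for a, b in zip(distinct, distinct[1:]))
    return delta_sum / (len(distinct) - 1)


def round_to_precision(x, prec):
    """Snap a value to the nearest multiple of the precision."""
    return round(x / prec) * prec


def mean_std(values, prec):
    """Mean and population standard deviation of the rounded values."""
    rounded = [round_to_precision(v, prec) for v in values]
    n = len(rounded)
    mean = sum(rounded) / n
    variance = sum((v - mean) ** 2 for v in rounded) / n
    return mean, math.sqrt(variance)


# Temperature column of weather.numeric.arff: 5 "no" rows, then 9 "yes" rows.
temp_no = [85, 80, 65, 72, 71]
temp_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]

prec = precision(temp_no + temp_yes)       # precision uses all instances
mean_yes, std_yes = mean_std(temp_yes, prec)
mean_no, std_no = mean_std(temp_no, prec)
print(round(prec, 4), round(mean_yes, 4), round(std_yes, 4))
print(round(mean_no, 4), round(std_no, 4))
```

With the 12 distinct temperature values spanning 64 to 85, the precision is 21 / 11, which matches the 1.9091 shown in the model.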
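Options:
-K For numeric attributes, use a kernel density estimator instead of a normal distribution to estimate the conditional probability
-D Convert numeric attributes to nominal intervals using supervised discretization
-O Display the model in the old format, which suits datasets with many classes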
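To give a feel for what the -K option changes, here is a generic Gaussian kernel density estimate in Python. It is a sketch of the general technique only, not Weka's KernelEstimator: the function name and the bandwidth value of 3.0 are hypothetical, and Weka chooses its own kernel width.

```python
import math


def kernel_density(x, samples, bandwidth):
    """Average of Gaussian kernels centred on each training value."""
    norm = bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return total / (len(samples) * norm)


# Temperature values of the "yes" instances from the dataset in this post.
temp_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]
bandwidth = 3.0  # hypothetical value, chosen only for illustration

density_at_75 = kernel_density(75, temp_yes, bandwidth)
density_at_100 = kernel_density(100, temp_yes, bandwidth)

# Riemann sum over 30.0 .. 119.9 as a sanity check that the density
# integrates to roughly 1.
area = sum(kernel_density(t * 0.1, temp_yes, bandwidth) * 0.1
           for t in range(300, 1200))
print(density_at_75, density_at_100, area)
```

Unlike a single normal curve, this estimate follows the shape of the training values, which can help when a numeric attribute is clearly not normally distributed.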
> java -cp weka.jar;. weka.classifiers.bayes.NaiveBayes -t data\weather.numeric.arff
Naive Bayes Classifier

                 Class
Attribute          yes       no
                (0.63)   (0.38)
===============================
outlook
  sunny            3.0      4.0
  overcast         5.0      1.0
  rainy            4.0      3.0
  [total]         12.0      8.0

temperature
  mean         72.9697  74.8364
  std. dev.     5.2304    7.384
  weight sum         9        5
  precision     1.9091   1.9091

humidity
  mean         78.8395  86.1111
  std. dev.     9.8023   9.2424
  weight sum         9        5
  precision     3.4444   3.4444

windy
  TRUE             4.0      4.0
  FALSE            7.0      3.0
  [total]         11.0      7.0

Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0.01 seconds

=== Error on training data ===

Correctly Classified Instances          13       92.8571 %
Incorrectly Classified Instances         1        7.1429 %
Kappa statistic                          0.8372
Mean absolute error                      0.2798
Root mean squared error                  0.3315
Relative absolute error                 60.2576 %
Root relative squared error             69.1352 %
Total Number of Instances               14

=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 1 4 | b = no

=== Stratified cross-validation ===

Correctly Classified Instances           9       64.2857 %
Incorrectly Classified Instances         5       35.7143 %
Kappa statistic                          0.1026
Mean absolute error                      0.4649
Root mean squared error                  0.543
Relative absolute error                 97.6254 %
Root relative squared error            110.051 %
Total Number of Instances               14

=== Confusion Matrix ===

 a b   <-- classified as
 8 1 | a = yes
 4 1 | b = no

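The accuracy and Kappa statistic in the two evaluations can be recomputed from their confusion matrices. The sketch below uses the standard Cohen's kappa formula; the function name is illustrative and the code is not taken from Weka.

```python
def accuracy_and_kappa(matrix):
    """Accuracy and Cohen's kappa; rows are actual, columns predicted."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


train_matrix = [[9, 0], [1, 4]]  # error on training data
cv_matrix = [[8, 1], [4, 1]]     # stratified cross-validation

train_acc, train_kappa = accuracy_and_kappa(train_matrix)
cv_acc, cv_kappa = accuracy_and_kappa(cv_matrix)
print(round(100 * train_acc, 4), round(train_kappa, 4))
print(round(100 * cv_acc, 4), round(cv_kappa, 4))
```

The gap between 92.8571 % on the training data and 64.2857 % under cross-validation shows how optimistic the training-data figure is on a dataset this small.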
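The weather.numeric.arff dataset shown below has 14 instances. Two nominal attributes (outlook, windy) and two numeric attributes (temperature, humidity) are used to predict the nominal class attribute (play).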
outlook | temperature | humidity | windy | play
sunny | 85 | 85 | FALSE | no
sunny | 80 | 90 | TRUE | no
rainy | 65 | 70 | TRUE | no
sunny | 72 | 95 | FALSE | no
rainy | 71 | 91 | TRUE | no
overcast | 83 | 86 | FALSE | yes
rainy | 70 | 96 | FALSE | yes
rainy | 68 | 80 | FALSE | yes
overcast | 64 | 65 | TRUE | yes
sunny | 69 | 70 | FALSE | yes
rainy | 75 | 80 | FALSE | yes
sunny | 75 | 70 | TRUE | yes
overcast | 72 | 90 | TRUE | yes
overcast | 81 | 75 | FALSE | yes
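The nominal counts and class priors in the printed model can be reproduced from the table above if every count starts at 1 (add-one smoothing). That starting value is an assumption inferred from the printout rather than from Weka's source; the helper name is illustrative.

```python
from collections import Counter

# (outlook, temperature, humidity, windy, play) rows from the table above.
ROWS = [
    ("sunny", 85, 85, "FALSE", "no"), ("sunny", 80, 90, "TRUE", "no"),
    ("rainy", 65, 70, "TRUE", "no"), ("sunny", 72, 95, "FALSE", "no"),
    ("rainy", 71, 91, "TRUE", "no"), ("overcast", 83, 86, "FALSE", "yes"),
    ("rainy", 70, 96, "FALSE", "yes"), ("rainy", 68, 80, "FALSE", "yes"),
    ("overcast", 64, 65, "TRUE", "yes"), ("sunny", 69, 70, "FALSE", "yes"),
    ("rainy", 75, 80, "FALSE", "yes"), ("sunny", 75, 70, "TRUE", "yes"),
    ("overcast", 72, 90, "TRUE", "yes"), ("overcast", 81, 75, "FALSE", "yes"),
]
CLASSES = ("yes", "no")


def smoothed_counts(rows, column, values):
    """Per-class counts of a nominal column, each starting at 1."""
    counts = {}
    for cls in CLASSES:
        seen = Counter(r[column] for r in rows if r[4] == cls)
        counts[cls] = {v: seen[v] + 1.0 for v in values}
    return counts


outlook = smoothed_counts(ROWS, 0, ["sunny", "overcast", "rainy"])
windy = smoothed_counts(ROWS, 3, ["TRUE", "FALSE"])

# Class priors with the same add-one start: (count + 1) / (n + classes).
class_counts = Counter(r[4] for r in ROWS)
priors = {c: (class_counts[c] + 1) / (len(ROWS) + len(CLASSES))
          for c in CLASSES}
print(outlook, windy, priors)
```

The priors come out as 0.625 and 0.375, which the model prints as (0.63) and (0.38). The add-one start also keeps overcast from getting a zero probability under class no, even though no such instance exists.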
References:
1. weka.classifiers.bayes.NaiveBayes
2. weka.estimators.NormalEstimator
3. weka.estimators.KernelEstimator
4. weka.filters.supervised.attribute.Discretize