Thursday, October 29, 2009

Simple notes on libsvm Java

libsvm Java usage examples
==========================

1. libsvm download:
  http://www.csie.ntu.edu.tw/~cjlin/libsvm+zip

2. Sample data download:
  http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets

3. Compiling the programs:
C:\libsvm\libsvm-2.89\java>javac -cp libsvm.jar *.java
This produces the scaler    svm_scale.class
              the trainer   svm_train.class
              the predictor svm_predict.class

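4. Data uses the sparse format, one instance per line: the target value (the value to predict) comes first, followed by every non-zero feature as an index:value pair.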
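
To make the sparse format concrete, here is a minimal Java sketch that parses one such line. The class and method names (SparseLineDemo, Instance, parse) are made up for this note and are not part of libsvm, and the sample line is invented rather than copied from a1a.txt.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SparseLineDemo {
    /** One parsed instance: the target value plus its non-zero features. */
    static final class Instance {
        final double target;
        final Map<Integer, Double> features = new LinkedHashMap<>();

        Instance(double target) {
            this.target = target;
        }
    }

    /** Parses a line of the form "target index:value index:value ...". */
    static Instance parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        Instance inst = new Instance(Double.parseDouble(tokens[0]));
        for (int i = 1; i < tokens.length; i++) {
            // Each remaining token is one non-zero feature.
            String[] pair = tokens[i].split(":", 2);
            inst.features.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
        }
        return inst;
    }

    public static void main(String[] args) {
        // Invented sample line, for illustration only.
        Instance inst = parse("-1 3:1 11:1 14:0.5");
        System.out.println("target=" + inst.target + " features=" + inst.features);
    }
}
```

Any feature index that does not appear on the line is treated as zero, which is why the files stay small when most values are zero.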

5. Training and testing examples
  Training set used below: a1a.txt
  Test set used below:     a1a_t.txt

A. Without scaling:
A1. Run the trainer
    Input  training data: a1a.txt
    Output learned model: a1a_model.txt
C:\libsvm\libsvm-2.89\java>java -cp libsvm.jar svm_train a1a.txt a1a_model.txt
*
optimization finished, #iter = 495
nu = 0.46026768501985826
obj = -673.0313934890871, rho = -0.6285688589260043
nSV = 754, nBSV = 722
Total nSV = 754

A2. Run the predictor
    Input  test data:     a1a_t.txt
    Input  learned model: a1a_model.txt
    Output predictions:   a1a_predict.txt
C:\libsvm\libsvm-2.89\java>java -cp libsvm.jar svm_predict a1a_t.txt a1a_model.txt a1a_predict.txt
Accuracy = 83.58638066933712% (25875/30956) (classification)
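
The accuracy line is the number of correctly predicted test instances divided by the total, as a percentage (25875 of 30956 here). A small sketch of that calculation follows; AccuracyDemo and accuracyPercent are names invented for this note, and the arrays are toy data, not the a1a files.

```java
public class AccuracyDemo {
    /** Percentage of positions where the predicted label equals the actual label. */
    static double accuracyPercent(double[] actual, double[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("label arrays differ in length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return 100.0 * correct / actual.length;
    }

    public static void main(String[] args) {
        // Toy labels: three of the four predictions match.
        double[] actual = {1, -1, -1, 1};
        double[] predicted = {1, -1, 1, 1};
        System.out.println("Accuracy = " + accuracyPercent(actual, predicted) + "%");
    }
}
```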
--
B. With scaling:
B1. Run the scaler twice
     Input  training data:      a1a.txt
     Output scaling parameters: a1a_param.txt
     Output scaled training data: a1a_scale.txt
C:\libsvm\libsvm-2.89\java>java -cp libsvm.jar svm_scale -s a1a_param.txt a1a.txt > a1a_scale.txt
Warning: original #nonzeros 22249
         new      #nonzeros 181365
Use -l 0 if many original feature values are zeros

     Input  test data:          a1a_t.txt
     Input  scaling parameters: a1a_param.txt
     Output scaled test data:   a1a_t_scale.txt
C:\libsvm\libsvm-2.89\java>java -cp libsvm.jar svm_scale -r a1a_param.txt a1a_t.txt > a1a_t_scale.txt
Warning: original #nonzeros 429343
         new      #nonzeros 3807588
Use -l 0 if many original feature values are zeros
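
The warnings above say that the count of non-zero values grew by roughly eight to nine times. The reason is the linear mapping onto [lower, upper]: with the default lower limit of -1, a feature whose training minimum is 0 has its zeros mapped to -1, so they must now be written out explicitly. Below is a minimal sketch of that mapping, under the assumption that each feature is scaled independently from its training-set minimum and maximum. The names ScaleDemo and scale are invented here; this is not the svm_scale source.

```java
public class ScaleDemo {
    /** Maps value linearly from [min, max] onto [lower, upper]. */
    static double scale(double value, double min, double max, double lower, double upper) {
        if (max == min) {
            // Sketch choice: a constant feature carries no information, so emit 0.
            return 0.0;
        }
        return lower + (upper - lower) * (value - min) / (max - min);
    }

    public static void main(String[] args) {
        // A feature whose training-set range is [0, 1].
        // Default limits -1..+1: a zero becomes -1 and is no longer sparse.
        System.out.println(scale(0, 0, 1, -1, 1));
        // With "-l 0": a zero stays zero, so sparsity is preserved.
        System.out.println(scale(0, 0, 1, 0, 1));
    }
}
```

The min and max come from the training set only; that is why the first call saves them with -s and the second call restores them with -r for the test set.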

B2. Run the trainer
    Input  training data: a1a_scale.txt
    Output learned model: a1a_model_scale.txt
C:\libsvm\libsvm-2.89\java>java -cp libsvm.jar svm_train a1a_scale.txt a1a_model_scale.txt
*
optimization finished, #iter = 682
nu = 0.4077289259698594
obj = -593.6459193183854, rho = -0.48104500731367267
nSV = 694, nBSV = 622
Total nSV = 694

B3. Run the predictor
    Input  test data:     a1a_t_scale.txt
    Input  learned model: a1a_model_scale.txt
    Output predictions:   a1a_predict_scale.txt
C:\libsvm\libsvm-2.89\java>java -cp libsvm.jar svm_predict a1a_t_scale.txt a1a_model_scale.txt a1a_predict_scale.txt
Accuracy = 84.05478744023776% (26020/30956) (classification)

--
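-- Compared with A, B additionally scales the range of the data, which raises accuracy slightly (83.59% to 84.05%).
-- Also, the trainer calls in A1 and B2 use the default C-SVC learner with default parameters;
-- a proper grid search over those parameters is likely to raise accuracy further.
-- With the default RBF kernel, C-SVC has two parameters to tune, -g gamma and -c cost; see guide.pdf on the libsvm home page:
--  http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf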
--
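
As a rough sketch of the grid search mentioned in the notes above, the following Java program only prints one svm_train command per (cost, gamma) pair, using the -c, -g and -v options listed in the trainer usage. The exponent ranges follow the exponentially growing sequences suggested in guide.pdf (C from 2^-5 to 2^15, gamma from 2^-15 to 2^3); the class name GridCommands, the step of 2 in the exponent, and the choice of 5 folds are assumptions made for this note.

```java
import java.util.ArrayList;
import java.util.List;

public class GridCommands {
    /** Builds one cross-validation command per (cost, gamma) grid point. */
    static List<String> commands(String trainFile, int folds) {
        List<String> out = new ArrayList<>();
        for (int logC = -5; logC <= 15; logC += 2) {
            for (int logG = -15; logG <= 3; logG += 2) {
                double cost = Math.pow(2, logC);
                double gamma = Math.pow(2, logG);
                out.add("java -cp libsvm.jar svm_train -v " + folds
                        + " -c " + cost + " -g " + gamma + " " + trainFile);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the commands; run them and keep the pair with the best
        // cross-validation accuracy, then retrain without -v using that pair.
        for (String cmd : commands("a1a_scale.txt", 5)) {
            System.out.println(cmd);
        }
    }
}
```

With -v the trainer reports a cross-validation accuracy instead of writing a model file, so the final model still has to be trained in a separate run with the chosen -c and -g.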
1. Scaler options:
C:\libsvm\libsvm-2.89\java>java -cp libsvm.jar svm_scale
Usage: svm-scale [options] data_filename
options:
-l lower : x scaling lower limit (default -1)
-u upper : x scaling upper limit (default +1)
-y y_lower y_upper : y scaling limits (default: no y scaling)
-s save_filename : save scaling parameters to save_filename
-r restore_filename : restore scaling parameters from restore_filename

2. Trainer options:
C:\libsvm\libsvm-2.89\java>java -cp libsvm.jar svm_train
Usage: svm_train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
        0 -- C-SVC
        1 -- nu-SVC
        2 -- one-class SVM
        3 -- epsilon-SVR
        4 -- nu-SVR
-t kernel_type : set type of kernel function (default 2)
        0 -- linear: u'*v
        1 -- polynomial: (gamma*u'*v + coef0)^degree
        2 -- radial basis function: exp(-gamma*|u-v|^2)
        3 -- sigmoid: tanh(gamma*u'*v + coef0)
        4 -- precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/k)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n : n-fold cross validation mode
-q : quiet mode (no outputs)
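
The kernel formulas in the -t list above can be written out directly. This is a plain-array sketch for reading the formulas, not the libsvm implementation; KernelDemo and its method names are invented for this note, and dense arrays are used instead of the sparse format for brevity.

```java
public class KernelDemo {
    /** Inner product u'*v. */
    static double dot(double[] u, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            sum += u[i] * v[i];
        }
        return sum;
    }

    /** -t 0: u'*v */
    static double linear(double[] u, double[] v) {
        return dot(u, v);
    }

    /** -t 1: (gamma*u'*v + coef0)^degree */
    static double polynomial(double[] u, double[] v, double gamma, double coef0, int degree) {
        return Math.pow(gamma * dot(u, v) + coef0, degree);
    }

    /** -t 2: exp(-gamma*|u-v|^2) */
    static double rbf(double[] u, double[] v, double gamma) {
        double squaredDistance = 0.0;
        for (int i = 0; i < u.length; i++) {
            double d = u[i] - v[i];
            squaredDistance += d * d;
        }
        return Math.exp(-gamma * squaredDistance);
    }

    /** -t 3: tanh(gamma*u'*v + coef0) */
    static double sigmoid(double[] u, double[] v, double gamma, double coef0) {
        return Math.tanh(gamma * dot(u, v) + coef0);
    }

    public static void main(String[] args) {
        double[] u = {1, 2};
        double[] v = {3, 4};
        double gamma = 1.0 / u.length; // mirrors the "default 1/k" with k = 2 features
        System.out.println("linear     = " + linear(u, v));
        System.out.println("polynomial = " + polynomial(u, v, gamma, 0, 3));
        System.out.println("rbf        = " + rbf(u, v, gamma));
        System.out.println("sigmoid    = " + sigmoid(u, v, gamma, 0));
    }
}
```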

3. Predictor options:
C:\libsvm\libsvm-2.89\java>java -cp libsvm.jar svm_predict
usage: svm_predict [options] test_file model_file output_file
options:
-b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); one-class SVM not supported yet