Two Weka Command Line Examples of Using Models in Training and Testing:

(1) train and save a OneR model
    load and test a OneR model
    both using the weather.nominal.arff dataset

(2) train and save a FilteredClassifier (StringToWordVector + J48) model
    load and test a FilteredClassifier (StringToWordVector + J48) model
    using the crude_oil_train.arff dataset for training
    and the crude_oil_test.arff dataset for testing

#-------------------------------
#ask for the classifier's options
>java -cp weka.jar weka.classifiers.rules.OneR -h -info

Help requested.

General options:

-h or -help
        Output help information.
-synopsis or -info
        Output synopsis for classifier (use in conjunction with -h)
-t <name of training file>
        Sets training file.
-T <name of test file>
        Sets test file. If missing, a cross-validation will be performed
        on the training data.
-c <class index>
        Sets index of class attribute (default: last).
-x <number of folds>
        Sets number of folds for cross-validation (default: 10).
-no-cv
        Do not perform any cross validation.
-split-percentage <percentage>
        Sets the percentage for the train/test set split, e.g., 66.
-preserve-order
        Preserves the order in the percentage split.
-s <random number seed>
        Sets random number seed for cross-validation or percentage split
        (default: 1).
-m <name of file with cost matrix>
        Sets file with cost matrix.
-l <name of input file>
        Sets model input file. In case the filename ends with '.xml',
        a PMML file is loaded or, if that fails, options are loaded
        from the XML file.
-d <name of output file>
        Sets model output file. In case the filename ends with '.xml',
        only the options are saved to the XML file, not the model.
-v
        Outputs no statistics for training data.
-o
        Outputs statistics only, not the classifier.
-i
        Outputs detailed information-retrieval statistics for each class.
-k
        Outputs information-theoretic statistics.
-p <attribute range>
        Only outputs predictions for test instances (or the train
        instances if no test instances provided and -no-cv is used),
        along with attributes (0 for none).
-distribution
        Outputs the distribution instead of only the prediction
        in conjunction with the '-p' option (only nominal classes).
-r
        Only outputs cumulative margin distribution.
-z <class name>
        Only outputs the source representation of the classifier,
        giving it the supplied name.
-xml filename | xml-string
        Retrieves the options from the XML-data instead of the command line.
-threshold-file <file>
        The file to save the threshold data to.
        The format is determined by the extensions, e.g., '.arff' for ARFF
        format or '.csv' for CSV.
-threshold-label <label>
        The class label to determine the threshold data for
        (default is the first label)

Options specific to weka.classifiers.rules.OneR:

-B <minimum bucket size>
        The minimum number of objects in a bucket (default: 6).

Synopsis for weka.classifiers.rules.OneR:   # synopsis is shown with -info option

Class for building and using a 1R classifier; in other words, uses the
minimum-error attribute for prediction, discretizing numeric attributes.
For more information, see:

R.C. Holte (1993). Very simple classification rules perform well on most
commonly used datasets. Machine Learning. 11:63-91.

#---------------------------------------------------------
# Example (1): use OneR to train and test on weather.nominal.arff

#train classifier by train_data and output model without evaluation
>java -cp weka.jar weka.classifiers.rules.OneR \
> -t data/weather.nominal.arff -no-cv -v -d model.dat

outlook:
        sunny    -> no
        overcast -> yes
        rainy    -> yes
(10/14 instances correct)

=== Error on training data ===   # this report not shown with -v option

Correctly Classified Instances          10               71.4286 %
.....

=== Stratified cross-validation ===   # this report not shown with -no-cv option

Correctly Classified Instances           6               42.8571 %
.....
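
# How the OneR rule above comes about (a sketch, not Weka's code)

The "(10/14 instances correct)" line can be reproduced by hand. The following is a minimal Python sketch of the 1R idea for nominal attributes only: for each attribute, map every value to its majority class, count the training hits, and keep the attribute with the most hits. It is independent of Weka; the function names `one_rule` and `best_rule` are made up for illustration, and breaking ties in favour of the first attribute is an assumption that happens to agree with the output above (outlook and humidity both reach 10/14).

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# Rows of data/weather.nominal.arff; the last field is the class (play).
ROWS = [line.split(",") for line in """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()]


def one_rule(rows, attr_index):
    """Map each value of one attribute to its majority class.

    Returns the rule and the number of training rows it classifies correctly.
    """
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[attr_index]][row[-1]] += 1
    rule = {value: counts.most_common(1)[0][0]
            for value, counts in by_value.items()}
    correct = sum(counts[rule[value]] for value, counts in by_value.items())
    return rule, correct


def best_rule(rows):
    """Pick the attribute whose rule has the most hits (first one wins ties)."""
    best = None
    for index, name in enumerate(ATTRIBUTES):
        rule, correct = one_rule(rows, index)
        if best is None or correct > best[2]:
            best = (name, rule, correct)
    return best


name, rule, correct = best_rule(ROWS)
print(name, rule, f"{correct}/{len(ROWS)}",
      f"{100 * correct / len(ROWS):.4f} %")
```

Run as a script, this prints the attribute outlook with the same three value-to-class mappings and 10/14, i.e. 71.4286 %, matching the training-data report.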

#load model and test classifier by test_data
>java -cp weka.jar weka.classifiers.rules.OneR \
> -T data/weather.nominal.arff -l model.dat

outlook:
        sunny    -> no
        overcast -> yes
        rainy    -> yes
(10/14 instances correct)

=== Error on test data ===

Correctly Classified Instances          10               71.4286 %
Incorrectly Classified Instances         4               28.5714 %
Kappa statistic                          0.3778
Mean absolute error                      0.2857
Root mean squared error                  0.5345
Total Number of Instances               14

=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 2 3 | b = no

>java -cp weka.jar weka.classifiers.rules.OneR \
> -T data/weather.nominal.arff -l model.dat -p first-last

=== Predictions on test data ===

 inst#     actual  predicted error prediction (outlook,temperature,humidity,windy)
     1       2:no       2:no       1 (sunny,hot,high,FALSE)
     2       2:no       2:no       1 (sunny,hot,high,TRUE)
     3      1:yes      1:yes       1 (overcast,hot,high,FALSE)
     4      1:yes      1:yes       1 (rainy,mild,high,FALSE)
     5      1:yes      1:yes       1 (rainy,cool,normal,FALSE)
     6       2:no      1:yes   +   1 (rainy,cool,normal,TRUE)
     7      1:yes      1:yes       1 (overcast,cool,normal,TRUE)
     8       2:no       2:no       1 (sunny,mild,high,FALSE)
     9      1:yes       2:no   +   1 (sunny,cool,normal,FALSE)
    10      1:yes      1:yes       1 (rainy,mild,normal,FALSE)
    11      1:yes       2:no   +   1 (sunny,mild,normal,TRUE)
    12      1:yes      1:yes       1 (overcast,mild,high,TRUE)
    13      1:yes      1:yes       1 (overcast,hot,normal,FALSE)
    14       2:no      1:yes   +   1 (rainy,mild,high,TRUE)

#--------------------------------------------------------------------
#Example (2): use FilteredClassifier (StringToWordVector + J48) to
# train on crude_oil_train.arff and test on crude_oil_test.arff

#train classifier by train_data and output model without evaluation
> java -cp weka.jar weka.classifiers.meta.FilteredClassifier \
> -no-cv -v -t data/crude_oil_train.arff -d model.dat \
> -F weka.filters.unsupervised.attribute.StringToWordVector \
> -W weka.classifiers.trees.J48

Options: -F weka.filters.unsupervised.attribute.StringToWordVector -W weka.classifiers.trees.J48

FilteredClassifier using weka.classifiers.trees.J48 -C 0.25 -M 2 on data filtered through weka.filters.unsupervised.attribute.StringToWordVector -R 1 -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -M 1 -tokenizer "weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\""

Filtered Header
@relation 'crude_oil_train-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'

@attribute class {yes,no}
@attribute Crude numeric
@attribute Demand numeric
@attribute The numeric
@attribute crude numeric
@attribute for numeric
@attribute has numeric
@attribute in numeric
@attribute increased numeric
@attribute is numeric
@attribute of numeric
@attribute oil numeric
@attribute outstrips numeric
@attribute price numeric
@attribute short numeric
@attribute significantly numeric
@attribute supply numeric
@attribute Some numeric
@attribute Use numeric
@attribute a numeric
@attribute bit numeric
@attribute cooking numeric
@attribute do numeric
@attribute flavor numeric
@attribute food numeric
@attribute frying numeric
@attribute like numeric
@attribute not numeric
@attribute oily numeric
@attribute olive numeric
@attribute pan numeric
@attribute people numeric
@attribute the numeric
@attribute very numeric
@attribute was numeric

@data

Classifier Model
J48 pruned tree
------------------

crude <= 0: no (4.0/1.0)
crude > 0: yes (2.0)

Number of Leaves  :     2

Size of the tree :      3

#load model and test classifier by test_data
> java -cp weka.jar weka.classifiers.meta.FilteredClassifier \
> -T data/crude_oil_test.arff -l model.dat -p first-last

=== Predictions on test data ===

 inst#     actual  predicted error prediction (document)
     1      1:yes      1:yes       1    ('Oil platforms extract crude oil')
     2       2:no       2:no       0.75 ('Canola oil is supposed to be healthy')
     3      1:yes       2:no   +   0.75 ('Iraq has significant oil reserves')
     4       2:no       2:no       0.75 ('There are different types of cooking oil')

> java -cp weka.jar weka.classifiers.meta.FilteredClassifier \
> -T data/crude_oil_test2.arff -l model.dat -p first-last

=== Predictions on test data ===

 inst#     actual  predicted error prediction (document)
     1        1:?      1:yes       1    ('Oil platforms extract crude oil')
     2        1:?       2:no       0.75 ('Canola oil is supposed to be healthy')
     3        1:?       2:no       0.75 ('Iraq has significant oil reserves')
     4        1:?       2:no       0.75 ('There are different types of cooking oil')

######### data/weather.nominal.arff

@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no

######### data/crude_oil_train.arff

%
% witten-12-mkp-data mining- practical machine learning tools and techniques
% ch17 tutorial exercises for the weka explorer
% ch17.5 document classification
%
%
@relation 'crude_oil_train'
%
@attribute document string
@attribute class {yes,no}
%
@data
'The price of crude oil has increased significantly',yes
'Demand for crude oil outstrips supply',yes
'Some people do not like the flavor of olive oil',no
'The food was very oily',no
'Crude oil is in short supply',yes
'Use a bit of cooking oil in the frying pan',no

######### data/crude_oil_test.arff

%
% witten-12-mkp-data mining- practical machine learning tools and techniques
% ch17 tutorial exercises for the weka explorer
% ch17.5 document classification
%
%
@relation 'crude_oil_test'
%
@attribute document string
@attribute class {yes,no}
%
@data
'Oil platforms extract crude oil',yes
'Canola oil is supposed to be healthy',no
'Iraq has significant oil reserves',yes
'There are different types of cooking oil',no

######### data/crude_oil_test2.arff

%
% witten-12-mkp-data mining- practical machine learning tools and techniques
% ch17 tutorial exercises for the weka explorer
% ch17.5 document classification
%
%
@relation 'crude_oil_test'
%
@attribute document string
@attribute class {yes,no}
%
@data
'Oil platforms extract crude oil',?
'Canola oil is supposed to be healthy',?
'Iraq has significant oil reserves',?
'There are different types of cooking oil',?
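
# Where the 1 and 0.75 in the Example (2) predictions come from (a sketch)

The prediction column for the crude oil documents follows from the two J48 leaves, "crude <= 0: no (4.0/1.0)" and "crude > 0: yes (2.0)". The Python sketch below is not Weka code: it splits each document on the delimiter set listed in the WordTokenizer options, hard-codes the split on the token "crude" that the tree output reports (it does not learn the tree), and reads each leaf's majority class and its share of the training documents. The helper names are made up for illustration, and treating the leaf share as the printed prediction value is an assumption that matches the numbers shown here.

```python
from collections import Counter

# Delimiters listed in the WordTokenizer options of the training output.
DELIMITERS = " \r\n\t.,;:'\"()?!"

# Contents of data/crude_oil_train.arff and data/crude_oil_test.arff.
TRAIN = [
    ("The price of crude oil has increased significantly", "yes"),
    ("Demand for crude oil outstrips supply", "yes"),
    ("Some people do not like the flavor of olive oil", "no"),
    ("The food was very oily", "no"),
    ("Crude oil is in short supply", "yes"),
    ("Use a bit of cooking oil in the frying pan", "no"),
]
TEST = [
    "Oil platforms extract crude oil",
    "Canola oil is supposed to be healthy",
    "Iraq has significant oil reserves",
    "There are different types of cooking oil",
]


def tokenize(text):
    """Split on the delimiter characters; case is preserved."""
    table = str.maketrans({ch: " " for ch in DELIMITERS})
    return text.translate(table).split()


def leaf_of(text):
    """Route a document to a leaf using the split copied from the J48 output."""
    return "crude > 0" if "crude" in tokenize(text) else "crude <= 0"


def leaf_counts(train):
    """Count the training class labels that end up in each leaf."""
    counts = {}
    for text, label in train:
        counts.setdefault(leaf_of(text), Counter())[label] += 1
    return counts


def predict(text, counts):
    """Return the leaf's majority class and its share of the leaf."""
    leaf = counts[leaf_of(text)]
    label, hits = leaf.most_common(1)[0]
    return label, hits / sum(leaf.values())


counts = leaf_counts(TRAIN)
predictions = [predict(text, counts) for text in TEST]
for text, (label, share) in zip(TEST, predictions):
    print(f"{label:3} {share:.2f}  {text}")
```

Because tokens keep their case, 'Crude oil is in short supply' contributes the token "Crude" rather than "crude" and lands in the "crude <= 0" leaf. That is the one misclassified training document behind "(4.0/1.0)", and it is why that leaf predicts "no" with 3/4 = 0.75 rather than 1.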
Monday, December 28, 2015
How to Save and Load a Model in Weka for Training and Testing