Monday, April 15, 2013

C4.5 Tutorial 3: Using C4.5 Commands

This tutorial shows how to use the basic C4.5 commands.

1- To set up C4.5 so that it runs from the command line, please review this post.

2- Produce a decision tree:
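  • c4.5 -f "name of the data set" : This command produces one decision tree. Give the data set name without an extension (e.g. whas1 for whas1.names and whas1.data). C4.5 saves the tree to a file with the extension .unpruned. To save the printed output, including the decision tree, to a text file, append " > name-of-the-file.txt " to the command.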
3- Produce multiple decision trees:
  • c4.5 -f whas1 -t"n" > dt.txt : This command produces "n" decision trees using the windowing technique. By default, the initial window size is the larger of 20% of the data set and twice the square root of the number of cases. The output is saved to the file dt.txt.
  • e.g: c4.5 -f whas1 -t10 > dt.txt
4- Change the window size:
  • c4.5 -f whas1 -t10 -w"m" > dtw.txt : This command produces 10 decision trees using the windowing technique, with an initial window of m cases (the -w option takes a number of cases, not a percentage of the data set). The output is saved to dtw.txt.
  • e.g: c4.5 -f whas1 -t10 -w50 > dtw.txt
5- Generate rule sets:
  • c4.5rules -f whas1 > dtr.txt : This command converts the trees produced above to rule sets. The output is saved to dtr.txt
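
The commands above can also be scripted. The following is a minimal Python sketch, assuming that c4.5 and c4.5rules are on the PATH; the helper names build_c45_command and run_to_file are made up for this example and are not part of C4.5. As written, it only prints the command lines so that you can check them before running anything.

```python
import subprocess


def build_c45_command(stem, trials=None, window=None):
    """Return the argument list for a c4.5 run on the given file stem."""
    cmd = ["c4.5", "-f", stem]
    if trials is not None:
        cmd.append(f"-t{trials}")   # number of trees to grow
    if window is not None:
        cmd.append(f"-w{window}")   # value is passed to -w unchanged
    return cmd


def run_to_file(cmd, out_path):
    """Run cmd and write its standard output to out_path, like `cmd > out_path`."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


# The runs from this tutorial, paired with their output files.
runs = [
    (build_c45_command("whas1", trials=10), "dt.txt"),
    (build_c45_command("whas1", trials=10, window=50), "dtw.txt"),
    (["c4.5rules", "-f", "whas1"], "dtr.txt"),
]

# Dry run: show each command line without executing it.
for cmd, out_path in runs:
    print(" ".join(cmd), ">", out_path)
```

To actually execute the runs, call run_to_file(cmd, out_path) inside the loop, from the folder that contains whas1.names and whas1.data.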

Below you can find a step-by-step video tutorial.


Saturday, April 6, 2013

C4.5 Tutorial 2 - Data Format - Create the names and data files

This tutorial shows you how to set up a C4.5 experiment.

1- First, we need a data set to work with. For the purpose of this tutorial, I will be using the "whas1" data set.
Each data set comes with two files: the description file and the data file.

The description file contains the data set name, the number of instances, the number of attributes, the values of the attributes, and the data set references. The data file contains the records themselves.

2- Download these files

Each experiment uses up to three files: a names file, a data file (training set), and an optional test file (testing set), depending on the type of research.

3- Creating the names file.

  • Open the description file and copy the attributes and their values to a new text file.
  • The first line of the names file should contain the class labels. The class labels are the values of the last attribute. In our case, it is the "FSTAT" attribute and its values are 0:Alive and 1:Dead.
  • The first line should read: 0, 1. (the labels are separated by commas, and the trailing period is part of the format).
  • Next, list the attributes and their values, one attribute per line, and end each line with a period, e.g: AGE: continuous. N.B: Do not include the last attribute, the one that holds the class labels, in your attribute list.
  • Finally, save the file as "datasetname.names". In our case, it is "whas1.names".
  • For more information on how to specify the values of each attribute, watch the video below.
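
The names file can also be generated with a short script. The following is a minimal Python sketch of the format described above; the function format_names is made up for this example, and the SEX attribute with its values is only an illustrative placeholder, so take the real attribute list from the description file.

```python
def format_names(class_labels, attributes):
    """Build the text of a C4.5 names file.

    class_labels: list of label strings, e.g. ["0", "1"]
    attributes:   list of (name, values) pairs, where values is either the
                  string "continuous" or a list of discrete values
    """
    # First line: the class labels, comma-separated, ending with a period.
    lines = [", ".join(class_labels) + "."]
    # One attribute per line, each ending with a period.
    for name, values in attributes:
        spec = values if isinstance(values, str) else ", ".join(values)
        lines.append(f"{name}: {spec}.")
    return "\n".join(lines) + "\n"


# AGE comes from the tutorial; SEX and its values are illustrative only.
text = format_names(
    ["0", "1"],
    [("AGE", "continuous"), ("SEX", ["0", "1"])],
)
print(text, end="")
```

Writing the returned text to "whas1.names" gives a file that starts with the class labels line, followed by one line per attribute.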


4- Creating the data file

  • Download the data file.
  • Delete the first row that contains the list of attributes.
  • Delete any unnecessary columns.
  • Delete any row that contains missing data.
  • Save the file as "datasetname.data". In our case, it is "whas1.data".
  • N.B: In case you want to have a testing set, divide the data file into two portions according to a given percentage. Then, save one file as "datasetname.data" and the second as "datasetname.test".
  • For more information on how to create the data and test files, watch the video below.
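
The clean-up and split steps above can be scripted as well. The following is a minimal Python sketch, assuming that the downloaded data file is comma-separated with a header row; the column names in the sample, the tokens treated as missing, the 67% split, and the shuffle before splitting are all assumptions made for this example.

```python
import csv
import io
import random


def clean_rows(csv_text, drop_columns=(), missing_tokens=("", "?", "NA")):
    """Drop the header row, unwanted columns, and rows with missing values."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name not in drop_columns]
    rows = []
    for row in reader:
        if len(row) != len(header):
            continue  # skip blank or malformed lines
        kept = [row[i].strip() for i in keep]
        if any(value in missing_tokens for value in kept):
            continue  # skip rows with missing data
        rows.append(kept)
    return rows


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows reproducibly and split them into two portions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def write_rows(rows, path):
    """Write rows as comma-separated lines with plain newline endings."""
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


# Made-up sample; the class attribute (FSTAT) is the last column.
sample = """ID,AGE,SEX,FSTAT
1,62,0,1
2,78,,0
3,55,1,0
4,81,0,1
"""
rows = clean_rows(sample, drop_columns=("ID",))
train, test = split_rows(rows, train_fraction=0.67)
```

With the real data, write_rows(train, "whas1.data") and write_rows(test, "whas1.test") would save the two portions under the file names C4.5 expects.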