Saturday, April 6, 2013

C4.5 Tutorial 2 - Data Format - Create the names and data files

This tutorial shows you how to set up a C4.5 experiment.

1- First, we need a data set to work with. For this tutorial, I will be using the "whas1" data set.
Each data set comes with two files: a description file and a data file.

The description file contains the data set name, the number of instances, the number of attributes, the values of each attribute, and the data set references. The data file contains the actual records (instances) of the data set.

2- Download both files.

Each experiment uses up to three files: a names file, a data file (training set) and, depending on the type of research, an optional test file (testing set).
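All three files share the same file stem (the data set name, here "whas1") and differ only in their extension. The sketch below is a hypothetical Python helper (experiment_files is not part of C4.5) that builds the three file names from a stem and reports which ones are not on disk yet. It assumes all files sit in one directory.

```python
from pathlib import Path


def experiment_files(stem, directory=".", use_test=False):
    """Return the expected C4.5 input files for a stem such as "whas1".

    The names and data files are always expected; the test file is
    only expected when use_test is True.
    """
    base = Path(directory)
    files = {
        "names": base / f"{stem}.names",  # attribute and class definitions
        "data": base / f"{stem}.data",    # training set
    }
    if use_test:
        files["test"] = base / f"{stem}.test"  # optional testing set
    # List every expected file that does not exist yet.
    missing = [str(p) for p in files.values() if not p.exists()]
    return files, missing


files, missing = experiment_files("whas1", use_test=True)
for role, path in files.items():
    print(role, path.name)
```

Keeping the stem in one place makes it harder to end up with, say, "whas1.names" next to a mistyped "whas.data".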

3- Creating the names file

  • Open the description file and copy the attributes and their values into a new text file.
  • The first line of the names file must list the class labels, which are the values of the last attribute. In our case, this is the "FSTAT" attribute, whose values are 0 (Alive) and 1 (Dead).
  • The first line should therefore be "0, 1." (the class values separated by commas, with a period at the end).
  • Next, list each attribute and its values on a separate line, ending each line with a period, e.g. "AGE: continuous." N.B.: Do not include the last attribute (the one that holds the class labels) in this list.
  • Finally, save the file as "datasetname.names". In our case, it is "whas1.names".
  • For more information on how to specify the values of each attribute, watch the video below.
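The steps above can be sketched in Python as a small generator for the names file: a first line with the class labels, then one "name: values." line per attribute. This is a minimal sketch; names_file_text and write_names_file are hypothetical helpers, and apart from AGE and the FSTAT class values taken from this tutorial, the attribute SOME_FLAG is a made-up placeholder for the real attributes listed in the description file.

```python
def names_file_text(class_labels, attributes):
    """Build the contents of a C4.5 .names file.

    class_labels: list of class values, e.g. ["0", "1"].
    attributes: list of (name, values) pairs, where values is either
        the string "continuous" or a list of discrete values.
    """
    # First line: the class values, comma-separated, ending with a period.
    lines = [", ".join(class_labels) + "."]
    for name, values in attributes:
        spec = "continuous" if values == "continuous" else ", ".join(values)
        # One attribute per line, each line ending with a period.
        lines.append(f"{name}: {spec}.")
    return "\n".join(lines) + "\n"


def write_names_file(stem, class_labels, attributes):
    """Save the generated text as <stem>.names, e.g. whas1.names."""
    with open(f"{stem}.names", "w") as f:
        f.write(names_file_text(class_labels, attributes))


# FSTAT is NOT listed as an attribute: its values are the class labels.
# SOME_FLAG is hypothetical; replace it with the real attributes.
text = names_file_text(["0", "1"], [("AGE", "continuous"), ("SOME_FLAG", ["0", "1"])])
print(text, end="")
# 0, 1.
# AGE: continuous.
# SOME_FLAG: 0, 1.
```

Generating the file from a list keeps the attribute order explicit, which matters because the columns of the data file must appear in the same order.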


4- Creating the data file

  • Download the data file.
  • Delete the first row, which contains the list of attribute names.
  • Delete any columns you do not need, so that the remaining columns match the attributes in the names file, in the same order, with the class column last.
  • Delete any row that contains missing data.
  • Save the file as "datasetname.data". In our case, it is "whas1.data".
  • N.B.: If you want a testing set, split the data file into two portions according to a chosen percentage. Then save one portion as "datasetname.data" and the other as "datasetname.test".
  • For more information on how to create the data and test files, watch the video below.
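The cleaning and splitting steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the raw file is assumed to be whitespace-separated with a header row, the missing-value markers in MISSING are guesses to adjust to the real file, and clean_rows, split_cases, format_cases and write_cases are hypothetical helpers. The toy rows are made up, and SOME_FLAG and EXTRA are hypothetical columns.

```python
import random

# Assumed marker strings for a missing value; adjust to the real file.
MISSING = ("", "?", ".")


def clean_rows(lines, keep_columns, delimiter=None, missing=MISSING):
    """Turn raw data lines into C4.5 cases.

    - the first line is treated as the attribute header and dropped,
    - only the columns named in keep_columns are kept, in that order
      (list them in names-file order, class column last),
    - any row with a missing value is dropped.
    delimiter=None splits on runs of whitespace.
    """
    header = [h.strip() for h in lines[0].split(delimiter)]
    idx = [header.index(name) for name in keep_columns]
    cases = []
    for line in lines[1:]:
        if not line.strip():
            continue  # ignore blank lines
        fields = [f.strip() for f in line.split(delimiter)]
        # A short row counts as missing data in the absent columns.
        row = [fields[i] if i < len(fields) else "" for i in idx]
        if any(v in missing for v in row):
            continue  # drop rows that contain missing data
        cases.append(row)
    return cases


def split_cases(cases, train_fraction, seed=0):
    """Shuffle the cases and split them into training and test portions."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def format_cases(cases):
    """One case per line, values separated by commas."""
    return "".join(",".join(row) + "\n" for row in cases)


def write_cases(path, cases):
    with open(path, "w") as f:
        f.write(format_cases(cases))


raw = """AGE SOME_FLAG EXTRA FSTAT
65 1 x 0
72 . x 1
58 0 x 1
80 1 x 0""".splitlines()

cases = clean_rows(raw, ["AGE", "SOME_FLAG", "FSTAT"])
print(cases)  # [['65', '1', '0'], ['58', '0', '1'], ['80', '1', '0']]
train, test = split_cases(cases, train_fraction=2 / 3)
```

To finish, call write_cases("whas1.data", train) and, if you want a testing set, write_cases("whas1.test", test). Shuffling with a fixed seed before splitting avoids a test set that simply holds the last rows of the file, while keeping the split reproducible.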


