Tuesday, May 21, 2013

C4.5 Tutorial 5: 10 Fold Cross Validation

This tutorial shows how to perform 10-fold cross-validation with C4.5.

1- Place the modified xval.sh script (found below) in one folder together with the data, test and names files. The command in step 3 assumes the script is saved as xval-new.sh.

2- Open a new terminal and change to that folder.

3- Type " csh xval-new.sh 'name of the data set' 10 ". This command generates 20 fold files: 10 training folds (e.g. XDF1.data) and 10 test folds (e.g. XDF1.test). The folds are numbered from 0 to 9.

N.B: If you don't have the "csh" package installed, install it by typing " sudo apt-get install csh ".
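Before running the command in step 3, it can help to confirm that the three input files are really in the folder, since the script reads the .data, .test and .names files for the given stem. The following is only a minimal sketch in POSIX sh; the function name check_xval_inputs is my own and is not part of C4.5.

```sh
#!/bin/sh
# check_xval_inputs STEM
# Hypothetical helper (not part of C4.5): reports which of STEM.data,
# STEM.test and STEM.names are missing from the current folder.
# Returns 0 when all three files exist, 1 otherwise.
check_xval_inputs() {
    stem=$1
    missing=0
    for ext in data test names; do
        if [ ! -f "$stem.$ext" ]; then
            echo "missing: $stem.$ext"
            missing=1
        fi
    done
    return "$missing"
}

# Example, assuming the data set stem is whas1:
#   check_xval_inputs whas1 && csh xval-new.sh whas1 10
```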

Below you can find the modified version of the xval.sh file.

#csh

#---------------------------------------------------------------------
# N-way cross-validation script
#---------------------------------------------------------------------
#
# invocation:
#   csh xval.sh filestem N [options for c4.5 and c4.5rules] [suffix]
#
# individual results from each block are left in
#     filestem.[rt]o*[suffix],
# averages over all blocks in
#     filestem.[rt]res[suffix]
#---------------------------------------------------------------------

# sort the options into result suffix and control options for the programs
# Note: for options with values, there must be no space between the option
# name and value; e.g. "-v1", not "-v 1"

set treeopts =
set ruleopts =
set suffix =

foreach i ( $argv[3-] )
  switch ( $i )
  case "+*":
    set suffix = $i
    breaksw
  case "-v*":
  case "-c*":
    set treeopts = ($treeopts $i)
    set ruleopts = ($ruleopts $i)
    breaksw
  case "-p":
  case "-t*":
  case "-w*":
  case "-i*":
  case "-g":
  case "-s":
  case "-m*":
    set treeopts = ($treeopts $i)
    breaksw
  case "-r*":
  case "-F*":
  case "-a":
    set ruleopts = ($ruleopts $i)
    breaksw
  default:
    echo "unrecognised or inappropriate option" $i
    exit
  endsw
end

# prepare the data for cross-validation

cat $1.data $1.test | xval-prep $2 >XDF.data
cp /dev/null XDF.test
ln $1.names XDF.names
rm $1.[rt]o[0-9]*$suffix
set junk = `wc XDF.data`
set examples = $junk[1]
set large = `expr $examples % $2`
set segsize = `expr \( $examples / $2 \) + 1`

# perform the cross-validation trials

set i = 0
while ( $i < $2 )
  if ( $i == $large ) set segsize = `expr $examples / $2`
  cat XDF.test XDF.data | split -`expr $examples - $segsize`
  mv xaa XDF.data
  mv xab XDF.test
  #modified by saada
  cp XDF.data XDF$i.data
  cp XDF.test XDF$i.test
  #end modification
  c4.5 -f XDF -u $treeopts >$1.to$i$suffix
  c4.5rules -f XDF -u $ruleopts >$1.ro$i$suffix

  @ i++
end

# summarize results
# modified by saada: removal of the temporary XDF.* files is disabled
# so that the fold files are kept
#rm -f XDF.*
#end modification
cat $1.to[0-9]*$suffix | grep "<<" | average >$1.tres$suffix
cat $1.ro[0-9]*$suffix | grep "<<" | average >$1.rres$suffix
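
The way the script sizes each fold is easy to miss: the first (examples % N) folds get one extra test case, and the remaining folds get examples / N test cases. The sketch below repeats only that arithmetic in POSIX sh so you can see the sizes without running C4.5. The function name fold_sizes and the example count of 103 cases are assumptions for illustration, not values from any real data set.

```sh
#!/bin/sh
# fold_sizes EXAMPLES FOLDS
# Hypothetical helper that mirrors the segsize/large arithmetic of the
# cross-validation script and prints the test and training size per fold.
fold_sizes() {
    examples=$1
    folds=$2
    large=$(( examples % folds ))          # number of folds with one extra case
    segsize=$(( examples / folds + 1 ))    # size of those larger test folds
    i=0
    while [ "$i" -lt "$folds" ]; do
        if [ "$i" -eq "$large" ]; then
            segsize=$(( examples / folds ))  # remaining folds are one smaller
        fi
        echo "fold $i: test=$segsize train=$(( examples - segsize ))"
        i=$(( i + 1 ))
    done
}

# Example with an assumed 103 cases and 10 folds:
#   fold_sizes 103 10
# The first line printed is "fold 0: test=11 train=92".
```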



Monday, April 15, 2013

C4.5 Tutorial 3: Using C4.5 Commands

This tutorial shows how to use the basic C4.5 commands.

1- To set up C4.5 so that it works from the command line, see "C4.5 Tutorial 1: Setup C4.5".

2- Produce a decision tree:
  • c4.5 -f "name of the data set" : this command produces one decision tree, and C4.5 saves the unpruned tree to a file with the extension .unpruned. To save the printed output to a text file, append " > name-of-the-file.txt " to the command.
3- Produce multiple decision trees:
  • c4.5 -f whas1 -t"n" > dt.txt : this command produces "n" decision trees using the windowing technique. By default, the initial window is 20% of the data set (or twice the square root of the number of cases, if that is larger). The output is saved to dt.txt.
  • e.g: c4.5 -f whas1 -t10 > dt.txt
4- Change the window size
  • c4.5 -f whas1 -t10 -w"m" > dtw.txt : this command produces 10 decision trees using the windowing technique, with an initial window of m cases (the value is a number of cases, not a percentage). The output is saved to dtw.txt.
  • e.g: c4.5 -f whas1 -t10 -w50 > dtw.txt
5- Generating Rule Sets
  • c4.5rules -f whas1 > dtr.txt : This command converts the trees produced above to rule sets. The output is saved to dtr.txt
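
If you run these commands often with different values, a small helper that assembles the tree-building command line can save typing. This is only a sketch in POSIX sh: the function name c45_tree_cmd is hypothetical, and it only prints the command (a dry run) so that it works even on a machine where C4.5 is not installed yet.

```sh
#!/bin/sh
# c45_tree_cmd STEM TRIALS [WINDOW]
# Hypothetical helper: prints the c4.5 command line for the given file stem,
# number of trials (-t) and optional window value (-w). It does not run c4.5.
c45_tree_cmd() {
    stem=$1
    trials=$2
    window=${3:-}
    cmd="c4.5 -f $stem -t$trials"
    if [ -n "$window" ]; then
        cmd="$cmd -w$window"
    fi
    echo "$cmd"
}

# Example:
#   c45_tree_cmd whas1 10 50
# prints: c4.5 -f whas1 -t10 -w50
```

Printing instead of executing is a deliberate choice here: you can check the command first and then paste it into the terminal with the redirection you want, for example " > dtw.txt ".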

Below you can find a step-by-step video tutorial.


Saturday, April 6, 2013

C4.5 Tutorial 2 - Data Format - Create the names and data files

This tutorial shows you how to set up a C4.5 experiment.

1- First, we need a data set to work with. For the purpose of this tutorial, I will be using the "whas1" data set.
Each data set comes with two files: the description file and the data file.

The description file contains the data set name, the number of instances, the number of attributes, the values of the attributes, and the data set references. The data file contains the data itself.

2- Download these files.

Each experiment uses up to three files: a names file, a data file (training set), and, depending on the type of research, an optional test file (testing set).

3- Creating the names file.

  • Open the description file and copy the attributes and their values to a new text file.
  • The first line of the names file should contain the class labels. The class labels are the values of the last attribute. In our case, it is the "FSTAT" attribute, and its values are 0 (Alive) and 1 (Dead).
  • The first line should therefore read "0, 1." (including the final period).
  • Next, list the attributes and their values, one attribute per line, and end each line with a period, e.g. "AGE: continuous." N.B: Do not include the last attribute, which holds the class labels, in your attribute list.
  • Finally, save the file as "datasetname.names". In our case, it is "whas1.names".
  • For more information on how to specify the values of each attribute, watch the video below.
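
To make the layout concrete, here is a minimal sketch in POSIX sh that writes a tiny names file in the format described above. The class line and the AGE line come from this tutorial; the SEX attribute, the function name write_demo_names and the file name demo.names are assumptions added only for illustration, so replace them with the attributes from your own description file.

```sh
#!/bin/sh
# write_demo_names FILE
# Hypothetical helper: writes a small example names file to FILE.
# Line 1 holds the class labels; each following line holds one attribute.
write_demo_names() {
    cat > "$1" <<'EOF'
0, 1.

AGE: continuous.
SEX: 0, 1.
EOF
}

# Example (demo.names is an illustrative file name):
#   write_demo_names demo.names
```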


4- Creating the data file

  • Download the data file.
  • Delete the first row, which contains the list of attributes.
  • Delete any unnecessary columns.
  • Delete any row that contains missing data.
  • Save the file as "datasetname.data". In our case, it is "whas1.data".
  • N.B: If you want to have a testing set, divide the data file into two portions according to a given percentage. Then save one portion as "datasetname.data" and the other as "datasetname.test".
  • For more information on how to create the data and test files, watch the video below.
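
The same clean-up can be scripted instead of done by hand. The sketch below, in POSIX sh, is one possible way and rests on several assumptions that you should check against your own file: the raw file is comma-separated, the header is exactly one row, a missing value appears as an empty field or as "?", and the first part of the file becomes the training set (no shuffling). It does not remove unnecessary columns. The function name prepare_data is hypothetical.

```sh
#!/bin/sh
# prepare_data RAW STEM PERCENT
# Hypothetical helper: builds STEM.data and STEM.test from a raw CSV file.
# PERCENT is the share of the cleaned rows that goes into STEM.data.
prepare_data() {
    raw=$1
    stem=$2
    pct=$3
    clean=$(mktemp)

    # Skip the header row, then drop blank lines and rows with a missing value.
    tail -n +2 "$raw" | awk -F, '
        NF == 0 { next }
        {
            for (i = 1; i <= NF; i++)
                if ($i == "" || $i == "?") next
            print
        }' > "$clean"

    total=$(( $(wc -l < "$clean") ))
    ntrain=$(( total * pct / 100 ))

    # The first ntrain rows form the training set, the rest the testing set.
    head -n "$ntrain" "$clean" > "$stem.data"
    tail -n +"$(( ntrain + 1 ))" "$clean" > "$stem.test"
    rm -f "$clean"
}

# Example (whas1_raw.csv is an illustrative file name):
#   prepare_data whas1_raw.csv whas1 70
```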



Wednesday, March 27, 2013

C4.5 Tutorial 1: Setup C4.5

This tutorial will show you how to set up C4.5.

1- Download the C4.5 source code.


2- Once you have downloaded the C4.5 source code, open a new terminal and go to your C4.5 directory.







3- Decompress the C4.5 archive by typing "tar xvf c4.5r8.tar.gz".







4- Once the archive has been decompressed, you will see a new folder called R8 in your C4.5 directory.






5- Change your directory to R8/Src by typing "cd R8/Src".








6- Compile the executables inside the Src folder by typing "make all" in your command line.







7- Create a new folder called bin and copy the executables from your Src folder into it. The executable files are: c4.5, c4.5rules, consult, consultr, xval-prep.






8- You are not ready yet. In order to use the C4.5 commands from the command line, you should add the bin folder path to your PATH environment variable. To do that, go back to the terminal and type "gedit ~/.bashrc".





9- At the end of the ".bashrc" file, after the "fi" statement, write the following statement on a new line:

export PATH="$HOME/bin:$PATH:<YOUR bin FILE PATH>"






10- Restart your terminal. You are now ready to use the C4.5 commands from the command line.
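
To check the result of steps 7 to 10, the following minimal sketch in POSIX sh adds a folder to PATH only when it is not already there, and then lists any of the five executables that still cannot be found. Both function names (path_append and missing_c45_tools) are hypothetical helpers, not part of C4.5, and the bin path in the example is an assumption.

```sh
#!/bin/sh
# path_append DIR
# Hypothetical helper: adds DIR to PATH unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

# missing_c45_tools
# Hypothetical helper: prints each executable from step 7 that is not
# found on PATH. Prints nothing when all of them are found.
missing_c45_tools() {
    for tool in c4.5 c4.5rules consult consultr xval-prep; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example, assuming the bin folder is $HOME/c4.5/bin:
#   path_append "$HOME/c4.5/bin"
#   missing_c45_tools
```

Appending only when the folder is absent keeps PATH from growing each time the file is sourced again.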








Here is a step-by-step video tutorial.