A Decision Tree Application Using scikit-learn


A decision tree is a flow chart that can help you make decisions based on previous experience. In this example, a person tries to decide whether to go to a comedy show, based on the data saved in data.csv. From this data set, Python can create a decision tree that can be used to decide whether any new shows are worth attending:
  1. Read the dataset with pandas.
  2. To make a decision tree, all data has to be numerical. We have to convert the non-numerical columns “Nationality” and “Go” into numerical values. Pandas has a map() method that takes a dictionary describing how to convert the values. For example,
       { 'UK': 0, 'USA': 1, 'N': 2 }
    converts the value UK to 0, USA to 1, and N to 2.
  3. Separate the feature columns from the target column. The feature columns are the columns that we try to predict from, and the target column is the column with the values we try to predict.
  4. Create the actual decision tree and fit it to our data.
  5. Use the decision tree to predict new values. For example: Should I go see a show starring a 40-year-old American comedian with 10 years of experience and a comedy ranking of 6?
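The five steps above can be sketched as follows. This is a minimal illustration, not the page's original source: the rows are made-up stand-ins for data.csv (read from an in-memory string so the example is self-contained), the column names "Age", "Experience", and "Rank" are assumptions, and so are the 'YES'/'NO' values in the "Go" column. Only "Nationality", "Go", and the nationality dictionary come from the text.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for data.csv: made-up rows, assumed column names.
CSV_TEXT = """Age,Experience,Rank,Nationality,Go
36,10,9,UK,NO
42,12,4,USA,NO
23,4,6,N,NO
43,21,8,USA,YES
66,3,7,N,YES
35,14,9,UK,YES
24,3,5,USA,NO
45,9,9,UK,YES
"""

# Step 1: read the dataset with pandas.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Step 2: convert the non-numerical columns with map().
df["Nationality"] = df["Nationality"].map({"UK": 0, "USA": 1, "N": 2})
df["Go"] = df["Go"].map({"YES": 1, "NO": 0})  # assumed labels

# Step 3: separate the feature columns from the target column.
features = ["Age", "Experience", "Rank", "Nationality"]
X = df[features]
y = df["Go"]

# Step 4: create the decision tree and fit it to the data.
# random_state is fixed here only to make repeated runs reproducible.
dtree = DecisionTreeClassifier(random_state=0)
dtree.fit(X, y)

# Step 5: predict for a 40-year-old American (USA = 1) comedian
# with 10 years of experience and a comedy ranking of 6.
new_show = pd.DataFrame([[40, 10, 6, 1]], columns=features)
prediction = dtree.predict(new_show)
print(prediction)  # an array with a single 0 (don't go) or 1 (go)
```

To run this against a real file, replace `io.StringIO(CSV_TEXT)` with the path to data.csv and adjust the column names and label values to match it.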
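The Python source code for the decision tree method follows these five steps. Note that the decision tree can give you different results if you run it enough times, even if you feed it the same data. In scikit-learn, this is because the features are randomly permuted at each split, so when several candidate splits are equally good, the one that gets chosen can differ between runs. Passing a fixed random_state to the classifier makes the result reproducible. The prediction itself is also not a 100% certain answer: it is based on the outcomes observed in the training data.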
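The run-to-run variation described above can be controlled in scikit-learn by fixing the random_state parameter of DecisionTreeClassifier. The sketch below fits the same data twice with the same seed and compares the resulting trees as text. The rows are made up and already numeric (UK = 0, USA = 1, N = 2), the column names other than "Nationality" are assumptions, and fit_and_describe is a hypothetical helper written for this illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical, already-numeric training rows (not the real data.csv).
features = ["Age", "Experience", "Rank", "Nationality"]
X = pd.DataFrame(
    [
        [36, 10, 9, 0],
        [42, 12, 4, 1],
        [23, 4, 6, 2],
        [43, 21, 8, 1],
        [66, 3, 7, 2],
        [35, 14, 9, 0],
        [24, 3, 5, 1],
        [45, 9, 9, 0],
    ],
    columns=features,
)
y = [0, 0, 0, 1, 1, 1, 0, 1]  # 1 = go, 0 = don't go (assumed encoding)


def fit_and_describe(seed):
    """Fit a tree with the given seed and return its rules as plain text."""
    tree = DecisionTreeClassifier(random_state=seed)
    tree.fit(X, y)
    return export_text(tree, feature_names=features)


# Two fits with the same seed on the same data build the same tree.
first = fit_and_describe(seed=0)
second = fit_and_describe(seed=0)
same_tree = first == second
print(same_tree)
print(first)
```

Leaving random_state unset (the default) allows ties between equally good splits to be broken differently on each run, which is one way the same data can lead to different trees.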
