Slide 12.12: How to generate a decision tree (cont.)


How to Generate a Decision Tree (Cont.)

From table D and each associated subset S_i, we compute the degree of impurity. To do so, we must distinguish whether the table is the parent table D or a subset table S_i obtained from attribute i. If the table is the parent table D, we simply count the number of records of each class. For example, in the parent table below, we can compute the degree of impurity based on transportation mode. In this case we have 4 Buses, 3 Cars, and 3 Trains (in short 4B, 3C, 3T):

Attributes				Classes
Gender	Car Ownership	Travel Cost ($)/km	Income Level	Transportation Mode
Male	0	Cheap	Low	Bus
Male	1	Cheap	Medium	Bus
Female	1	Cheap	Medium	Train
Female	0	Cheap	Low	Bus
Male	1	Cheap	Medium	Bus
Male	0	Standard	Medium	Train
Female	1	Standard	Medium	Train
Female	1	Expensive	High	Car
Male	2	Expensive	Medium	Car
Female	2	Expensive	High	Car

Based on these data, we can compute the probability of each class and the degrees of impurity:

    Prob( Bus )   = 4 / 10 = 0.4        # 4B / 10 rows
    Prob( Car )   = 3 / 10 = 0.3        # 3C / 10 rows
    Prob( Train ) = 3 / 10 = 0.3        # 3T / 10 rows
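
The class counts and probabilities above can be reproduced with a short script. This is only a minimal sketch: the list `modes` is the Transportation Mode column of the parent table copied by hand, and the variable names are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

# Transportation Mode column of the parent table D, in row order
# (copied by hand from the table; the variable name is illustrative).
modes = ["Bus", "Bus", "Train", "Bus", "Bus",
         "Train", "Train", "Car", "Car", "Car"]

# Count the records of each class: 4B, 3C, 3T.
counts = Counter(modes)
total = len(modes)

# Probability of each class = class count / number of rows.
probs = {label: n / total for label, n in counts.items()}

for label in ("Bus", "Car", "Train"):
    print(f"Prob({label}) = {counts[label]} / {total} = {probs[label]:.1f}")
```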

     Entropy (logarithm base 2)
   = –0.4×log₂(0.4) – 0.3×log₂(0.3) – 0.3×log₂(0.3) = 1.571

     Gini index
   = 1 – (0.4² + 0.3² + 0.3²) = 0.660
 
     Classification error
   = 1 – Max{0.4, 0.3, 0.3} = 1 – 0.4 = 0.60
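
The three impurity measures can be checked numerically. The following is a minimal sketch, assuming the base-2 logarithm for entropy (the base that gives the value 1.571 above); the function names are hypothetical helpers, not taken from the slides.

```python
import math

def entropy(probs):
    """Entropy in bits: -sum(p * log2(p)); zero-probability classes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini_index(probs):
    """Gini index: 1 - sum(p^2)."""
    return 1 - sum(p * p for p in probs)

def classification_error(probs):
    """Classification error: 1 - max(p)."""
    return 1 - max(probs)

# Class probabilities of the parent table D: Bus, Car, Train.
probs = [0.4, 0.3, 0.3]

print(f"Entropy              = {entropy(probs):.3f}")               # 1.571
print(f"Gini index           = {gini_index(probs):.3f}")            # 0.660
print(f"Classification error = {classification_error(probs):.2f}")  # 0.60
```

The same functions can be applied to the class probabilities of any subset table S_i, since each one takes only a list of probabilities.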

