Classification Error


Still another way to measure the degree of impurity is the classification error index, which measures how often examples would be misclassified if every example in a node were assigned to the most frequent class. Classification errors can occur in both directions. For instance, when a passing score is applied to an exam, a truly competent examinee might fail the test, while an incompetent examinee might pass it; a primary goal of well-designed exam programs is to minimize such classification error. The index is defined as
    Classification Error = 1 – max{pj}
where pj is the probability of the class value j. For example, given that
    Prob( Bus )   = 4 / 10 = 0.4        # 4B / 10 rows
    Prob( Car )   = 3 / 10 = 0.3        # 3C / 10 rows
    Prob( Train ) = 3 / 10 = 0.3        # 3T / 10 rows
we can now compute Classification error as
     Classification error
   = 1 – Max{0.4, 0.3, 0.3}
   = 1 – 0.4 = 0.6
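The computation above can be sketched in Python. The function name and the hard-coded probabilities for the Bus/Car/Train example are illustrative, not part of any particular library:

```python
def classification_error(probs):
    """Impurity as classification error: 1 minus the largest class probability."""
    return 1 - max(probs)

# Bus/Car/Train example from the text: 4/10, 3/10, 3/10
probs = [0.4, 0.3, 0.3]
print(classification_error(probs))  # 1 - 0.4 = 0.6
```

A pure node, e.g. probs = [1.0], gives 1 - max{1} = 0, matching the discussion below.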
Similar to Entropy and the Gini Index, the classification error index of a pure table (consisting of a single class) is zero, because the maximum probability is 1 and 1 – max{1} = 0. The value of the classification error index always lies between 0 and 1. In fact, for a given number of classes n, the maximum Gini index is always equal to the maximum classification error index: both maxima occur when every class has equal probability p = 1/n, and the maximum Gini index is
     1 – n×(1/n)² = 1 – 1/n
while maximum classification error index also happens at
     1 – max{1/n} = 1 – 1/n
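The equality of the two maxima can be checked numerically. This is a small sketch using hypothetical helper functions for the two indices, assuming a uniform distribution p = 1/n over n classes:

```python
def gini(probs):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1 - sum(p * p for p in probs)

def classification_error(probs):
    """Classification error index: 1 minus the largest class probability."""
    return 1 - max(probs)

# For each n, the uniform distribution [1/n, ..., 1/n] maximizes both indices,
# and both maxima equal 1 - 1/n.
for n in range(2, 6):
    uniform = [1 / n] * n
    expected = 1 - 1 / n
    assert abs(gini(uniform) - expected) < 1e-9
    assert abs(classification_error(uniform) - expected) < 1e-9
```

Note that the equality holds only at the uniform distribution; away from it, the two indices generally differ.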
Knowing how to compute the degree of impurity, we are now ready to proceed with decision tree algorithms.