Classification Error


Still another way to measure the degree of impurity is the classification error index. As an illustration, consider a certification exam: classification error refers to misclassifying examinees into the pass and fail categories when a passing score is applied. Errors can occur in both directions; that is, a truly competent examinee might fail the test, while an incompetent examinee might pass it. A primary goal of a well-designed exam program is to minimize classification error. Formally,
    Classification Error = 1 – max{p_j}
where p_j is the probability of class value j. For example, given that
    Prob( Bus )   = 4 / 10 = 0.4        # 4B / 10 rows
    Prob( Car )   = 3 / 10 = 0.3        # 3C / 10 rows
    Prob( Train ) = 3 / 10 = 0.3        # 3T / 10 rows
we can now compute Classification error as
     Classification error
   = 1 – Max{0.4, 0.3, 0.3}
   = 1 – 0.4 = 0.6
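As a sketch, the computation above can be reproduced in a few lines of Python (the function name `classification_error` and the ten-row Bus/Car/Train sample are illustrative, not part of any library):

```python
from collections import Counter

def classification_error(labels):
    """Impurity of a set of class labels: 1 - max class probability."""
    counts = Counter(labels)
    total = sum(counts.values())
    return 1 - max(count / total for count in counts.values())

# The 10-row example: 4 Bus, 3 Car, 3 Train
rows = ["Bus"] * 4 + ["Car"] * 3 + ["Train"] * 3
print(classification_error(rows))  # 1 - max{0.4, 0.3, 0.3} = 0.6
```

Note that only the largest class probability matters; the split of the remaining rows between Car and Train has no effect on the index.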
Similar to Entropy and the Gini Index, the classification error index of a pure table (one consisting of a single class) is zero, because the maximum probability is 1 and 1 – max{1} = 0. The value of the classification error index always lies between 0 and 1. In fact, for a given number of classes the maximum Gini index always equals the maximum classification error index: with n classes, the maximum of each index occurs when every class has probability p = 1/n, so the maximum Gini index is
     1 – n×(1/n)² = 1 – 1/n
while the maximum classification error index is likewise
     1 – max{1/n} = 1 – 1/n
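A quick numerical check of this equality, again assuming n equally likely classes (the helper names below are made up for this sketch):

```python
def max_gini(n):
    """Gini index of n equally likely classes: 1 - n*(1/n)**2 = 1 - 1/n."""
    return 1 - n * (1 / n) ** 2

def max_classification_error(n):
    """Classification error of n equally likely classes: 1 - max{1/n} = 1 - 1/n."""
    return 1 - max([1 / n] * n)

for n in (2, 3, 4, 10):
    # Both maxima agree at 1 - 1/n for every class count n
    assert abs(max_gini(n) - max_classification_error(n)) < 1e-12
    print(n, max_gini(n))
```

For n = 2 both maxima are 0.5, and both approach 1 as the number of classes grows.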
Knowing how to compute the degree of impurity, we are now ready to proceed to the decision tree algorithms.




      “I think it’s impossible to really understand somebody,    
      what they want, what they believe,    
      and not love them the way they love themselves.”    
      ― Orson Scott Card, Ender’s Game