How Does the kNN Algorithm Work?


The K-nearest neighbor (kNN) algorithm works by finding the training samples at minimum distance from the query instance; these are the K nearest neighbors. After we gather the K nearest neighbors, we take a simple majority vote of their classes as the prediction for the query instance.
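The steps above (compute distances, pick the K nearest, take a majority vote) can be sketched in Python as a minimal implementation; the function name `knn_classify` and the sample data in the test are illustrative, not from the original text:

```python
from collections import Counter

def knn_classify(query, samples, labels, k=8):
    """Predict the class of `query` by a simple majority vote
    among the k training samples nearest in Euclidean distance."""
    # Squared Euclidean distance from the query to every training sample
    # (squaring preserves the distance ordering, so sqrt is unnecessary).
    dists = [sum((xq - xt) ** 2 for xq, xt in zip(query, s)) for s in samples]
    # Indices of the k nearest training samples.
    nearest = sorted(range(len(samples)), key=lambda i: dists[i])[:k]
    # Majority vote over the neighbors' class labels.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Note that kNN is a "lazy" learner: there is no training step, and all work happens at prediction time.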

The data for the kNN algorithm consist of several multivariate attributes, named Xi, that will be used to classify Y. The data for kNN can be on any measurement scale, from ordinal and nominal to quantitative, but for the moment let us deal with only quantitative Xi and binary (nominal) Y. The last row of the data is the query instance that we want to predict. The graph of this problem is shown below. Suppose we set K = 8 (we will use the 8 nearest neighbors) as the parameter of this algorithm. Because we use only quantitative Xi, we can use Euclidean distance. Suppose the query instance has coordinates (x1q, x2q) and a training sample has coordinates (x1t, x2t); then the squared Euclidean distance is d²tq = (x1t − x1q)² + (x2t − x2q)². If you have more than 2 variables, you can use the general Euclidean distance formula:

    dij = (Σk (xik − xjk)²)½
For example, the distance between objects A = (1, 1) and B = (1.5, 1.5) is computed as
    dAB = ((1 − 1.5)² + (1 − 1.5)²)½ = (0.5)½ = 0.7071

Another example: the distance between objects D = (3, 4) and F = (3, 3.5) is computed as
    dDF = ((3 − 3)² + (4 − 3.5)²)½ = (0.25)½ = 0.5
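Both worked examples can be checked with a small Python helper for the general Euclidean distance formula (the function name `euclidean` is ours):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension,
    i.e. the square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# The two examples from the text:
# A = (1, 1), B = (1.5, 1.5)  ->  sqrt(0.5)  ~= 0.7071
# D = (3, 4), F = (3, 3.5)    ->  sqrt(0.25)  = 0.5
```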

