How the kNN Algorithm Works (Cont.)


If X contains categorical variables, or variables measured on a nominal or ordinal scale, you may need to compute a weighted distance over the multivariate variables. The next step is to find the K nearest neighbors. A training sample is included among the nearest neighbors if its distance to the query instance is less than or equal to the Kth smallest distance. In other words, we sort the distances of all training samples to the query instance and determine the Kth minimum distance. For every training sample whose distance is at or below this Kth minimum, we gather the category Y of that nearest-neighbor training sample. In MS Excel, the function SMALL(array, K) returns the Kth smallest value in the array. A special case arises in our example: the 4th through 7th smallest distances happen to be equal.
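The neighbor-selection step above can be sketched in Python. This is a minimal illustration, not the tutorial's Excel worksheet: the training points and query below are hypothetical, and distances are plain Euclidean. Note that ties at the Kth smallest distance are kept, which is how the example ends up with more than K neighbors when several distances are equal.

```python
import math

def k_nearest(training, query, k):
    """Return (distance, label) pairs for training samples whose distance
    to the query is within the k-th smallest distance (ties included)."""
    # Distance of every training sample (features, label) to the query
    dists = [(math.dist(x, query), y) for x, y in training]
    dists.sort(key=lambda d: d[0])   # sort ascending, like SMALL(array, K)
    kth = dists[k - 1][0]            # the k-th minimum distance
    return [(d, y) for d, y in dists if d <= kth]

# Hypothetical 2-D training data with binary labels '+' / '-'
train = [((7, 7), '-'), ((7, 4), '-'), ((3, 4), '+'), ((1, 4), '+')]
neighbors = k_nearest(train, query=(3, 7), k=3)
```

Here `neighbors` holds the three closest samples; if the 3rd and 4th distances had been equal, both would be returned.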

The kNN prediction for the query instance is based on a simple majority vote over the categories of the nearest neighbors. In our example the category is binary, so the majority can be found simply by counting the number of ‘+’ and ‘-’ signs. If there are more plus signs than minus signs, we predict the query instance as plus, and vice versa. If the counts are equal, we can break the tie arbitrarily by choosing either plus or minus. If Y in your training samples is categorical, take the simple majority of these values. If Y is quantitative, take the average or another measure of central tendency, such as the median or geometric mean.
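Both prediction rules described above can be sketched as two small Python functions, one for a categorical Y (majority vote) and one for a quantitative Y (a central tendency; the median is used here as one of the options the text mentions). The labels and values in the usage lines are hypothetical.

```python
from collections import Counter
from statistics import median

def predict_class(neighbor_labels):
    """Categorical Y: simple majority vote among the nearest neighbors.
    An exact tie is broken arbitrarily (here, by first-seen order)."""
    return Counter(neighbor_labels).most_common(1)[0][0]

def predict_value(neighbor_values):
    """Quantitative Y: a central tendency of the neighbors' values."""
    return median(neighbor_values)

# Hypothetical neighbor outcomes
label = predict_class(['+', '+', '-'])      # two '+' beat one '-'
value = predict_value([12.0, 15.0, 11.0])   # median of the neighbors
```

Swapping `median` for `statistics.mean` (or a geometric mean) changes only the choice of central tendency, not the structure of the prediction step.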




      “A great many people think they are thinking when    
      they are merely rearranging their prejudices.”    
      ― William James