k-nearest neighbors (kNN) is a simple, supervised machine learning (ML) algorithm that can be used for classification or regression tasks, and it is also frequently used for missing-value imputation.
It is based on the idea that the observations closest to a given data point are the most “similar” observations in the data set, so we can classify new, unseen points based on the values of the closest existing points.
By choosing k, the user controls how many of the nearest observations the algorithm considers.
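To make the idea concrete, here is a minimal from-scratch sketch (not part of the original example) that classifies one new point by majority vote among its k nearest neighbors, using Euclidean distance; the toy data and the function name are illustrative assumptions.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest neighbors.

    Illustrative sketch: Euclidean distance, no tie-breaking logic.
    """
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]


# Toy data: two classes in 2-D (illustrative values)
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([2.0, 2.0]), k=3))  # -> 0
```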
This slide shows how to implement the kNN algorithm for classification and how different values of k affect the results.
A kNN Application Using scikit-learn

k is the number of nearest neighbors to use.
For classification, a majority vote is used to determine which class a new observation falls into.
Larger values of k are often more robust to outliers and produce more stable decision boundaries than very small values; for example, k=3 usually gives better results than k=1, where a single noisy neighbor can determine the prediction.
Below is a kNN application using scikit-learn:
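The following is a minimal sketch of one such application, assuming the Iris dataset bundled with scikit-learn as the classification task; the train/test split, the k values compared, and the sample observation are illustrative choices, not from the original.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a built-in dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit a kNN classifier for several values of k and compare test accuracy,
# which shows how the choice of k affects the results
for k in (1, 3, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print(f"k={k:2d}  test accuracy: {knn.score(X_test, y_test):.3f}")

# Classify a new, unseen observation with k=3
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[5.1, 3.5, 1.4, 0.2]]))  # predicted class label
```

Looping over several values of k makes the stability trade-off visible: very small k tends to track noise in the training data, while larger k smooths the decision boundary.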