Slide 14.12: A kNN application using scikit-learn


A kNN Application Using scikit-learn

kNN (k-nearest neighbors) is a simple, supervised machine learning (ML) algorithm that can be used for classification or regression tasks, and it is also frequently used for missing-value imputation. It is based on the idea that the observations closest to a given data point are the most “similar” observations in a data set, so a previously unseen point can be classified based on the labels of its closest existing points. By choosing k, the user selects the number of nearby observations the algorithm uses. This slide shows how to implement the kNN algorithm for classification and how different values of k affect the results.
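To make the idea concrete, here is a minimal from-scratch sketch of kNN classification in plain Python. It is only an illustration under stated assumptions, not how scikit-learn implements it: the helper name `knn_predict`, the use of Euclidean distance, the tie-breaking behavior, and the toy data are all chosen for this example.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(points, labels, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper written for this illustration only.
    """
    # Order the training indices by distance to the query point.
    # sorted() is stable, so equally distant points keep their input order.
    order = sorted(range(len(points)), key=lambda i: dist(points[i], query))
    nearest_labels = [labels[i] for i in order[:k]]
    # most_common(1) returns the most frequent label; if the vote is tied,
    # it returns the label that was encountered first.
    return Counter(nearest_labels).most_common(1)[0][0]


# Illustrative toy data (an assumption, not taken from the slide):
# two well-separated groups of three points each.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (2, 2), k=3))  # query near the first group
print(knn_predict(points, labels, (7, 7), k=3))  # query near the second group
```

Sorting every training point by distance is the simplest approach and is fine for small data sets; libraries typically use faster neighbor-search structures for larger ones.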

k is the number of nearest neighbors to use. For classification, a majority vote determines which class a new observation falls into. Larger values of k are often more robust to outliers and produce more stable decision boundaries than very small values (k=3 is usually a better choice than k=1, which can produce undesirable results). Below is a kNN application using scikit-learn:



# Select the non-interactive Agg backend so the plot can be rendered without a display:
import sys
import matplotlib
matplotlib.use( 'Agg' )

import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# a training set
x       = [  4,  5, 10,  4,  3, 11, 14,  8, 10, 12 ]
y       = [ 21, 19, 24, 17, 16, 25, 24, 22, 21, 21 ]
classes = [  0,  0,  1,  0,  0,  1,  1,  0,  1,  1 ]

data    = list( zip( x, y ) )

# k = 5
knn = KNeighborsClassifier( n_neighbors=5 )
# Fit a kNN model on the training data using 5 nearest neighbors.
knn.fit( data, classes )

# a new point
new_x = 8
new_y = 21
new_point = [( new_x, new_y )]
# Predict the class of the new, previously unseen data point.
prediction = knn.predict( new_point )

#
# Plot all points: uncomment the scatter command below.
#

#plt.scatter( x, y, c=classes )

#
# Classify a new point: uncomment the scatter and text commands below.
#

#plt.scatter( x + [new_x], y + [new_y], c=classes + [prediction[0]] )
#plt.text( x=new_x-1.7, y=new_y-0.7, s=f"new point, class: {prediction[0]}" )

plt.show( )

# Write the rendered figure to standard output so the page can display it:
plt.savefig( sys.stdout.buffer )
sys.stdout.flush( )
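
To see how the choice of k changes the result, the following sketch refits the classifier on the same training set for several values of k and predicts the class of the same new point (8, 21). The values 1, 7, and 9 are an assumption made for this illustration; they were picked because the k-th nearest neighbor is not tied in distance with the next one for those values, so the outcome does not depend on tie-breaking.

```python
from sklearn.neighbors import KNeighborsClassifier

# Training set and new point copied from the example above.
x       = [  4,  5, 10,  4,  3, 11, 14,  8, 10, 12 ]
y       = [ 21, 19, 24, 17, 16, 25, 24, 22, 21, 21 ]
classes = [  0,  0,  1,  0,  0,  1,  1,  0,  1,  1 ]

data = list(zip(x, y))
new_point = [(8, 21)]

# Refit with different values of k and record each prediction.
predictions = {}
for k in (1, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(data, classes)
    predictions[k] = int(knn.predict(new_point)[0])
    print(f"k={k}: predicted class {predictions[k]}")
```

With k=1 only the single closest point, (8, 22), votes, so the new point is assigned class 0. With k=7 and k=9 the class 1 points hold the majority (4 of 7 and 5 of 9), so the prediction flips to class 1. For k=3 and k=5 in this data set, two training points with different labels are equally far from the new point at the cutoff, so the prediction can depend on how ties are broken; the scikit-learn documentation notes that in this situation the result depends on the ordering of the training data.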
