Here is how it works in plain English:
we have a training set of known features (normalized) and their classifications:
- many data points: [(feature1, feature2, feature3, …), (f1, f2, f3, …), …]
- corresponding labels/classifications: [category1, category2, …]

for any new data point t (a minimal code sketch follows these steps):
- calculate the distance between t and each of the training set data points
- sort the distances and keep the K nearest (most similar) data points
- take a majority vote among the K neighbors' labels to get the new label/class
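A minimal from-scratch sketch of the three steps above in Python, assuming Euclidean distance on already-normalized features (the write-up does not fix a distance metric, so that choice is an assumption):

import math
from collections import Counter

def knn_classify(train_points, train_labels, t, k=3):
    """Classify point t by majority vote among its k nearest neighbors."""
    # 1. distance from t to each training point (Euclidean, an assumption)
    distances = [(math.dist(t, p), label)
                 for p, label in zip(train_points, train_labels)]
    # 2. sort by distance and keep the K nearest neighbors
    k_nearest = sorted(distances, key=lambda d: d[0])[:k]
    # 3. majority vote over the neighbors' labels
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# toy usage: two normalized features per point, two classes
points = [(0.10, 0.10), (0.12, 0.09), (0.80, 0.80), (0.82, 0.79)]
labels = ["A", "A", "B", "B"]
print(knn_classify(points, labels, (0.11, 0.10), k=3))  # -> A

Note that there is no real "training" step here: k-NN simply stores the data points and defers all the work to query time.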
The idea seems simple but it is quite powerful. One example is handwriting recognition: given enough training samples of handwritten digits 1, 2, …, 9, we can easily recognize new handwriting! A sketch of this example follows.
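As a hedged illustration of the handwriting case, here is a sketch using scikit-learn's bundled 8x8 digit images (the library and dataset are my choices, not the original's; any k-NN implementation and digit set would do):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()  # 1,797 8x8 grayscale images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5)  # K = 5 nearest neighbors
clf.fit(X_train, y_train)   # "fitting" just stores the training set
print("test accuracy:", clf.score(X_test, y_test))  # typically around 0.98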
From Wiki:
- In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
- In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
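A minimal sketch of that regression variant, using 1-D features and a plain (unweighted) mean for brevity (both simplifications are mine):

def knn_regress(train_points, train_values, t, k=3):
    """Predict the value at t as the mean of the k nearest neighbors' values."""
    # sort training samples by distance to t, then average the k closest values
    nearest = sorted(zip(train_points, train_values),
                     key=lambda pv: abs(pv[0] - t))[:k]
    return sum(v for _, v in nearest) / k

# toy usage: predict a value for x = 2.5 from four 1-D samples
print(knn_regress([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 2.5, k=2))
# -> 25.0 (average of the two nearest neighbors' values)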
References:
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
http://www.saedsayad.com/k_nearest_neighbors.htm