
Here is how k-NN (k-nearest neighbors) works in plain English:

We have a training set of known features (normalized) and their classifications:

many data points: [ (feature1, feature2, feature3, …), (f1, f2, f3, …), … ]

and the corresponding labels/classifications: [ category1, category2, … ]

For any new data point t:

calculate the distance from t to each of the training set data points

find/sort the K nearest (most similar) data points

–> take a majority vote of those K neighbors’ labels as t’s new label/class
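The steps above can be sketched in a few lines of Python (a minimal, illustrative version; the function and variable names are my own, and it uses plain Euclidean distance):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, t, k=3):
    """Classify point t by majority vote among its k nearest training points."""
    # step 1: distance from t to every training data point
    dists = [
        (math.dist(p, t), label)
        for p, label in zip(train_points, train_labels)
    ]
    # step 2: sort and keep the K nearest (most similar) points
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # step 3: majority vote over the K neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# toy training set: two small clusters with labels "A" and "B"
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, (0.15, 0.1), k=3))  # → A
print(knn_classify(points, labels, (0.9, 0.9), k=3))   # → B
```

In practice the distance metric and the choice of K matter a lot, and features should be normalized first so that no single feature dominates the distance.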

 

The idea seems simple, but it is quite powerful. One example is handwriting recognition: given handwriting samples for the digits 1, 2, …, 9 and a large enough training set, we can easily recognize new handwriting!

 

From Wikipedia:

  • In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
  • In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
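The regression variant is the same search for neighbors, but the prediction is an average instead of a vote. A minimal sketch (my own toy data and function name, for illustration only):

```python
import math

def knn_regress(train_points, train_values, t, k=3):
    """Predict a numeric value for t: the average of its k nearest neighbors' values."""
    # sort training points by distance to t, keep the k nearest
    dists = sorted(
        (math.dist(p, t), v) for p, v in zip(train_points, train_values)
    )
    nearest = dists[:k]
    # the prediction is the mean of the neighbors' values
    return sum(v for _, v in nearest) / k

# toy data: the value is roughly the x-coordinate
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
values = [0.0, 1.1, 1.9, 3.2, 4.0]

print(knn_regress(points, values, (2.1,), k=3))  # averages the values at x = 2, 3, 1
```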

 

 

References:

https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

http://www.saedsayad.com/k_nearest_neighbors.htm
