Monday, 6 July 2020

K-Means Clustering


K-Means clustering is an unsupervised learning technique. It groups data points so that points in the same cluster have similar characteristics and differ from points in other clusters.

In K-Means, K is the number of clusters to form.
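To make the grouping idea concrete, here is a minimal NumPy sketch of the usual K-Means loop (often called Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points, and repeat until nothing changes. The function name `kmeans_sketch` and its `init_centers` parameter are made up for this illustration; this is not how scikit-learn implements it internally.

```python
import numpy as np

def kmeans_sketch(X, init_centers, n_iter=100):
    """Teaching sketch of Lloyd's algorithm (hypothetical helper, not scikit-learn's code)."""
    centers = np.asarray(init_centers, dtype=float)
    k = len(centers)
    for _ in range(n_iter):
        # Assignment step: distance from every point to every center,
        # then pick the index of the nearest center for each point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Assumption for this demo: start from one point in each visible group.
labels, centers = kmeans_sketch(X, init_centers=X[[0, 3]])
print(labels)
print(centers)
```

The starting centers are passed in explicitly here so the run is repeatable. In practice the result can depend on the initialisation, which is why libraries typically try several random starts.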

Advantages
  • Scales well
  • Efficient
Disadvantages
  • K must be chosen in advance
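One common way to approach the choice of K is the "elbow" heuristic: fit the model for several values of K and watch how the within-cluster sum of squares (exposed by scikit-learn as `inertia_`) drops. The sketch below is only an illustration on a tiny toy dataset; the range of K values and `n_init=10` are assumptions chosen for the example, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated groups of three points each.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

inertias = []
for k in range(1, 5):
    # n_init=10 runs ten random initialisations and keeps the best one.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the sum of squared distances of points to their closest center.
    inertias.append(model.inertia_)

for k, inertia in zip(range(1, 5), inertias):
    print(f"K={k}: inertia={inertia:.2f}")
```

The value of K after which the inertia stops falling sharply is a reasonable candidate. On this data the large drop happens between K=1 and K=2, which matches the two visible groups.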
When to Use? 
  • Roughly normally distributed data (compact, similarly sized clusters)
  • Large number of samples
  • Limited number of clusters

Use Cases 
  • Document classification
  • Customer segmentation

Python Code

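from sklearn.cluster import KMeans
import numpy as np

# Six 2-D points forming two obvious groups (x = 1 and x = 10)
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Fit K-Means with K = 2; random_state makes the run repeatable
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

# Cluster index assigned to each training point
print(kmeans.labels_)
# [1 1 1 0 0 0]   (label numbering may differ between versions)

# Assign new points to the nearest learned cluster
print(kmeans.predict([[0, 0], [12, 3]]))
# [1 0]

# Coordinates of the two cluster centers
print(kmeans.cluster_centers_)
# [[10.  2.]
#  [ 1.  2.]]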

